Migrating Postgres Database From Heroku to GCP

Posted by on April 22, 2022 · 11 mins read

Why Migrate?

Create Database Backup

Create GCP Database Instance

Upload Backup to Cloud Storage

Remove the Extras From the Backup

Connecting to Your New Database

Restoring the Database

Checking Your Work

Conclusion

Why Migrate?

Databases are a fact of life when working in software development. There will always be the need to keep careful track of a variety of information that you will need to have available in many cases for years to come. That can create a unique challenge because technology changes every day. New tools become available, pricing changes for available services, infrastructure needs change, or any number of other things can occur that might require you to move your data from one service to another. Just such a situation occurred for me which required me to migrate a Postgres database from Heroku to GCP.

When I first approached this problem I figured it would be easy. On many occasions I have had to drop and restore databases locally because data became corrupted during development and I needed a clean database to work with. What I discovered was that it isn’t always that easy. I didn’t want to start the final cutover until I was sure that everything would work properly when I completed the process. I also wanted to keep the amount of time that my site was down to a minimum to reduce the impact that it had on my customers. To make sure I could complete the process quickly and efficiently I needed to complete a few dry runs and document the steps so I could repeat the process when I was ready.

Create Database Backup

I already had my Heroku database configured to run automatic database backups every night, so I already had backup copies that I could experiment with before the final cutover date. As part of the process documentation I made sure to make a note that I needed to get a fresh copy of the database before the cutover was completed. With the database backup in hand I logged into the GCP console to get started restoring the database.

Create GCP Database Instance

The first step of the process once I was logged in to GCP was to create my Postgres database instance. I wasn’t running anything particularly configured so I was able to create my database instance with the default configurations. Depending upon your needs you might want to take a look at what Google has to offer for its Cloud SQL Postgres Databases. With the instance running you then need to create a database that you will restore into. This can be accessed using the Databases tab.

Upload Backup to Cloud Storage

The database instance was created and the database was created on the instance, but now I needed to get the backup restored to it. In order to accomplish that you need to have the backup available on GCP. This is done using Cloud Storage . Cloud Storage is just a simple storage tool for keeping files on the cloud. I created a new Cloud Storage instance and was then able to load my backup database copy onto the storage instance. At this point it’s a good idea to make sure that the file name for the backup on the cloud is something reasonable. It will help later when you are typing commands to restore the database if the filename is not something long or challenging to type. Heroku exports their database copies with long names, so I changed my backup to db_backup.dump.

Now all the pieces were in place so I needed to get the restore process started. The first step is to get a cloud terminal running. This can be done from the terminal icon on the top right corner of the screen. Once the terminal is running you can run the command gsutil -u *project_name* cp gs://*bucket_name*/*file_name* ~, making sure to replace *project_name* with the actual name of your project, *bucket_name* with the name of our Cloud Storage instance that you created earlier, and *file_name* with the name of the backup file that you uploaded to Cloud Storage. Under the hood you can think of the cloud terminal that you are running in as its own computer which can interact with your Cloud Storage and Cloud SQL instances. This command will pull the database backup from your Cloud Storage onto the cloud terminal so you can work with it.

Remove the Extras From the Backup

Next we needed to create a new table of contents file for the database backup. It turns out that the database backup that Heroku exports is slightly customized, so you will encounter some issues when you try to restore it if you don’t fix the table of contents. This can be accomplished using the command pg_restore --list *filename_of_dump* | sed -E 's/(.* EXTENSION )/; \1/g' > db_backup.toc. Make sure to update the file name to match the database backup name that you selected earlier. This command will use sed to replace those little extras that Heroku added for you. It then saves the resulting table of contents file to whatever name you selected, in this case db_backup.toc.

Connecting to Your New Database

With the updated table of contents you are ready to connect to your database. The first step of that process requires you to get a cloud proxy tool by running the command wget https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64 -O cloud_sql_proxy followed by chmod +x cloud_sql_proxy. The first command downloads the tool, while the second command makes the newly downloaded binary executable.

Next you will need to know the connection name for your database. To find this you can look on the dashboard, or run the command gcloud sql instances describe *database_instance* --project *project_name*. This will output to the cloud terminal a lot of information about your database, but among the first several lines you will find the connection name for your database.

Once you have this connection information you can create a location on your cloud terminal for your proxy connection to live. Running the command sudo mkdir /cloudsql; sudo chmod 777 /cloudsql followed by ./cloud_sql_proxy -dir=/cloudsql & will create a folder for your connection, make that folder globally accessible on the cloud terminal instance, and store a cloud proxy connection inside the folder.

Restoring the Database

The last step of the process involves the restore itself. This is done with the command pg_restore --no-privileges --no-owner -c -h /cloudsql/*connection_name* -d *database_name* *dump_file* --use-list=*toc_file* -U postgres. This is a long command, so lets take a look at it a little bit at a time.

After the restore call we have the --no-privileges –no-owner flags. These will ensure that any user information that came from the Heroku database carry over. Since this is a new database it stands to reason that you will have a different set of users for this one, so you really don’t need to keep any of the old users or permissions.

The -h flag identifies the host for the database that you are connecting to. This is where you need the connection name for your database that you checked on earlier.

After the connection name you have the -d flag which identifies the specific database on the database that you are restoring to.

After your database backup file there is the --use-list flag which tells the restore tool to use the table of contents file that we fixed up earlier.

The last flag is the -U flag. This identifies the user that will be performing the action. In this example we are using postgres which is the default admin user. Best practice is to create users specific to the operations that they will need to perform so no user of the database (or service that uses the database) has more permissions then they have to have.

When you run the command above it should request a password. This is the password for the postgres user that you input in the above command. If you are using the default users as we did in this example, it would be the password that you created when you created the instance.

Checking Your Work

After that, your database will be restored. I like to verify such things to make sure everything went smoothly. One way to complete this verification is to run the command gcloud sql connect *database_instance* --user=postgres -d *database_name* which will connect to your specified database using the user that you specify, on the Cloud SQL database instance that you specify. Once you’ve done this you will be logged directly into the database so you can run queries to see that the database has been restored.

Conclusion

Databases are a critical part of what we all do every day. There will always be a need for them, and there will always be a need to migrate them from time to time. I hope that this has given you the basic information that you need to get your Heroku database onto GCP. If you find yourself with questions about this process please feel free to reach out, I would love to help!