Optimize Your RDS MySQL To GCP CloudSQL Migration

Bhuvanesh
Searce
Sep 13, 2020


MySQL replication is one of the simplest replication setups we have ever worked with. We have done many MySQL migrations across multiple cloud platforms like AWS, GCP, and Azure. In AWS, migrating MySQL to RDS is straightforward: we can use our favorite backup tools such as mysqldump, mydumper, Percona XtraBackup (only for Aurora), and mysqlpump, or we can use the AWS DMS service to migrate the data. In GCP, we have very limited options in terms of backup tools. We have to use native mysqldump, which is often criticized as a single-threaded process, and all the databases have to be dumped into a single backup file. Unfortunately, this approach takes far too long when dealing with terabytes of data.

tl;dr
We wrote this blog for migrating AWS RDS MySQL to CloudSQL MySQL. CloudSQL is our target, and the steps and workarounds are valid for CloudSQL migrations only. If you want to migrate to MySQL on a VM instead, test thoroughly before applying them.

Challenges:

  • Stick with the mysqldump tool — CloudSQL will only accept a database dump file taken with the native MySQL dump tool.
    1. CloudSQL does not support running the CHANGE MASTER command on a running CloudSQL instance, so we can’t set up the replication manually after spinning up the CloudSQL instance.
    2. It uses GTID-based replication. The CloudSQL replication design is to first restore the dump, apply the purged GTID set from that dump file, and then continue the replication from the source information we provided during the migration setup (an example of that dump header is shown after this list).
  • Dump file with definer — To restore an object that contains a DEFINER clause, the restoring user must be a root-level user. But in CloudSQL (and even on AWS RDS) we don’t get the actual root user, so CREATE statements with a definer will fail.
  • Dump file with triggers — MySQL will skip the definer on stored procedures, but not on triggers, so restoring the dump will fail with a permission issue. We would have to dump the triggers separately and restore them after promoting the CloudSQL instance to standalone.
  • User accounts — Generally we don’t take a dump of the mysql system database, so the users will not be migrated. We can’t even create the users manually after replication is established (we can do it via the console, but that grants the highest privilege).
  • More time — This is the worst part of a CloudSQL migration. Dumping a database of a few terabytes with mysqldump takes many hours, and the restore takes even longer.
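For context, this is roughly what the GTID section near the top of the dump file looks like. The UUID and transaction range below are made up for illustration, and the exact header varies by MySQL version:

SET @@SESSION.SQL_LOG_BIN = 0;
SET @@GLOBAL.GTID_PURGED = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1-123456';

CloudSQL reads this GTID_PURGED value from the dump and starts replicating from the next transaction onwards.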

After analyzing all of these factors, we came up with some optimizations for the CloudSQL migration process when dealing with a huge amount of data.

Our Optimized Approach with MySQLPUMP:

One thing is very clear: we can’t do anything about the restore process, which is completely taken care of by GCP. So we started researching the other challenges, and finally we landed on one of MySQL’s backup utilities called MySQLPUMP, a multi-threaded backup tool with more control over what to back up and which objects to skip.

  • Multi-thread — Thanks to its multi-threaded design, we can dump multiple tables at the same time. This is a huge time saver for us.
  • GTID — This tool will not add the binlog file name and position, but it will add the GTID set that needs to be purged. CloudSQL only supports GTID-based replication, so we don’t need to worry about binlog details.
  • More objects — Like native mysqldump, it dumps all the database objects: tables, views, functions, stored procedures, triggers, and events.
  • Users — This is one of its unique features. We can take a dump of all of our MySQL users, and we have control over which users are included or excluded.
  • Object filters — Sometimes we may have backup tables or historical tables, or some unused stored procedures and events. mysqlpump has options to filter these objects. If we want to skip some backup tables that never receive writes, we can use the --exclude-tables flag (a small example follows this list). This will speed up the whole process.
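As a quick illustration of the object filters, here is a minimal sketch; the database and table names are made up:

mysqlpump -h RDS-ENDPOINT -u RDS-MASTER-USER -p \
--databases db1 \
--exclude-tables=orders_backup,audit_2018 \
--exclude-events=% \
--default-parallelism=4 > filtered-dump.sql

--exclude-tables and --exclude-events accept comma-separated names and support the % and _ wildcards, so whole groups of objects can be skipped with one pattern.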

Some challenges with MySQLPUMP:

Yes, mysqlpump also has some limitations, and we have to know them before using it for a production migration.

Not Consistent

Are you kidding me? — [We read your mind 😄]
Of course, with AWS RDS we can’t get a consistent backup with mysqlpump. Generally, to make a consistent backup of InnoDB tables we would use the --single-transaction flag, but that ends up executing FLUSH TABLES WITH READ LOCK on the database server, and in RDS the master user doesn’t have permission to run this command. This is not documented.

mysql> Flush Tables With READ LOCK;
ERROR 1045 (28000): Access denied for user 'root'@'%' (using password: YES)

Definer

mysqlpump has a flag called --skip-definer, but it does not eliminate the definers on triggers, so you’ll still get the DEFINER syntax in your dump file.

User Accounts

It will still help you take a dump of all of your users. But if the host part is scoped to the AWS network and you have a different network range on GCP, those users will not work. If you have % as the host, it’s fine.

Workaround:

Performing the dump on a read replica is the workaround for the consistency issue; taking a dump on the master node is not recommended at all. So create a read replica for your MySQL instance and then stop the replication process. With no write activity on the replica, we can take the dump without the --single-transaction flag (see the sketch below).
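On RDS we can’t run STOP SLAVE directly with the master user, but read replicas expose stored procedures for this. A minimal sketch, to be run on the RDS read replica:

-- Pause replication before starting the dump
CALL mysql.rds_stop_replication;

-- After the dump is finished, resume replication
CALL mysql.rds_start_replication;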

Unfortunately, we don’t have any MySQL trick to strip the definer syntax from the dump file, so we decided to pipe the dump through a sed command and remove the definers during the dump process.

For user accounts, we can’t do much. Our suggestion is, once you have promoted the instance, rename those users so the host part matches your GCP network (ALTER USER cannot change the host part, but RENAME USER can; see the sketch below). Or don’t back up the users at all and manually create them after promoting the CloudSQL instance.
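A minimal sketch of fixing the host part after promotion; the user name and network ranges here are made up:

-- Run on the promoted CloudSQL instance
RENAME USER 'app_user'@'10.20.%' TO 'app_user'@'10.128.%';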

Demo:

Let’s jump into a demo and use the mysqlpump approach to migrate a database from AWS RDS to CloudSQL MySQL. But make sure you have the following prerequisites.

  1. Enable GTID on RDS (MySQL 5.7.23 and later versions support this; MySQL 8.0 already comes with it in ON_PERMISSIVE mode).
  2. A proxy server. CloudSQL will not talk to AWS RDS endpoints directly, so create a port forward on an EC2 instance or GCP VM. It should be in a public subnet (at the moment, CloudSQL external replication does not support private communication). A small sketch covering this proxy forward, the replication user, and the binlog retention follows this list.
  3. Make sure there is network connectivity from RDS to the proxy and from the proxy to CloudSQL.
  4. A MySQL user with replication permission.
  5. Set the binlog retention hours to a minimum of 24 hours (feel free to increase it based on your database size).
  6. Install MySQL client 5.7+ on the VM where you are going to run the dump command.
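Here is a minimal sketch covering the proxy forward, the replication user, and the binlog retention. The endpoint name, user name, and password are placeholders; adjust them for your environment.

# On the proxy VM (EC2 or GCE): forward the MySQL port to the RDS endpoint.
# socat is one simple option; an iptables DNAT rule would also work.
socat TCP-LISTEN:3306,fork,reuseaddr TCP:my-rds.abc123.us-east-1.rds.amazonaws.com:3306 &

Then, connected to the RDS master with the mysql client, create the replication user and raise the binlog retention:

CREATE USER 'repl'@'%' IDENTIFIED BY 'a-strong-password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
CALL mysql.rds_set_configuration('binlog retention hours', 24);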

Also, we recommend the following to make the dump process faster.

  1. Increase the Read Replica instance size.
  2. Increase the --max-allowed-packet size and, if possible, tune some InnoDB I/O parameters (but take your DBA’s suggestion before modifying these variables).
  3. Take the dump from an EC2 instance that is launched on the same VPC where RDS is launched and use the same availability zone.
  4. Attach SSD or PIOPS disk for the dump.

Dump the Database:

Run the following command against the RDS read replica (use the replica’s endpoint as RDS-ENDPOINT).

mysqlpump -h RDS-ENDPOINT \
-u RDS-MASTER-USER \
-p \
--default-parallelism=10 \
--databases db1 db2 db3 \
--events \
--triggers \
--routines \
--users \
--exclude-users=%sys,root,rdsadmin \
--skip-definer \
--default-character-set=utf8mb4 \
--set-gtid-purged=ON \
--max-allowed-packet=700M | sed -e 's/DEFINER=[^ ]*`//' | gzip > dump-file.sql.gz

  • --default-parallelism — The number of threads used to take the dump.
  • --exclude-users — While creating the CloudSQL instance, it creates a default user named root with % as the host, so we need to exclude this user from the dump, along with some default system users.
  • --skip-definer — To skip the definers.
  • --max-allowed-packet — This is an optional flag.
  • sed — If anything is not skipped by the --skip-definer flag, this command removes it (a quick check is shown after this list).
  • gzip — Compress the output so the file matches its .sql.gz name and uploads faster.
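Before uploading, it is worth a quick sanity check that no DEFINER clauses survived the sed filter; the file name matches the command above:

zgrep -c 'DEFINER=' dump-file.sql.gz
# 0 means the dump is clean; a non-zero count means some definers slipped through.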

Upload it to GCS:

If you took the dump on an EC2 instance, you have multiple ways to upload it to GCS (a sketch follows the list).

  1. Configure Gcloud CLI on EC2
  2. Upload the dump to S3 and use GCP transfer service to copy it to GCS(recommended)
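A minimal sketch of both routes; the bucket names are placeholders:

# Option 1: copy straight to GCS with the gcloud/gsutil CLI configured on the EC2 instance
gsutil cp dump-file.sql.gz gs://your-gcs-bucket/

# Option 2: push to S3, then use the GCP Storage Transfer Service (console) to copy it to GCS
aws s3 cp dump-file.sql.gz s3://your-s3-bucket/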

Setup Replication:

  • Go to the CloudSQL console and use the Migrate Data option.
  • For the public IP address of the source, use the public IP of the proxy VM.
  • Fill in the credentials.
  • Choose a slightly larger instance type to speed up the restore process.
  • Use an SSD disk for better IOPS.
  • Give it disk space equal to the RDS volume; the larger the volume, the more IOPS. Also, enable storage autoscaling to avoid last-minute surprises.
  • Then select your dump file from GCS.
  • Expand the Advanced Options.
    1. If you have any jump host to access the CloudSQL instance, you can whitelist its IP (you can also do this later).
    2. 📢 Click on the Add item button to add the database flags; this is important. Enable the log_bin_trust_function_creators and event_scheduler flags (a CLI equivalent is sketched after this list).
    3. Tune some InnoDB parameters for better write throughput (get your DBA’s help).
  • Click on the Create button and wait for a few minutes. It will then show you an outgoing IP address; this is the IP that communicates with your proxy VM for the replication, so whitelist it to access your proxy VM on the MySQL port.
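If you prefer the CLI, the same flags can also be set (or fixed later) with gcloud. The instance name below is a placeholder, and note that patching flags replaces any flags already set on the instance:

gcloud sql instances patch my-cloudsql-replica \
  --database-flags=log_bin_trust_function_creators=on,event_scheduler=on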

That’s it. Wait for some time; if your database is really huge, you may have to wait a day or more, because the restore processes the tables one by one. Once it’s done, you can reset the root user’s password in the CloudSQL console and log in with that user to check the replication status (a few checks are sketched below).
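A few quick sanity checks once you can log in; the database name here is a placeholder:

-- Is replication running and catching up?
SHOW SLAVE STATUS\G
-- Were the users migrated?
SELECT user, host FROM mysql.user;
-- Were the triggers restored along with the data?
SHOW TRIGGERS FROM db1;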

(Screenshots: replication status, migrated users, and restored triggers.)

Conclusion:

This is just an alternate approach to migrating a MySQL database to CloudSQL with MySQLPUMP. We don’t know exactly how much time it will save in the overall migration, but it will definitely save you from a single-threaded backup, and there is no need to dump and restore the triggers and users separately after promoting the instance as the master. If you have already done a large migration with MySQLPUMP, or you are going to migrate after reading this blog, we are happy to hear your story; please comment with your experience.

You may also like,

  1. GTID on AWS RDS
  2. GCP CloudSQL external replication
  3. MYSQLPUMP — A technical overview from MyDBOPS and Percona
  4. MySQL Pump — Documentation
