5 Tricks to Avoid MySQL Replication Delays
Flite started using MySQL replication early in our company history, and it has served us very well. Over time we have evolved from a simple active/passive master-master topology with 2 databases in one data center to a more complex topology with multiple data centers and several different tiers of read-slaves below our master databases.
Replication lag occurs in many MySQL deployments, leading to race conditions and other application problems. This often happens with a high volume of writes to the database, but all it takes is one slow insert/update/delete query to cause the slave to fall behind.
At Flite we’ve generally been able to avoid replication lag. Here are a few of the tricks we use for that purpose:
0) Monitor replication
This is fundamental enough that it gets to be #0. The first thing you need to do is monitor your slaves to see if they are behind the master. If you don’t monitor them, you will never know.
There are several ways to do this.
One way is to run
SHOW SLAVE STATUS\G periodically and check the value of
In the past this method has proven unreliable, so we use a heartbeat that updates every second to monitor our replication lag. Percona toolkit has a handy tool called pt-heartbeat to set this up.
Now that I’ve reminded you about monitoring, here are 5 tips Flite uses to avoid replication delays:
1) Use “dump-and-load” instead of “INSERT…SELECT”
INSERT...SELECT is a very tempting piece of SQL syntax. It allows you to copy data from one or more tables into a separate table using a single SQL statement. However, this syntax can cause problems – whether or not you are using replication. For example imagine you want to populate a summary table to speed up some common queries. Here’s an example using the sakila database to populate a summary table of actors with the number of films they have appeared in. You might start with a query like this:
1 2 3 4 5 6 7
That query is fast since sakila is a small database, but what if you’re running a more complex select that takes 10 seconds to execute on the master? If you are using statement-level replication then the query will probably also take about 10 seconds to execute on the slave database, meaning that replication will be 10 seconds behind when that statement finishes.
What about row-level replication? In that case the update on the slave will be much faster (yay!), but we still have a problem: The
INSERT...SELECT implicitly locks the actor table, so any other thread that tries to update the actor table has to wait for that 10 second select to complete.
I use a
dump-and-load approach to address both problems:
1 2 3 4 5 6 7 8 9 10
select query still takes 10 seconds to execute, but it no longer locks the actor table, so updates to that table are not blocked.
Assuming the number of summary rows is reasonable, the
load data infile should be fast in either statement-based or row-based replication.
But what if the number of summary rows is large? What if we’re updating hundreds of thousands of rows, and the
load data infile statement takes 15 seconds to execute? In that case, we would still end up with a replication delay.
In this situation I often execute the statement separately on the master and the slaves:
2) When it’s safe to do so, execute updates outside of the replication stream
Warning: this is not for the faint of heart, but assuming you know what you are doing this can be safe if you can satisfy a few basic pre-requisites:
The update does not generate any auto-increment values
If you bulk-load data into the same table on two different databases and that table has an auto-increment column, then those auto-increment values will be different. If you’re using checksums to verify consistency between master and slave databases, retrieving values by primary key, etc then that will be a problem.
The user executing the queries has SUPER privilege
You need to run
set sql_bin_log = 0; to prevent the statement from being written to the binary log. That requires the SUPER privilege.
Here’s an example of the type of update that could be safe to do outside of the replication stream.
Say we want to update a summary table of rental activity per day. We only want to include rows created through yesterday, so if this runs at slightly different times of day on the master than the slaves it won’t change the output:
1 2 3 4 5 6 7 8 9 10 11 12
So far so good. Now, what if you want to update a bunch of rows in your database, and the update statement will take several seconds to run? In that case try this approach:
3) Break big updates into smaller chunks
Here’s another example in sakila. This time let’s deactivate the accounts of users who have not rented anything since August 23, 2005.
1 2 3 4 5 6 7 8
Again, this query is fast because the database is so small, but you can easily imagine running a similar statement on your database that would take several seconds (or more!) to execute.
dump-and-load technique to break that update into smaller chunks:
1 2 3 4 5 6 7 8 9 10 11 12 13
Here I’ve separated the reads from the writes again, so the select does not lock the table. Furthermore, I’m only updating one row at a time by primary key, so each update will run very quickly. Another option is to do something more sophisticated like write your own python script to do the updates in small chunks (say 1000 rows per update), and that approach is certainly useful, but this dump-and-load approach is simple and easy to implement for lots of situations. I thought I remembered a tool in the ever-useful Percona Toolkit that automatically chunks a big update, but I can’t remember the name of the tool and I don’t see it on the documentation page.
But what if there are so many updates that we flood the binary log with all of these 1-row updates, overwhelm our single-threaded replication process, and get replication lag? In that case, try this:
4) Rest between updates
This is a fairly crude approach, but it works for us. If I’m doing a lot of inserts, updates, or deletes via
dump-and-load like the customer update above, I almost always add a sleep statement after each insert/update/delete to prevent overwhelming the binary log. The proper amount of time to sleep will depend on your particular environment. The MySQL
sleep() function supports micro-seconds (and milliseconds too I think), so let’s use 0.02 seconds. Rather than using
select sleep(0.02); I prefer to use the handy – but under-utilized – “do” syntax available in MySQL, like this:
do sleep(0.02); to avoid all of the useless output of the sleep statements.
Here’s an example:
1 2 3 4 5 6 7 8 9 10 11 12 13
5) Run DDL on passive DBs
The 4 tricks mentioned above are all for DML. When I need to run DDL I just do it on the passive DBs. This is one of the great benefits of the active/passive topology. When you need to run DDL (or major DML) you can do it on the passive database, flip the roles, and then run it on the other DB in the pair. Again, you need to run
set sql_log_bin = 0; to prevent the statements from being written to the binary log, and you are responsible for making sure you run it on ALL of your databases since replication won’t do it for you.
MySQL does not support online schema change – with the exception of a few small cases like changing a column default value – but there are some tools you can consider. For example, Percona Toolkit has pt-online-schema-change, which implements online schema change by creating a copy of your table, copying all of the existing data over in chunks, and using triggers to copy over and changes that occur after you start running the schema change. When properly configured that tool will not cause replication lag, but it will not work for everyone because it does not support tables with existing triggers.