Flite Culture

Fast-forwarding a Delayed MySQL Replica

Last week I needed to fast-forward a delayed MySQL replica in order to recover some data. I use pt-slave-delay to keep this replica X days behind the master, and I needed to fast-forward it to be about X-2 days behind the master instead.

I wanted to catch the replica up to a precise moment in time, and it was important that it not go past that point until I was done recovering the data I needed.

I knew there was a command to do this, but I couldn’t remember the syntax. Since I figured pt-slave-delay was implemented using this command, I just opened the perl script and searched around until I found the syntax I was looking for. Here it is:

    MASTER_LOG_FILE = 'log_name', MASTER_LOG_POS = log_pos

Once I got the right syntax I needed to know which log_name and log_pos values to use.

I got the log_name by SSHing to the master DB, listing the binary log directory, and finding the oldest log with an mtime greated than the time I was looking for (If there is a way to get a listing of binary logs based on time range in the MySQL CLI, please let me know in the comments!).

Once I had log_name, I used the mysqlbinlog tool to find the log_pos I was looking for. In my experience start-datetime is inclusive, and stop-datetime is exclusive, so to get all of the binary logging from 2014-02-04 16:18:30 I use that as start-datetime and add 1 second to stop-datetime. For example:

sudo mysqlbinlog bin-log.000386 \
  --start-datetime="2014-02-04 16:18:30" \
  --stop-datetime="2014-02-04 16:18:31" 

Then I sorted through the output of that until I found the most recent SQL statement I wanted to execute on my delayed replica. One of the comments preceding that statement has an end_log_pos value, which I used as my log_pos.

In order to fast-forward the replica I needed to stop running pt-slave-delay. Otherwise it would not let replication proceed past X days behind the master.

Once I had those values, here’s the process I followed to fast-forward the replica:

  • Disable monit for pt-slave-delay so it wouldn’t restart automatically
  • Stop pt-slave-delay
  • Run STOP SLAVE on the delayed replica
  • Run START SLAVE UNTIL MASTER_LOG_FILE = 'bin-log.000386', MASTER_LOG_POS = 950592556 on the delayed replica
  • Wait for replication to catch up to that point
  • Recover the data I need
  • Run STOP SLAVE on the delayed replica (just to be safe)
  • Enable monit for pt-slave-delay
  • Start pt-slave-delay (or just let monit start it)

At that point the replica will stay in that state for about 2 days, at which point pt-slave-delay will again enforce the pre-defined replication delay of X days.