Flite Culture

Avoiding Data Loss With Older Versions of Pt-table-sync

The pt-table-sync tool uses the MySQL REPLACE command to rewrite row data on a master database in order to assure consistency between the master and its replicas. The REPLACE command makes a lot of sense for that tool, but it also has the potential to cause problems. I’ve written about the downsides of the REPLACE command in the past, and sure enough in older versions (I was using 2.1.9) of pt-table-sync it’s possible to accidentally delete child rows because of its use of REPLACE. The fix is to run it with --no-foreign-key-checks, which is the default as of version 2.2.6 according to this bug report.

I’ll take this as a reminder to upgrade Percona Toolkit from 2.1.9 to 2.2.7, and to always test tools like this in a non-production database in order to avoid potential data loss in production.

Here’s a quick example to illustrate the problem. First I create two test tables, and insert one parent row and one child row on my master database:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
mysql> create table if not exists parent (
    -> id int unsigned not null auto_increment primary key, 
    -> name varchar(30)
    -> );
Query OK, 0 rows affected (0.04 sec)

mysql> create table if not exists child (
    ->   id int unsigned not null auto_increment primary key, 
    ->   parent_id int unsigned, 
    ->   name varchar(30), 
    ->   foreign key fk_child_to_parent (parent_id) references parent(id) on delete cascade
    -> ) engine = innodb;
Query OK, 0 rows affected (0.04 sec)

mysql> insert into parent values (1,'inserted');
Query OK, 1 row affected (0.01 sec)

mysql> insert into child values (1,1,'inserted');
Query OK, 1 row affected (0.01 sec)

mysql> select * from child;
+----+-----------+----------+
| id | parent_id | name     |
+----+-----------+----------+
|  1 |         1 | inserted |
+----+-----------+----------+
1 row in set (0.00 sec)

Then I update the parent row on the replica to introduce a data inconsistency:

1
2
3
mysql> update parent set name = 'updated' where id = 1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Now I run pt-table-sync on the master to fix the data inconsistency:

1
2
3
4
5
6
7
$ pt-table-sync --execute \
> --sync-to-master \
> --verbose \
> h=replica1.flite.com,P=3306,F=/etc/pt-table-checksum.cnf,D=test,t=parent
# Syncing D=test,F=/etc/pt-table-checksum.cnf,P=3306,h=replica1.flite.com,t=parent
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       1      0      0 Chunk     17:03:26 17:03:26 2    test.parent

But unfortunately the REPLACE command deletes the parent row before re-inserting it, which triggers the cascade delete from th foreign key and deletes the child row from both the master database and the replica:

1
2
mysql> select * from child;
Empty set (0.00 sec)

If I ran pt-table-sync like this (using --no-foreign-key-checks) it would not have deleted the child row:

1
2
3
4
5
6
7
8
$ pt-table-sync --execute \
> --sync-to-master \
> --verbose \
> --no-foreign-key-checks \
> h=replica1.flite.com,P=3306,F=/etc/pt-table-checksum.cnf,D=test,t=parent
# Syncing D=test,F=/etc/pt-table-checksum.cnf,P=3306,h=replica1.flite.com,t=parent
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       1      0      0 Chunk     17:07:50 17:07:50 2    test.parent

Comments