Flite Culture

Some Thoughts on the MySQL REPLACE Command

In yesterday’s post I was fairly prescriptive about how to do an upsert in MySQL. I’m aware that there are other ways to do an upsert in MySQL, but I intentionally emphasized INSERT ... ON DUPLICATE KEY UPDATE over REPLACE because I think it’s almost always the better choice.

Here are my thoughts on REPLACE. I’ll start with the good stuff.

The appeal of REPLACE

REPLACE is a natural complement to INSERT and INSERT IGNORE. A plain INSERT gives you expected SQL INSERT behavior. If you try to insert a duplicate row you get an error. For example:

1
2
3
4
5
6
7
8
9
10
11
mysql> select * from sakila.actor where actor_id = 50;
+----------+------------+-----------+---------------------+
| actor_id | first_name | last_name | last_update         |
+----------+------------+-----------+---------------------+
|       50 | NATALIE    | HOPKINS   | 2006-02-15 04:34:33 |
+----------+------------+-----------+---------------------+
1 row in set (0.00 sec)

mysql> insert into sakila.actor (actor_id, first_name, last_name, last_update) 
    -> values (50,'NATALIE','SMITH','2013-09-30 14:11:36');
ERROR 1062 (23000): Duplicate entry '50' for key 'PRIMARY'

INSERT IGNORE is a “first write wins” implementation. The syntax is exactly the same as INSERT, you just add the IGNORE keyword. If you try to insert a duplicate row the insert is simply ignored, but without an error. For example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
mysql> select * from sakila.actor where actor_id = 50;
+----------+------------+-----------+---------------------+
| actor_id | first_name | last_name | last_update         |
+----------+------------+-----------+---------------------+
|       50 | NATALIE    | HOPKINS   | 2006-02-15 04:34:33 |
+----------+------------+-----------+---------------------+
1 row in set (0.00 sec)

mysql> insert ignore into sakila.actor (actor_id, first_name, last_name, last_update) 
    -> values (50,'NATALIE','SMITH','2013-09-30 14:11:36');
Query OK, 0 rows affected (0.00 sec)

mysql> select * from sakila.actor where actor_id = 50;
+----------+------------+-----------+---------------------+
| actor_id | first_name | last_name | last_update         |
+----------+------------+-----------+---------------------+
|       50 | NATALIE    | HOPKINS   | 2006-02-15 04:34:33 |
+----------+------------+-----------+---------------------+
1 row in set (0.00 sec)

REPLACE is a “last write wins” implementation. The syntax is exactly the same as INSERT, you just use the REPLACE command instead of INSERT. If you try to insert a duplicate row the existing row is replaced by the new row. For example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
mysql> set foreign_key_checks = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from sakila.actor where actor_id = 50;
+----------+------------+-----------+---------------------+
| actor_id | first_name | last_name | last_update         |
+----------+------------+-----------+---------------------+
|       50 | NATALIE    | HOPKINS   | 2006-02-15 04:34:33 |
+----------+------------+-----------+---------------------+
1 row in set (0.00 sec)

mysql> replace into sakila.actor (actor_id, first_name, last_name, last_update) 
    -> values (50,'NATALIE','SMITH','2013-09-30 14:11:36');
Query OK, 2 rows affected (0.00 sec)

mysql> select * from sakila.actor where actor_id = 50;
+----------+------------+-----------+---------------------+
| actor_id | first_name | last_name | last_update         |
+----------+------------+-----------+---------------------+
|       50 | NATALIE    | SMITH     | 2013-09-30 14:11:36 |
+----------+------------+-----------+---------------------+
1 row in set (0.00 sec)

The downside of REPLACE

The interface of REPLACE has its advantages, but its implementation can cause problems. REPLACE is implemented as a DELETE followed by an INSERT. One disadvantage of that implementation is its performance. In every scenario I can imagine in MySQL it is faster to update an existing row than to delete one row and insert another one.

Beyond the performance implications, executing a DELETE followed by an INSERT can cause more problems when foreign key constraints are involved. For example if I try yesterday’s upsert using REPLACE I get an error:

1
2
3
mysql> replace into sakila.actor (actor_id, first_name, last_name, last_update) 
    -> values (50,'NATALIE','SMITH','2013-09-27 12:34:56');
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (sakila.film_actor, CONSTRAINT fk_film_actor_actor FOREIGN KEY (actor_id) REFERENCES actor (actor_id) ON UPDATE CASCADE)

The error occurred because there are child rows in the film_actor table that reference the row in the actor table that I am updating, and the delete is disallowed by the foreign key. As a workaround, I can temporarily disable foreign key checks before executing my REPLACE command:

1
2
3
4
5
6
mysql> set foreign_key_checks = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> replace into sakila.actor (actor_id, first_name, last_name, last_update) 
    -> values (50,'NATALIE','SMITH','2013-09-27 12:34:56');
Query OK, 2 rows affected (0.01 sec)

That workaround is fine for this particular case, in part because the relevant foreign key is defined with ON DELETE RESTRICT, but what about foreign keys defined with ON DELETE CASCADE? For those REPLACE is even more problematic. Here’s a quick schema change to allow me to test it:

1
2
3
4
5
6
7
8
9
mysql> alter table sakila.film_actor 
    ->   drop foreign key fk_film_actor_actor;
Query OK, 5462 rows affected (0.11 sec)
Records: 5462  Duplicates: 0  Warnings: 0

mysql> alter table sakila.film_actor 
    ->   add constraint fk_film_actor_actor foriegn key (actor_id) references actor (actor_id) on delete cascade on update cascade;
Query OK, 5462 rows affected (0.11 sec)
Records: 5462  Duplicates: 0  Warnings: 0

Now I’ll run my REPLACE command again, this time counting the relevant rows in the child table before and after the REPLACE:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
mysql> select count(*) 
    -> from sakila.film_actor
    -> where actor_id = 50;
+----------+
| count(*) |
+----------+
|       32 |
+----------+
1 row in set (0.00 sec)

mysql> replace into sakila.actor (actor_id, first_name, last_name, last_update) 
    -> values (50,'NATALIE','SMITH','2013-09-27 12:34:56');
Query OK, 2 rows affected (0.00 sec)

mysql> select count(*) 
    -> from sakila.film_actor
    -> where actor_id = 50;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

Oops. All I wanted to do was update one row in the actor table, and I ended up inadvertantly deleting 32 related rows from the film_actor table!

I can get into similar trouble when using REPLACE on a table with a DELETE trigger, since the REPLACE will execute an implicit DELETE and thereby fire the relevant trigger(s).

Since REPLACE can be used with any unique index (not just primary keys), it can also cause the auto-increment primary key value to change for a given row. For example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
mysql> create table user_account (
    ->   id int auto_increment primary key,
    ->   username varchar(50) not null,
    ->   unique key username (username)
    -> );
Query OK, 0 rows affected (0.03 sec)

mysql> replace into user_account (username) values ('ike');
Query OK, 1 row affected (0.00 sec)

mysql> select * from user_account;
+----+----------+
| id | username |
+----+----------+
|  1 | ike      |
+----+----------+
1 row in set (0.00 sec)

mysql> replace into user_account (username) values ('ike');
Query OK, 2 rows affected (0.00 sec)

mysql> select * from user_account;
+----+----------+
| id | username |
+----+----------+
|  2 | ike      |
+----+----------+
1 row in set (0.00 sec)

In summary, given its implementation I think it is usually not a good idea to use the MySQL REPLACE syntax. The one exception for me is when loading multiple rows from a flat file into a MySQL OLAP database using LOAD DATA INFILE. In that case I am bulk-loading data into MySQL from an external ETL process and I truly want a “last write wins” command, so I use LOAD DATA INFILE ... REPLACE. This use case works because none of the downsides I mentioned above are relevant. There are no foreign keys referencing my target table, no triggers, and I am not using auto-increment columns.

Comments

Comments