Flite Culture

How to Do an Upsert in MySQL

The upsert (or merge) is a useful statement in SQL, but the syntax in MySQL is not self evident. If you search the MySQL manual for “upsert” you’ll probably get this page. If you search for “merge” you’ll probably get the page on index merge optimization or the merge storage engine. None of those tell you how to do an upsert. Here’s how I do it…

Say I want to execute an insert like this:

1
2
insert into sakila.actor (actor_id, first_name, last_name, last_update) 
values (50,'NATALIE','SMITH','2013-09-27 12:34:56');

But I don’t know if a row with actor_id=50 already exists or not. If the row doesn’t exist I want to insert it. If the row does exist I want to update it. I can do that using the INSERT ... on duplicate key update syntax, like so:

1
2
3
4
5
6
insert into sakila.actor (actor_id, first_name, last_name, last_update) 
values (50,'NATALIE','SMITH','2013-09-27 12:34:56')
on duplicate key update
  first_name = values(first_name),
  last_name = values(last_name),
  last_update = values(last_update);

Here’s how that statement modifies the existing row:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
mysql> select * from sakila.actor where actor_id = 50;
+----------+------------+-----------+---------------------+
| actor_id | first_name | last_name | last_update         |
+----------+------------+-----------+---------------------+
|       50 | NATALIE    | HOPKINS   | 2006-02-15 04:34:33 |
+----------+------------+-----------+---------------------+
1 row in set (0.01 sec)

mysql> insert into sakila.actor (actor_id, first_name, last_name, last_update) 
    -> values (50,'NATALIE','SMITH','2013-09-27 12:34:56')
    -> on duplicate key update
    ->   first_name = values(first_name),
    ->   last_name = values(last_name),
    ->   last_update = values(last_update);
Query OK, 2 rows affected (0.00 sec)

mysql> select * from sakila.actor where actor_id = 50;
+----------+------------+-----------+---------------------+
| actor_id | first_name | last_name | last_update         |
+----------+------------+-----------+---------------------+
|       50 | NATALIE    | SMITH     | 2013-09-27 12:34:56 |
+----------+------------+-----------+---------------------+
1 row in set (0.00 sec)

The update worked. I changed Natalie’s last name from ‘HOPKINS’ to ‘SMITH’, and updated the last_update timestamp, but why does it say “2 rows affected”? In truth only 1 row was affected, but the “2 rows affected” is a convention so you can tell if it was an insert or an update. As stated in the manual:

With on duplicate key update, the affected-rows value per row is 1 if the row is inserted as a new row, 2 if an existing row is updated, and 0 if an existing row is set to its current values.

There are a couple of things to note about the syntax itself. First, you may have never seen the values() function before. That function allows us to access the new value from the insert statement, and differentiate it from the existing row value for a given column.

Second, the syntax is somewhat tedious. I had to explicitly name and set all of the non-primary key columns not only in the INSERT clause, but again in the UPDATE clause. The silver lining of that tedious syntax is that it’s flexible enough to allow me to set the values however I want. If the row already exists I don’t need to update it with the values from my INSERT clause. Instead I can use a constant, or some calculated value, or even a combination of the old value and the new value. Here’s an example:

1
2
3
4
5
6
insert into sakila.actor (actor_id, first_name, last_name, last_update) 
values (50,'NATALIE','SMITH','2013-09-27 12:34:56')
on duplicate key update
  first_name = values(first_name),
  last_name = concat(values(last_name),' née ', last_name),
  last_update = now();

Here’s how that statement modifies the existing row:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
mysql> select * from sakila.actor where actor_id = 50;
+----------+------------+-----------+---------------------+
| actor_id | first_name | last_name | last_update         |
+----------+------------+-----------+---------------------+
|       50 | NATALIE    | HOPKINS   | 2006-02-15 09:34:33 |
+----------+------------+-----------+---------------------+
1 row in set (0.00 sec)

mysql> insert into sakila.actor (actor_id, first_name, last_name, last_update) 
    -> values (50,'NATALIE','SMITH','2013-09-27 12:34:56')
    -> on duplicate key update
    ->   first_name = values(first_name),
    ->   last_name = concat(values(last_name),' née ', last_name),
    ->   last_update = now();
Query OK, 2 rows affected (0.00 sec)

mysql> select * from sakila.actor where actor_id = 50;
+----------+------------+--------------------+---------------------+
| actor_id | first_name | last_name          | last_update         |
+----------+------------+--------------------+---------------------+
|       50 | NATALIE    | SMITH née HOPKINS  | 2013-09-30 12:29:22 |
+----------+------------+--------------------+---------------------+
1 row in set (0.00 sec)

Just like the previous example I updated Natalie’s last name, but this time I combined the new last name with the old last name, and used a more accurate value for last_update.

So that’s how you do an upsert in MySQL. Some people use REPLACE to do an upsert in MySQL, but I think that’s almost always a bad idea. I’ll post about that in more detail tomorrow.

Comments

Comments