Flite Careers

Updating Millions of Rows in MySQL With Common_schema

Last week I had to update several million rows in a single MySQL table. Rather that executing the update directly in the database, I decided to use common_schema’s split() function.

There are two reasons I chose to use common_schema for this task:

Chunking

Chunking breaks a single update statement into multiple statements, each updating at most a certain specified number of rows. The default chunk size is 1000, and I changed it to 2000 by setting the size parameter.

Throttling

Throttling has two benefits: it minimizes the amount of load on the master host, and it minimizes the replication lag introduced when the update statements are executed on the replicas. It achieves this by introducing sleep statements after each chunk is updated. The duration of the sleep statement is proportional to the runtime of the update statement. After experimenting with several different throttle values in a test environment, I decided on a value of 4, which means it will sleep for 4 times as long as each update takes. Thus if an update takes 100 milliseconds, it will sleep for 400 milliseconds before executing the next update. Using throttle does not guarantee that you will completely avoid replication lag, but if you do some testing you should be able to find an appropriate value that will strike a balance between the total run time of your updates and the level of replication lag that it introduces.

If you want to learn more about throttling in common_schema you can read the documentation.

Here’s the statement I executed in common_schema. I modified the SQL a bit for this post in order to make it more readable. The SELECT I do on the line below the throttle executes after each chunk and outputs a running total of how many rows have been updated:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
set @script := "
  split({size:2000} :
update ad_parameter
set scope = -1
where
(
  name in (
  'parameter_name_a',
  'parameter_name_b',
  'parameter_name_c'
  )
  or name like 'parameter\_prefix\_x\_%'
  or name like 'parameter\_prefix\_y\_%'
  or name like 'parameter\_prefix\_z\_%'
)
and scope != -1
  )
{
  throttle 4;
  SELECT $split_total_rowcount AS 'rows updated so far';
}
";

call common_schema.run(@script);

Comments