Flite Culture

Helping Common_schema Help Me

I’m a big fan of common_schema. It’s a really powerful and flexible tool, and I’m always looking for new ways to use it.

Last week I had to update millions of rows across many databases to tokenize some persisted URL values, and I remembered reading Baron Schwartz’s recent blog post about using the common_schema split feature. Baron’s use case was deleting data, but I figured this could work well to break my large updates into chunks too. I had already written the update statements I wanted to execute, and after five minutes reading the common_schema documentation I was ready to try it out on a dev database.

One of the queries I tried didn’t work in common_schema, but with a creative workaround I was able to trick common_schema into accepting it. Read on for the details.

The first query I wanted to run involved two tables, so I needed to specify which table to split on. In this case I’m updating the parameter table, so I specify that as my split table first, then specify the full update statement, and after that I select $split_total_rowcount so I can see the progress of the update after each chunk is done.

Here’s the code:

1
2
3
4
5
6
7
8
9
10
set @script := "
  split(flite.parameter: update flite.registration
    inner join flite.parameter on parameter.registration_id = registration.id
    set parameter.value = replace(parameter.value,'http://flite.com','${urls.siteUrl}')
    where registration.type in (1,9)
    and parameter.name = 'asset_url'
  ) 
  SELECT $split_total_rowcount AS 'rows updated so far';
";
call common_schema.run(@script);

It worked on dev and QA so I tried it out on my staging db and it worked perfectly, updating over a million rows in about an hour. Here’s the end of the output:

1
2
3
4
5
6
7
8
9
...
+---------------------+
| rows updated so far |
+---------------------+
|             1195047 |
+---------------------+
1 row in set (1 hour 3 min 40.12 sec)

Query OK, 0 rows affected (1 hour 3 min 40.16 sec)

The second query was even simpler, just a single table update. But when I ran this one with common_schema no rows were updated:

1
2
3
4
5
6
7
8
9
10
11
mysql> set @script := "
    ">   split(update flite.registration
    "> set url = replace(url,'http://flite.com','${urls.siteUrl}')
    "> where type in (1,9)
    "> )     
    ">   SELECT $split_total_rowcount AS 'rows updated so far';
    "> ";
Query OK, 0 rows affected (0.00 sec)

mysql> call common_schema.run(@script);
Query OK, 0 rows affected (1.23 sec)

I guess common_schema failed trying to split the table, so I tried a no op call to verify it’s splitting approach, and that returned nothing, which validated my hypothesis:

1
2
3
4
5
6
7
8
9
mysql> call common_schema.run("
    ">   split(flite.registration) { 
    ">     select 
    ">       $split_step as step, $split_columns as columns, 
    ">       $split_min as min_value, $split_max as max_value, 
    ">       $split_range_start as range_start, $split_range_end as range_end
    ">   }
    "> ");
Query OK, 0 rows affected (0.33 sec)

The registration table has over 50 columns, and in addition to having an auto-increment primary key it also has other unique indexes, so of all my tables I’m not too surprised this one had a problem.

So now what do I do? I could submit a bug report for common_schema, or open up the code and try to fix it myself (and I’ll eventually do at least one of those two things), but I really wanted to make this work, so I decided to trick common_schema into doing it anyways. Since common_schema supports multi-table updates, I just rewrote my single table update as a gratuitous multi-table update so I could split on the second table.

First I tried joining to a parent table:

1
2
3
4
5
6
7
8
set @script := "
  split(flite.instance: update flite.registration
inner join flite.organization on organization.id = registration.organization_id
set registration.url = replace(registration.url,'http://flite.com','${urls.siteUrl}')
)
  SELECT $split_total_rowcount AS 'rows updated so far';
";
call common_schema.run(@script);

But when I ran that it didn’t return anything for a few minutes, which was not a good sign. Rather than troubleshoot that query, I decided to try another join table.

Next I tried joining to a child table, using a where clause that I knew would give me a 1-1 relationship with the table I was updating:

1
2
3
4
5
6
7
8
9
10
set @script := "
  split(flite.instance: update flite.registration
inner join flite.instance on instance.registration_id = registration.id
set registration.url = replace(registration.url,'http://flite.com','${urls.siteUrl}')
where registration.type in (1,9)
and instance.is_preview = true
)
  SELECT $split_total_rowcount AS 'rows updated so far';
";
call common_schema.run(@script);

This time it worked! Here’s the end of the output:

1
2
3
4
5
6
7
8
9
...
+---------------------+
| rows updated so far |
+---------------------+
|             1195725 |
+---------------------+
1 row in set (42 min 33.48 sec)

Query OK, 0 rows affected (42 min 33.52 sec)

Now I can go back and look at the code and/or submit a bug report, but I was pleased that with a little help common_schema solved my use case. This was a reminder to me of the power of a good tool, and the value of being creative in the way you use that tool.


Comments