Flite Culture

Executing Complex Deletes With Common_schema

Recently I started a project that required deleting millions of rows of data from various MySQL tables. Many of the queries which identifed the rows to delete required several joins. I decided to use common_schema’s split() function to break the deletes into chunks, but when I started running the queries in common_schema some of them failed to delete any data. I assume the failures were related to the complexity of the multi-table delete statements.

Thiking of this as a two step problem a natural workaround emerged. The two steps are:

  1. Identify the rows to delete
  2. Delete the rows

I want to use split() for step 2, but I don’t really need it for step 1.

I decided to create a new table to store the ids which define the rows to delete, then populate that table, and use a simpler multi-table delete statement to execute the deletes using split(). So far this workaround has allowed me to execute several separate delete statements, each deleting millions of rows in chunks.

Here’s an example. Say I want to execute a delete like this:

1
2
3
4
5
6
7
delete ad_tag.*
from ad_tag
  inner join ad on ad.id = ad_tag.ad_id
  inner join user_account on user_account.id = ad.user_id
  inner join organization on organization.id = ad.org_id
where user_account.service_level = 0
  or organization.service_level = 0;

I can delete the same rows using the two step approach discussed above as follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
create table if not exists temp_rows_to_delete (
  ad_id int not null,
  primary key (ad_id)
) Engine=InnoDB;

insert ignore into temp_rows_to_delete (ad_id)
select ad.id
from ad
  inner join user_account on user_account.id = ad.user_id
  inner join organization on organization.id = ad.org_id
where user_account.service_level = 0
  or organization.service_level = 0;

set @script := "
  split(temp_rows_to_delete: delete ad_tag.*
from temp_rows_to_delete
inner join ad_tag on ad_tag.ad_id = temp_rows_to_delete.ad_id
  )
  SELECT $split_total_rowcount AS 'rows deleted so far';
";

call common_schema.run(@script);

drop table if exists temp_rows_to_delete;

Comments