Flite Careers

Adding a Unique Constraint With More Than 16 Columns in MySQL

A while ago I blogged about a potential workaround for MySQL error 1070 that could be used to add a unique constraint with more than 16 columns.

As a reminder here’s the error you get when you try to create a unique constraint with more than 16 columns in MySQL:

1
ERROR 1070 (42000): Too many key parts specified; max 16 parts allowed

The solution I proposed should work, but when I started to implement it I made a couple of changes that merit this follow-up post. One change is fairly generic, and the other is specific to my use case.

The generic change was to only concatenate some of the columns, rather than all of them. Specifically I only need to concatenate N-15 columns. So if I want to have a unique constraint on 20 columns, I can include 15 of those actual columns and then concatenate the remaining 5 columns into a new column that is included as the 16th column in the unique constraint. When choosing which columns to concatenate, it makes sense to choose columns with short values, since that limits the amount of redundant data you are storing and helps prevent the table from taking up too much disk space.

As in my previous post, here’s a table with 20 columns:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
CREATE TABLE IF NOT EXISTS lots_of_columns (
  c1 int not null,
  c2 int not null,
  c3 int not null,
  c4 int not null,
  c5 int not null,
  c6 int not null,
  c7 int not null,
  c8 int not null,
  c9 int not null,
  c10 int not null,
  c11 char(8) not null,
  c12 char(8) not null,
  c13 char(8) not null,
  c14 char(8) not null,
  c15 char(8) not null,
  c16 char(8) not null,
  c17 char(8) not null,
  c18 char(8) not null,
  c19 char(8) not null,
  c20 char(8) not null
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

I can include 15 of those columns in my unique constraint, and concatenate the other 5 together to make the 16th column in the index. Looking at the schema, it’s not immediately obvious which columns are the smallest. The INT columns can have values ranging from 1 to 11 characters long, while the CHAR columns can have values ranging from 1 to 8 characters long. Since the range of lengths is fairly small I could choose the 5 columns to concatenate arbitrarily, or if I have a representative data set I could run some queries using AVG(CHAR_LENGTH())) to empircally determine which columns are shortest. For the sake of simplicity in this example I’ll use columns c16 - c20.

Now I add a new column to store those 5 columns concatenated together. The column needs to be long enough to store the concatenated values. Since each CHAR column value can be up to 8 characters long, and the hyphens will account for 4 characters, the new column needs to store up to (5 * 8) + 4 = 44 characters.

Here’s the DDL to add the new column and the unique constraint:

1
2
3
4
alter table lots_of_columns 
  add column c16_thru_c20_concatenated varchar(44) not null default '',
  add unique index unique_constraint 
    (c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16_thru_c20_concatenated);

If the table has data in it I can populate the new concatenated column for all existing rows with an update statement:

1
2
3
4
5
6
7
update lots_of_columns
set c16_thru_c20_concatenated = concat(COALESCE(c16,''),'-',
  COALESCE(c17,''),'-',
  COALESCE(c18,''),'-',
  COALESCE(c19,''),'-',
  COALESCE(c20,''))
where c16_thru_c20_concatenated = '';

The update will fail if any existing data violates the new unique constraint.

Now that the existing data is updated, I need to make sure the concatenated column is populated properly going forward. In my previous post I used triggers to populate the concatenated column. In some ways that is the safest implementation, because it practically guarantees that the concatenated column will always be set regardless of how the table is written, but it adds some overhead to inserts and updates. In my specific use case I am writing tens of thousands of rows at a time to the table using LOAD DATA INFILE. Once written the rows are never updated, and there are no other code paths writing to the table, so for me it makes more sense to set the values in the LOAD DATA INFILE statement rather than using triggers.

Here’s an example to load the contents of a tab-delimited text file into my database table:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
load data infile '/tmp/lots_of_columns.txt' 
ignore into table lots_of_columns 
(c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,@c16,@c17,@c18,@c19,@c20)
set c16=@c16,
  c17=@c17,
  c18=@c18,
  c19=@c19,
  c20=@c20,
  c16_thru_c20_concatenated=concat(COALESCE(@c16,''),'-',
    COALESCE(@c17,''),'-',
    COALESCE(@c18,''),'-',
    COALESCE(@c19,''),'-',
    COALESCE(@c20,'')
  ) 

An alternative implementation could be to use virtual columns, as pointed out by some of the commenters on my previous post. I have not tried that approach yet – mostly because I don’t use MariaDB or MySQL 5.7 – but I may set up a sandbox instance to try that out soon.

Comments

Comments