Flite Careers

Finding and Removing Duplicate Indexes in MySQL With Common_schema

Recently I checked a MySQL database for duplicate indexes using common_schema, and I want to share my approach. Before I get into the details, I want to define what I mean by duplicate indexes. In the past I’ve used the terms duplicate index and redundant index somewhat interchangeably, so now that I have my own definitions straight I want to write them down:

  • A duplicate index is an index of the same type (BTREE, HASH, FULLTEXT, etc) that includes the same column(s) in the same order as another index. For example KEY name2 (name) is a duplicate of KEY name1 (name), and KEY (name) is a duplicate of UNIQUE KEY (name).
  • A redundant index is an index of the same type that includes a prefix of another index. For example if a table has both KEY (name) and KEY (name,age) then the former is redundant.

There are some legitimate use cases for redundant indexes, so although most redundant indexes should be dropped I would not recommend going in and dropping them without some thoughtful consideration. In my case I just wanted to find and remove duplicate indexes, so I’ll ignore redundant indexes for now.

Finding duplicate indexes

common_schema has a view called redundant_keys which lists both duplicate and redundant indexes. It only includes BTREE indexes, so if you are looking for duplicate indexes of other types such as FULLTEXT you may want to also look at pt-duplicate-key-checker. By default common_schema.redundant_keys will list all duplicate and redundant indexes across all schemas. Here’s the simplest way to use it:

1
2
select *
from common_schema.redundant_keys\G

In my case I am only looking for duplicate indexes in a schema named “flite”. Since common_schema.redundant_keys is implemented as a MySQL view I can adjust the criteria using the WHERE clause. Here’s the query I ran to find the duplicate indexes:

1
2
3
4
5
6
select table_schema,table_name,redundant_index_name,redundant_index_columns,
  dominant_index_name
from common_schema.redundant_keys 
where table_schema = 'flite' 
  and redundant_index_columns = dominant_index_columns 
order by table_schema,table_name,redundant_index_name,dominant_index_name;

Removing duplicate indexes

It’s usually straighforward to build DDL statements to remove the duplicate indexes, but as a convenience common_schema.redundant_keys includes the relevant DDL statement in a column named sql_drop_index. When I select that column I use DISTINCT to avoid getting multiple occurences of the same DDL, since an index can be a duplicate of multiple other indexes. Here’s my query:

1
2
3
4
5
select distinct concat(sql_drop_index,';') as ddl
from common_schema.redundant_keys 
where table_schema = 'flite' 
and redundant_index_columns = dominant_index_columns 
order by table_schema,table_name,redundant_index_name;

I recommend reviewing the full list of duplicate indexes and the relevant table definitions before executing the DDL to make sure you’re not dropping anything you need. Another step I take is to check my application code for any index hints that reference a duplicate index by name. For example if key1 and key2 are duplicate indexes, and both of them are referenced by index hints in code, then I should standardize the index hints to reference the index I am keeping (key1) before I drop the duplicate index (key2).

Final thoughts

I started this project using common_schema, but when I discovered that it does not check FULLTEXT indexes I decided to try and accomplish the same thing with pt-duplicate-key-checker. I created a test table with duplicate FULLTEXT indexes and pt-duplicate-key-checker successfully identified the duplicate index, but I still ended up using common_schema because I could limit my queries to duplicate indexes and ignore redundant indexes. If there is a way to ignore redundant indexes with pt-duplicate-key-checker please let me know in the comments, because I couldn’t find it in the documentation.


Comments