Flite Culture

Preparing Your MySQL Schema for a Character Set Change

Recently I blogged about identifying necessary index changes when changing the character set for a MySQL database. In this follow-up post I want to talk about general schema changes that you should consider when changing your character set from latin1 to utf8, from utf8 to utf8mb4, etc. The main issue is with long VARCHAR columns and with columns whose length limits are enforced in bytes rather than characters.

The manual points out the main areas where you may have trouble:

  • A TINYTEXT column can hold up to 255 bytes, so it can hold up to 85 3-byte or 63 4-byte characters. Suppose that you have a TINYTEXT column that uses utf8 but must be able to contain more than 63 characters. You cannot convert it to utf8mb4 unless you also change the data type to a longer type such as TEXT.

  • Similarly, a very long VARCHAR column may need to be changed to one of the longer TEXT types if you want to convert it from utf8 to utf8mb4.

The point on TINYTEXT is pretty clear. Frankly I’m not sure why anyone uses TINYTEXT to begin with, but if you have TINYTEXT columns in your database and you are changing your character set, this might be a great opportunity to change those columns to a TEXT type that supports longer values, such as MEDIUMTEXT, TEXT, or LONGTEXT.

Long VARCHAR columns can be a problem because MySQL has a row length limit of 65,535 bytes, and one really long VARCHAR or a few relatively long VARCHARs in a multi-byte character set can push your rows past that limit. For example you can have a VARCHAR(20000) in utf8 because it takes up a maximum of 60,000 bytes, but you can’t have a VARCHAR(20000) in utf8mb4 because that requires up to 80,000 bytes of storage.
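
Here's a quick sketch of that limit in action (hypothetical table names):

    -- Works: 20,000 characters * 3 bytes = 60,000 bytes, under the 65,535-byte row limit
    CREATE TABLE t_utf8 (v VARCHAR(20000)) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    -- Fails: 20,000 characters * 4 bytes = 80,000 bytes, over the row limit
    CREATE TABLE t_utf8mb4 (v VARCHAR(20000)) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

    -- One fix: switch the column to a longer TEXT type first
    CREATE TABLE t_fixed (v MEDIUMTEXT) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;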

Here’s my simple recommendation for preparing a MySQL schema for a character set change:

  1. Change all TINYTEXT columns to TEXT, MEDIUMTEXT, or LONGTEXT
  2. Change all long VARCHAR columns to TEXT, MEDIUMTEXT, or LONGTEXT

Read on for sample queries that generate the necessary DDL to modify the relevant columns, and examples that illustrate the problem.
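
As a taste of the first recommendation, here's a hedged sketch of such a query; it generates ALTER statements for every TINYTEXT column (adjust the target type and column attributes to suit):

    -- Generate DDL to change every TINYTEXT column to TEXT
    SELECT CONCAT('ALTER TABLE `', table_schema, '`.`', table_name,
                  '` MODIFY `', column_name, '` TEXT',
                  IF(is_nullable = 'NO', ' NOT NULL', ''), ';') AS ddl
    FROM information_schema.columns
    WHERE data_type = 'tinytext'
      AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema');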


Prepping Your MySQL Indexes for a Character Set Change

When changing a MySQL table or column to a new character set that uses more bytes than the old character set, you need to first check if any schema changes are needed to accommodate the change. For example, changing character sets from latin1 to utf8 is an increase from 1 to 3 bytes, and changing from utf8 to utf8mb4 is an increase from 3 to 4 bytes. The MySQL reference manual has a helpful page with some details on this, but I want to add some examples to show how this schema prep can be accomplished.

There are three different types of length limits to take into consideration:

  • Index
  • Column
  • Row

In this post I will focus on index length limits, and I’ll save columns and rows for future posts. Read on for details.
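
As a quick preview of the index case, here's a hedged sketch (hypothetical tables; assumes InnoDB's traditional 767-byte index prefix limit, with innodb_large_prefix disabled):

    -- utf8: 255 characters * 3 bytes = 765 bytes, just under the 767-byte limit
    CREATE TABLE ix_utf8 (v VARCHAR(255), KEY (v)) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    -- utf8mb4: 255 characters * 4 bytes = 1,020 bytes, over the limit, so the
    -- index needs a prefix such as KEY (v(191)) -- 191 * 4 = 764 bytes
    CREATE TABLE ix_utf8mb4 (v VARCHAR(255), KEY (v(191))) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;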


Testing the MySQL JSON Labs Release Using MySQL Sandbox

The MySQL 5.7.7 release candidate has been available for several months, but it doesn’t include the new JSON data type or built-in JSON functions. Those are currently only available in the MySQL JSON Labs Release. Unlike the regular 5.7.7 release, the MySQL JSON Labs Release is only available in two download formats, and neither of them is a pre-built binary for my Mac.

In order to try out the new JSON data type and functions on my Mac laptop, I need to build it from source.

Read on to see how I did that with the help of MySQL Sandbox.


Using MySQL Sandbox for Upgrade Testing

Recently I tested several different MySQL upgrade scenarios for the blog posts I wrote about the temporal format upgrade. This required me to create a bunch of tables with datetime/timestamp/time columns in MySQL 5.5 and then physically copy those tables to a MySQL 5.6 or 5.7 database. I created 4 copies of each table to test out InnoDB versus MyISAM and partitioned tables versus non-partitioned tables. MySQL Sandbox made the process easy.
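
The test tables looked something like this (a hedged sketch with hypothetical names; I created InnoDB and MyISAM variants, each with and without partitioning):

    CREATE TABLE temporal_test (
      id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
      dt DATETIME,
      ts TIMESTAMP NULL,
      tm TIME
    ) ENGINE=InnoDB;

    -- Partitioned variant of the same shape
    CREATE TABLE temporal_test_part (
      id INT NOT NULL,
      dt DATETIME,
      ts TIMESTAMP NULL,
      tm TIME,
      PRIMARY KEY (id)
    ) ENGINE=InnoDB
    PARTITION BY HASH (id) PARTITIONS 4;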

These are the MySQL versions I wanted to test:

  • 5.5.34
  • 5.6.15
  • 5.6.22
  • 5.6.25
  • 5.7.7

I was able to download the latest 5.6 and 5.7 releases from the main MySQL downloads site, and the older releases from the MySQL download archives.

Read on for an overview of the MySQL Sandbox commands I used.


The Downside of the MySQL 5.6 Implicit Temporal Column Format Upgrade

I’ve written two separate posts about the MySQL 5.6 temporal column format change, but I haven’t yet addressed the problem with the way this upgrade was implemented. Essentially, the problem is that the implicit format upgrade causes a write-blocking full table rebuild the first time you run ALTER TABLE after upgrading, even if the ALTER TABLE command is an online DDL operation that normally doesn’t block writes.

This behavior is well-intentioned but harmful. The intent is for the server to tack a seemingly innocuous column format upgrade onto my existing ALTER TABLE command, but when that changes my DDL from non-blocking to blocking, it undermines online DDL and becomes a real problem.
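
One workaround, sketched below under the assumption that you are on MySQL 5.6.24 or later (hypothetical table and column names): the avoid_temporal_upgrade system variable tells the server to skip the implicit format upgrade, so online DDL stays non-blocking.

    -- avoid_temporal_upgrade is available as of MySQL 5.6.24
    SET GLOBAL avoid_temporal_upgrade = ON;

    -- With the implicit upgrade disabled, this online DDL can run in-place
    -- instead of being silently converted into a blocking table rebuild
    ALTER TABLE my_table ADD INDEX idx_created_at (created_at),
      ALGORITHM=INPLACE, LOCK=NONE;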

Read on for detailed examples and workarounds.


Identifying Temporal Columns in Old Format in MySQL 5.6 and Above

I’ve gotten a lot of positive feedback from my post on upgrading temporal columns from MySQL 5.5 to MySQL 5.6 format, both in the blog comments and directly from people who have used my query to identify which columns are candidates to be upgraded to the new format. Based on the feedback and some changes in more recent MySQL releases, I think it’s worth doing a follow-up post.

Partitioned tables

It was pointed out to me that the SQL query I used in my previous post does not work for partitioned tables. The reason is the join between information_schema.tables and information_schema.innodb_sys_tables in my query. For non-partitioned tables the value in information_schema.innodb_sys_tables.name will be <table_schema>/<table_name>, but for partitioned tables there will be one row in information_schema.innodb_sys_tables for each partition, with the naming pattern of <table_schema>/<table_name>#P#<partition_name>. Thus I need to update my query to support both naming patterns.
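
The updated join needs to support both patterns; here's a sketch of just the join logic (not the full query):

    SELECT t.table_schema, t.table_name
    FROM information_schema.tables t
    JOIN information_schema.innodb_sys_tables ist
      ON ist.name = CONCAT(t.table_schema, '/', t.table_name)
      OR ist.name LIKE CONCAT(t.table_schema, '/', t.table_name, '#P#%');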

When I started testing the MySQL 5.5 to 5.6 upgrade on partitioned tables I was curious whether a table could have some partitions using the old format and some using the new format, and whether I could convert the columns one partition at a time. Based on my testing, a given table will always use the same temporal column format for all partitions, and in order to upgrade the format of those columns you need to rebuild the entire table. If I upgrade a partitioned table with datetime/timestamp/time columns from 5.5 to 5.6, those columns will still be in the old format for all partitions. If I add new partitions to that table, the datetime/timestamp/time columns in the new partitions will also be in the old format. Partition-level maintenance operations such as REBUILD PARTITION, COALESCE PARTITION, and REORGANIZE PARTITION do not upgrade the temporal column format, so I still need to use a table-level operation such as ALTER TABLE ... FORCE or OPTIMIZE PARTITION (which is a misnomer, because it actually rebuilds the entire table).
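
For example, either of these statements rebuilds the whole table, and with it the temporal columns in every partition (hypothetical table name):

    ALTER TABLE part_table FORCE;

    -- Despite naming a single partition, this also rebuilds the entire table
    ALTER TABLE part_table OPTIMIZE PARTITION p0;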


Hash-based Workarounds for MySQL Unique Constraint Limitations

In the past I have written about MySQL’s limitations that prevent you from creating an index over a certain length, or with more than a certain number of columns. I’ve offered some solutions for those problems, but many of my solutions hit another limit further down the line. For example innodb_large_prefix allows you to create an index on larger columns, but only up to 3072 bytes, and my workarounds for the 16 column limit per index will also hit that 3072 byte limit once you add lots of columns or add some very long columns.

Today I’m going to suggest a hash-based solution to bypass that 3072 byte limit to solve a specific subset of index use cases. The basic idea is to build a string by concatenating all of the columns we want to include in the index, and create a unique index on the hash of that string, rather than the string itself. The hash will be a fixed length regardless of the input columns, so we don’t need to worry about the 3072 byte limit or the 16 column limit.

For example, instead of building a unique index directly on a set of long columns, like this (a hedged sketch with hypothetical names, since the combined key length in a multi-byte character set easily exceeds 3072 bytes):
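
    CREATE TABLE product (
      id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      name VARCHAR(500) NOT NULL,
      manufacturer VARCHAR(500) NOT NULL,
      description VARCHAR(1000) NOT NULL,
      -- in utf8 this key needs up to 6,000 bytes, well over the 3,072-byte limit
      UNIQUE KEY (name, manufacturer, description)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;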


I can build a unique index on a hash of the concatenated column values, along these lines:
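
    CREATE TABLE product (
      id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      name VARCHAR(500) NOT NULL,
      manufacturer VARCHAR(500) NOT NULL,
      description VARCHAR(1000) NOT NULL,
      -- SHA2-256 as hex is always 64 characters, regardless of the inputs
      all_columns_hash CHAR(64) NOT NULL,
      UNIQUE KEY (all_columns_hash)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;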


An index on a hash value is not useful for most queries, since it can’t speed up range scans or queries using an index prefix, but it will work for unique constraints, and you can also use it for single row lookups if you apply the same hash function to the values you are searching for.
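
For example, continuing the sketch above, writes compute the hash and lookups reuse the same expression:

    -- Keep the hash in sync on write (in application code or a trigger)
    INSERT INTO product (name, manufacturer, description, all_columns_hash)
    VALUES ('Widget', 'Acme', 'A fine widget',
            SHA2(CONCAT_WS('|', 'Widget', 'Acme', 'A fine widget'), 256));

    -- Single row lookup using the same hash expression
    SELECT * FROM product
    WHERE all_columns_hash = SHA2(CONCAT_WS('|', 'Widget', 'Acme', 'A fine widget'), 256);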

Hat tip to Justin Swanhart for giving me the idea to use hashes for this.

Read on for details and code examples.


Groupwise Most Frequent Value Query in Hive

I am used to writing SQL queries and making small changes to accommodate Hive’s limitations, but sometimes I encounter a query that takes a little more work to implement in Hive. Recently I needed to write a query to calculate the most frequently occurring value by key. MySQL and Hive both lack a MODE() function, but there are several ways to accomplish this in MySQL. I usually execute two identical sub-queries to group by key and value, and use an outer join to exclude all of the key/value pairs that are not the most frequently occurring.
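
In MySQL that looks roughly like this (a sketch with a hypothetical table t holding key column k and value column v; ties return multiple rows per key):

    SELECT c1.k, c1.v AS most_frequent_value
    FROM (SELECT k, v, COUNT(*) AS cnt FROM t GROUP BY k, v) c1
    LEFT JOIN (SELECT k, v, COUNT(*) AS cnt FROM t GROUP BY k, v) c2
      ON c1.k = c2.k AND c2.cnt > c1.cnt
    -- keep only the rows for which no other value of the same key has a higher count
    WHERE c2.k IS NULL;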

Read on for details on the MySQL query and how I needed to modify it to make it work in Hive.


Adding a Unique Constraint With More Than 16 Columns in MariaDB

When I started writing this post I planned to follow up on my series of posts by creating a unique constraint with more than 16 columns using a MariaDB virtual column the same way I used a MySQL generated column in my most recent post. During my testing I abandoned that plan when I discovered two things:

  1. MariaDB virtual columns impose a 252 character limit on the expression that defines the column. This works for concatenating lots of columns with very short names like I did in my last post, but in the real world it’s easy to find an example where column names are long enough that a concatenate expression involving more than 16 columns is longer than 252 characters.
  2. MariaDB doesn’t have the same 16 column limit on indexes; instead it imposes a limit of 32 columns. Thus I can add a unique constraint on 17-32 columns in MariaDB without having to do anything special.

So I can’t use MariaDB virtual columns as a workaround to add a unique constraint on more than 16 columns, but it probably doesn’t matter because I actually don’t need that workaround as long as my unique constraint includes no more than 32 columns.
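
For example, this is valid in MariaDB even though the unique key has 17 columns (a minimal sketch with hypothetical names):

    -- 17 columns in one unique key: over MySQL's 16-column limit,
    -- but within MariaDB's 32-column limit
    CREATE TABLE many_cols (
      c1 INT, c2 INT, c3 INT, c4 INT, c5 INT, c6 INT,
      c7 INT, c8 INT, c9 INT, c10 INT, c11 INT, c12 INT,
      c13 INT, c14 INT, c15 INT, c16 INT, c17 INT,
      UNIQUE KEY u_all (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10,
                        c11, c12, c13, c14, c15, c16, c17)
    ) ENGINE=InnoDB;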

Read on for code examples.


Using MySQL 5.7 Generated Columns to Avoid ERROR 1070

When I first blogged about a potential workaround for MySQL error 1070, I received a suggestion in the comments that my workaround might be a useful trick with MariaDB virtual columns or MySQL 5.7 generated columns. Although I’ve already blogged about a simplified version of my original workaround, the arrival of the MySQL 5.7.7 release candidate inspired me to try an even simpler implementation using generated columns.
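
The shape of the idea, sketched here with hypothetical column names (not necessarily the exact code from the post), is a stored generated column that hashes the concatenation of all the columns, plus a unique index on that single column:

    CREATE TABLE many_cols (
      c1 INT, c2 INT, c3 INT, c4 INT, c5 INT, c6 INT, c7 INT, c8 INT, c9 INT,
      c10 INT, c11 INT, c12 INT, c13 INT, c14 INT, c15 INT, c16 INT, c17 INT,
      -- hash of all 17 columns, so the unique key needs only one key part
      -- (note: CONCAT_WS skips NULLs, so nullable columns need extra care)
      all_cols_hash CHAR(64) AS
        (SHA2(CONCAT_WS('|', c1, c2, c3, c4, c5, c6, c7, c8, c9,
                        c10, c11, c12, c13, c14, c15, c16, c17), 256)) STORED,
      UNIQUE KEY (all_cols_hash)
    ) ENGINE=InnoDB;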

Read on for my findings.
