Recently I blogged about identifying necessary index changes when changing the character set for a MySQL database. In this follow-up post I want to talk about general schema changes that you should consider when changing your character set from latin1 to utf8, from utf8 to utf8mb4, etc. The main issue is with long
VARCHAR columns and columns with columns with length limits that are enforeced in bytes rather than characters.
The manual points out the main areas where you may have trouble:
A TINYTEXT column can hold up to 255 bytes, so it can hold up to 85 3-byte or 63 4-byte characters. Suppose that you have a TINYTEXT column that uses utf8 but must be able to contain more than 63 characters. You cannot convert it to utf8mb4 unless you also change the data type to a longer type such as TEXT.
Similarly, a very long VARCHAR column may need to be changed to one of the longer TEXT types if you want to convert it from utf8 to utf8mb4.
The point on
TINYTEXT is pretty clear. Frankly I’m not sure why anyone uses
TINYTEXT to begin with, but if you have
TINYTEXT columns in your database and you are changing your character set, this might be a great opportunity to change those columns to a TEXT type that supports longer values, such as
VARCHAR columns can be a problem because MySQL has a row length limit of 65,535 bytes, and one really long
VARCHAR or a few relatively long VARCHARs in a multi-byte character set can push your rows past that limit. For example you can have a
VARCHAR(20000) in utf8 because it takes up a maximum of 60,000 bytes, but you can’t have a
VARCHAR(20000) in utf8mb4 because that requires up to 80,000 bytes of storage.
Here’s my simple recommendation for preparing a MySQL schema for a character set change:
- Change all TINYTEXT columns to TEXT, MEDIUMTEXT, or LONGTEXT
- Change all long VARCHAR columns to TEXT, MEDIUMTEXT, or LONGTEXT
Read on for sample queries that generate the necessary DDL to modify the relevant columns, and examples that illustrate the problem.
Read on ✈