Flite Culture

Finding and Deleting Orphaned Rows in MySQL

If you store data with a parent-child relationship in MySQL tables, then you may want to identify and/or delete orphaned rows from the child table. An orphaned row is defined as a child row that references a non-existent parent row. If you enforce the parent-child relationship with a foreign key constraint on the child table then orphans are less likely to occur, but you can still end up with them, for example if a user disables foreign_key_checks in a session and then deletes a parent row. If you choose not to create foreign keys in your database in order to avoid the performance overhead, then you are at higher risk of having orphaned rows.
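As a minimal sketch of the approach (the parent/child table and column names here are hypothetical), orphans can be found and then deleted with a LEFT JOIN against the parent table:

-- Find child rows whose parent_id does not match any parent row
SELECT c.*
FROM child c
LEFT JOIN parent p ON p.id = c.parent_id
WHERE p.id IS NULL;

-- Delete the same rows (consider batching this on large tables)
DELETE c
FROM child c
LEFT JOIN parent p ON p.id = c.parent_id
WHERE p.id IS NULL;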

Connecting VPCs With IPSec and StrongSwan

For many organizations on AWS it is important to deploy applications to multiple regions. Doing so insulates them from region-specific outages and allows traffic to be served closer to the consumer.

A problem emerges, however, when applications in one VPC must communicate with applications in a VPC in a different region. At the time of writing, there is no VPC peering between regions. This means that one or both of the applications must be accessible from the Internet, a requirement some teams may not want to accept.

Dashboards at Flite

We love data here at Flite. Last year the Cosmos team launched the Report Dashboard, an awesome tool that allows users to get a quick overview of their organization’s performance. Author and dashboard expert Stephen Few defines a dashboard as “a visual display of the most important information needed to achieve one or more objectives, consolidated and arranged on a single screen so the information can be monitored at a glance”. The Flite Report Dashboard is packed with insightful metrics and visualizations that help our customers do things like determine whether ads are underperforming, track campaigns that are under-pacing, or see which videos are most engaging. The dashboard can also be customized to focus on the key metrics that matter most to our customers.

Preparing Your MySQL Schema for a Character Set Change

Recently I blogged about identifying necessary index changes when changing the character set for a MySQL database. In this follow-up post I want to talk about general schema changes that you should consider when changing your character set from latin1 to utf8, from utf8 to utf8mb4, etc. The main issue is with long VARCHAR columns and with columns whose length limits are enforced in bytes rather than characters.

The manual points out the main areas where you may have trouble:

  • A TINYTEXT column can hold up to 255 bytes, so it can hold up to 85 3-byte or 63 4-byte characters. Suppose that you have a TINYTEXT column that uses utf8 but must be able to contain more than 63 characters. You cannot convert it to utf8mb4 unless you also change the data type to a longer type such as TEXT.

  • Similarly, a very long VARCHAR column may need to be changed to one of the longer TEXT types if you want to convert it from utf8 to utf8mb4.

The point on TINYTEXT is pretty clear. Frankly I’m not sure why anyone uses TINYTEXT to begin with, but if you have TINYTEXT columns in your database and you are changing your character set, this might be a great opportunity to change those columns to a TEXT type that supports longer values, such as MEDIUMTEXT, TEXT, or LONGTEXT.

Long VARCHAR columns can be a problem because MySQL has a row length limit of 65,535 bytes, and one really long VARCHAR or a few relatively long VARCHARs in a multi-byte character set can push your rows past that limit. For example you can have a VARCHAR(20000) in utf8 because it takes up a maximum of 60,000 bytes, but you can’t have a VARCHAR(20000) in utf8mb4 because that requires up to 80,000 bytes of storage.
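To illustrate the row length limit (the table names here are hypothetical), the utf8 version of this column fits under 65,535 bytes, but the utf8mb4 version does not and is rejected:

-- Allowed: 20,000 characters x 3 bytes = 60,000 bytes
CREATE TABLE varchar_utf8_demo (c1 VARCHAR(20000)) CHARACTER SET utf8;

-- Rejected: 20,000 characters x 4 bytes = 80,000 bytes exceeds the 65,535-byte limit,
-- so MySQL tells you to use BLOB or TEXT instead
CREATE TABLE varchar_utf8mb4_demo (c1 VARCHAR(20000)) CHARACTER SET utf8mb4;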

Here’s my simple recommendation for preparing a MySQL schema for a character set change:

  1. Change all TINYTEXT columns to TEXT, MEDIUMTEXT, or LONGTEXT
  2. Change all long VARCHAR columns to TEXT, MEDIUMTEXT, or LONGTEXT

Read on for sample queries that generate the necessary DDL to modify the relevant columns, and examples that illustrate the problem.
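As a rough sketch of what such a DDL-generating query might look like (this is my approximation, not necessarily the post's queries, and the 65,535-byte threshold assumes a 4-byte target character set), one option is to read information_schema.columns and emit ALTER statements:

-- Generate ALTER statements for TINYTEXT columns and for VARCHAR columns
-- that could no longer fit in a row at 4 bytes per character
SELECT CONCAT('ALTER TABLE `', table_schema, '`.`', table_name,
              '` MODIFY `', column_name, '` TEXT',
              IF(is_nullable = 'NO', ' NOT NULL', ''), ';') AS ddl
FROM information_schema.columns
WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
  AND (data_type = 'tinytext'
       OR (data_type = 'varchar' AND character_maximum_length * 4 > 65535));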

Prepping Your MySQL Indexes for a Character Set Change

When changing a MySQL table or column to a new character set that uses more bytes than the old character set, you need to first check whether any schema changes are needed to accommodate the change. For example, changing character sets from latin1 to utf8 is an increase from 1 to 3 bytes per character, and changing from utf8 to utf8mb4 is an increase from 3 to 4 bytes per character. The MySQL reference manual has a helpful page with some details on this, but I want to add some examples to show how this schema prep can be accomplished.

There are three different types of length limits to take into consideration:

  • Index
  • Column
  • Row

In this post I will focus on index length limits, and I’ll save columns and rows for future posts. Read on for details.
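As a sketch of the kind of check involved (my assumption, not necessarily the post's query; 767 bytes is the default InnoDB index prefix limit, or 3072 bytes with innodb_large_prefix), you can look for indexed utf8 columns whose byte length would grow past the limit at 4 bytes per character:

-- Indexed utf8 character columns that would exceed 767 bytes at 4 bytes per character
SELECT s.table_schema, s.table_name, s.index_name, s.column_name,
       COALESCE(s.sub_part, c.character_maximum_length) * 4 AS bytes_needed_utf8mb4
FROM information_schema.statistics s
JOIN information_schema.columns c
  ON c.table_schema = s.table_schema
 AND c.table_name = s.table_name
 AND c.column_name = s.column_name
WHERE c.character_set_name = 'utf8'
  AND COALESCE(s.sub_part, c.character_maximum_length) * 4 > 767;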

Testing the MySQL JSON Labs Release Using MySQL Sandbox

The MySQL 5.7.7 release candidate has been available for several months, but it doesn’t include the new JSON data type or built-in JSON functions. Those are currently only available in the MySQL JSON Labs Release. Unlike the regular 5.7.7 release, the MySQL JSON Labs Release is only available in two download formats, neither of which is a Mac binary.

In order to try out the new JSON data type and functions on my Mac laptop, I need to build it from source.

Read on to see how I did that with the help of MySQL Sandbox.

Using MySQL Sandbox for Upgrade Testing

Recently I tested several different MySQL upgrade scenarios for the blog posts I wrote about the temporal format upgrade. This required me to create a bunch of tables with datetime/timestamp/time columns in MySQL 5.5 and then physically copy those tables to a MySQL 5.6 or 5.7 database. I created 4 copies of each table to test out InnoDB versus MyISAM and partitioned tables versus non-partitioned tables. MySQL Sandbox made the process easy.

These are the MySQL versions I wanted to test:

  • 5.5.34
  • 5.6.15
  • 5.6.22
  • 5.6.25
  • 5.7.7

I was able to download the latest 5.6 and 5.7 releases here and the older releases here.

Read on for an overview of the MySQL Sandbox commands I used.

The Downside of the MySQL 5.6 Implicit Temporal Column Format Upgrade

I’ve written two separate posts about the MySQL 5.6 temporal column format change, but I haven’t yet addressed the problem with the way this upgrade was implemented. Essentially, the problem is that the implicit format upgrade causes a write-blocking full table rebuild the first time you run ALTER TABLE after upgrading, even if the ALTER TABLE command is an online DDL operation that normally doesn’t block writes.

This behavior is well-intentioned, but harmful. The intent is for the server to tack on a seemingly innocuous column format upgrade to my existing ALTER TABLE command, but when that changes my DDL from non-blocking to blocking it undermines online DDL and becomes problematic.
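One mitigation worth mentioning (my assumption, not necessarily the workaround the post recommends): MySQL 5.6.24 and later provide the avoid_temporal_upgrade system variable, which tells ALTER TABLE to skip the implicit format upgrade. A rough sketch, with a hypothetical table and column:

-- Skip the implicit temporal format upgrade so online DDL stays non-blocking
SET GLOBAL avoid_temporal_upgrade = ON;
ALTER TABLE my_table ADD COLUMN new_col INT, ALGORITHM=INPLACE, LOCK=NONE;

-- Later, during a maintenance window, rebuild the table to upgrade the old columns
SET GLOBAL avoid_temporal_upgrade = OFF;
ALTER TABLE my_table FORCE;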

Read on for detailed examples and workarounds.

Identifying Temporal Columns in Old Format in MySQL 5.6 and Above

I’ve gotten a lot of positive feedback from my post on upgrading temporal columns from MySQL 5.5 to MySQL 5.6 format, both in the blog comments and directly from people who have used my query to identify which columns are candidates to be upgraded to the new format. Based on the feedback and some changes in more recent MySQL releases, I think it’s worth doing a follow-up post.

Partitioned tables

It was pointed out to me that the SQL query I used in my previous post does not work for partitioned tables. The reason is the join between information_schema.tables and information_schema.innodb_sys_tables in my query. For non-partitioned tables the value in information_schema.innodb_sys_tables.name will be <table_schema>/<table_name>, but for partitioned tables there will be one row in information_schema.innodb_sys_tables for each partition, with the naming pattern of <table_schema>/<table_name>#P#<partition_name>. Thus I need to update my query to support both naming patterns.
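A sketch of a join condition that handles both patterns (this is illustrative, not the full query from the original post):

SELECT t.table_schema, t.table_name, ist.name AS innodb_name
FROM information_schema.tables t
JOIN information_schema.innodb_sys_tables ist
  ON ist.name = CONCAT(t.table_schema, '/', t.table_name)
  OR ist.name LIKE CONCAT(t.table_schema, '/', t.table_name, '#P#%');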

When I started testing the MySQL 5.5 to 5.6 upgrade on partitioned tables I was curious whether a table could have some partitions using the old format and some using the new format, and whether I could convert the columns one partition at a time. Based on my testing it appears that a given table will always use the same temporal column format for all partitions, and in order to upgrade the format of those columns you need to rebuild the entire table. If I upgrade a partitioned table with datetime/timestamp/time columns from 5.5 to 5.6, those columns will still be in the old format for all partitions. If I add new partitions to that table, the datetime/timestamp/time columns in the new partitions will also be in the old format. Partition-level maintenance operations such as REBUILD PARTITION, COALESCE PARTITION, and REORGANIZE PARTITION do not upgrade the temporal column format, so I still need to use a table-level operation such as ALTER TABLE...FORCE or OPTIMIZE PARTITION (which is a misnomer because it actually rebuilds the entire table).

Hash-based Workarounds for MySQL Unique Constraint Limitations

In the past I have written about MySQL’s limitations that prevent you from creating an index over a certain length, or with more than a certain number of columns. I’ve offered some solutions for those problems, but many of my solutions hit another limit further down the line. For example, innodb_large_prefix allows you to create an index on larger columns, but only up to 3072 bytes, and my workarounds for the 16-column limit per index will also hit that 3072-byte limit once you add lots of columns or a few very long columns.

Today I’m going to suggest a hash-based solution to bypass that 3072 byte limit to solve a specific subset of index use cases. The basic idea is to build a string by concatenating all of the columns we want to include in the index, and create a unique index on the hash of that string, rather than the string itself. The hash will be a fixed length regardless of the input columns, so we don’t need to worry about the 3072 byte limit or the 16 column limit.

For example instead of building a unique index on this:

concat(c1,'-',c2,'-',c3,'-',c4,'-',c5,'-',c6,'-',c7,'-',c8,'-',c9,'-',c10,'-',
c11,'-',c12,'-',c13,'-',c14,'-',c15,'-',c16,'-',c17,'-',c18,'-',c19,'-',c20)

I can build a unique index on this:

unhex(sha2(concat(c1,'-',c2,'-',c3,'-',c4,'-',c5,'-',c6,'-',c7,'-',c8,'-',c9,'-',c10,'-',
c11,'-',c12,'-',c13,'-',c14,'-',c15,'-',c16,'-',c17,'-',c18,'-',c19,'-',c20),256))

An index on a hash value is not useful for most queries, since it can’t speed up range scans or queries using an index prefix, but it will work for unique constraints, and you can also use it for single row lookups if you apply the same hash function to the values you are searching for.
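Here is a minimal sketch of the idea (the table and column names are hypothetical, and the hash column could just as well be maintained by a trigger or, in MySQL 5.7+, a generated column):

CREATE TABLE wide_unique_demo (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  c1 VARCHAR(500) NOT NULL,
  c2 VARCHAR(500) NOT NULL,
  c3 VARCHAR(500) NOT NULL,
  all_columns_hash BINARY(32) NOT NULL,  -- unhex(sha2(..., 256)) of the concatenated columns
  PRIMARY KEY (id),
  UNIQUE KEY uniq_all_columns_hash (all_columns_hash)
);

-- The application (or a trigger) computes the hash on insert
INSERT INTO wide_unique_demo (c1, c2, c3, all_columns_hash)
VALUES ('a', 'b', 'c', UNHEX(SHA2(CONCAT('a','-','b','-','c'), 256)));

-- Single row lookups can apply the same hash function to the search values
SELECT * FROM wide_unique_demo
WHERE all_columns_hash = UNHEX(SHA2(CONCAT('a','-','b','-','c'), 256));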

Hat tip to Justin Swanhart for giving me the idea to use hashes for this.

Read on for details and code examples.
