Flite Culture

Testing the MySQL JSON Labs Release Using MySQL Sandbox

The MySQL 5.7.7 release candidate has been available for several months, but it doesn’t include the new JSON datatype or built-in JSON functions. Those are currently only available in the MySQL JSON Labs Release. Unlike the regular 5.7.7 release, the MySQL JSON Labs Release is only available in two download formats, neither of which is a Mac binary.

In order to try out the new JSON data type and functions on my Mac laptop, I need to build it from source.

Read on to see how I did that with the help of MySQL Sandbox.


Using MySQL Sandbox for Upgrade Testing

Recently I tested several different MySQL upgrade scenarios for the blog posts I wrote about the temporal format upgrade. This required me to create a bunch of tables with datetime/timestamp/time columns in MySQL 5.5 and then physically copy those tables to a MySQL 5.6 or 5.7 database. I created four copies of each table to test InnoDB versus MyISAM and partitioned versus non-partitioned tables. MySQL Sandbox made the process easy.
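
Here is a sketch of what one of those test tables might have looked like (table and column names are hypothetical; the InnoDB/MyISAM and partitioned/non-partitioned variants all follow the same pattern):

-- Non-partitioned InnoDB variant:
CREATE TABLE temporal_test (
  id INT NOT NULL,
  dt DATETIME,
  ts TIMESTAMP NULL,
  t  TIME,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Partitioned variant of the same table:
CREATE TABLE temporal_test_part (
  id INT NOT NULL,
  dt DATETIME,
  ts TIMESTAMP NULL,
  t  TIME,
  PRIMARY KEY (id)
) ENGINE=InnoDB
PARTITION BY HASH(id) PARTITIONS 4;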

These are the MySQL versions I wanted to test:

  • 5.5.34
  • 5.6.15
  • 5.6.22
  • 5.6.25
  • 5.7.7

I was able to download the latest 5.6 and 5.7 releases from the MySQL downloads site, and the older releases from the MySQL archives.

Read on for an overview of the MySQL Sandbox commands I used.


The Downside of the MySQL 5.6 Implicit Temporal Column Format Upgrade

I’ve written two separate posts about the MySQL 5.6 temporal column format change, but I haven’t yet addressed the problem with the way this upgrade was implemented. Essentially, the problem is that the implicit format upgrade causes a write-blocking full table rebuild the first time you run ALTER TABLE after upgrading, even if the ALTER TABLE command is an online DDL operation that normally doesn’t block writes.

This behavior is well-intentioned but harmful. The intent is to tack a seemingly innocuous column format upgrade onto my existing ALTER TABLE command, but by silently changing my DDL from non-blocking to blocking it undermines online DDL and becomes problematic.
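
One way to protect yourself (a sketch, with hypothetical table and column names) is to request the online algorithm explicitly, so the server returns an error instead of silently falling back to a blocking table copy:

-- Expected to fail with an error on a table that still has old-format
-- temporal columns, rather than quietly rebuilding the whole table:
ALTER TABLE my_table ADD INDEX idx_c1 (c1), ALGORITHM=INPLACE, LOCK=NONE;

That way the full rebuild can be scheduled deliberately instead of being triggered as a side effect.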

Read on for detailed examples and workarounds.


Identifying Temporal Columns in Old Format in MySQL 5.6 and Above

I’ve gotten a lot of positive feedback from my post on upgrading temporal columns from MySQL 5.5 to MySQL 5.6 format, both in the blog comments and directly from people who have used my query to identify which columns are candidates to be upgraded to the new format. Based on the feedback and some changes in more recent MySQL releases, I think it’s worth doing a follow-up post.

Partitioned tables

It was pointed out to me that the SQL query I used in my previous post does not work for partitioned tables. The reason is the join between information_schema.tables and information_schema.innodb_sys_tables in my query. For non-partitioned tables the value in information_schema.innodb_sys_tables.name will be <table_schema>/<table_name>, but for partitioned tables there will be one row in information_schema.innodb_sys_tables for each partition, with the naming pattern of <table_schema>/<table_name>#P#<partition_name>. Thus I need to update my query to support both naming patterns.
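
For illustration, here is a sketch of a join that matches both naming patterns (only the join itself is shown; the column-level logic from the original query is omitted, and note that a literal '_' in a schema or table name acts as a LIKE wildcard here):

SELECT t.table_schema, t.table_name
FROM information_schema.tables t
JOIN information_schema.innodb_sys_tables ist
  ON ist.name = CONCAT(t.table_schema, '/', t.table_name)
  OR ist.name LIKE CONCAT(t.table_schema, '/', t.table_name, '#P#%')
GROUP BY t.table_schema, t.table_name;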

When I started testing the MySQL 5.5 to 5.6 upgrade on partitioned tables I was curious whether a table could have some partitions using the old format and some using the new format, and whether I could convert the columns one partition at a time. Based on my testing it appears that a given table will always use the same temporal column format for all partitions, and in order to upgrade the format of those columns you need to rebuild the entire table. If I upgrade a partitioned table with datetime/timestamp/time columns from 5.5 to 5.6 those columns will still be in the old format for all partitions. If I add new partitions to that table the datetime/timestamp/time columns in the new partitions will be in the old format. Partition-level maintenance operations such as REBUILD PARTITION, COALESCE PARTITION, and REORGANIZE PARTITION do not upgrade the temporal column format, so I still need to use a table-level operation such as ALTER TABLE ... FORCE or OPTIMIZE PARTITION (which is a misnomer because it actually rebuilds the entire table).
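
For example, upgrading the temporal columns of a partitioned table requires a full table rebuild (a sketch with a hypothetical table name, assuming MySQL 5.6.17 or later where ALTER TABLE ... FORCE is supported):

-- Rebuilds the entire table, all partitions included, converting the
-- datetime/timestamp/time columns to the new 5.6 format:
ALTER TABLE my_partitioned_table FORCE;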


Hash-based Workarounds for MySQL Unique Constraint Limitations

In the past I have written about MySQL’s limitations that prevent you from creating an index over a certain length, or with more than a certain number of columns. I’ve offered some solutions for those problems, but many of my solutions hit another limit further down the line. For example innodb_large_prefix allows you to create an index on larger columns, but only up to 3072 bytes, and my workarounds for the 16 column limit per index will also hit that 3072 byte limit once you add lots of columns or add some very long columns.

Today I’m going to suggest a hash-based solution to bypass that 3072 byte limit to solve a specific subset of index use cases. The basic idea is to build a string by concatenating all of the columns we want to include in the index, and create a unique index on the hash of that string, rather than the string itself. The hash will be a fixed length regardless of the input columns, so we don’t need to worry about the 3072 byte limit or the 16 column limit.

For example, instead of building a unique index on this:

concat(c1,'-',c2,'-',c3,'-',c4,'-',c5,'-',c6,'-',c7,'-',c8,'-',c9,'-',c10,'-',
c11,'-',c12,'-',c13,'-',c14,'-',c15,'-',c16,'-',c17,'-',c18,'-',c19,'-',c20)

I can build a unique index on this:

unhex(sha2(concat(c1,'-',c2,'-',c3,'-',c4,'-',c5,'-',c6,'-',c7,'-',c8,'-',c9,'-',c10,'-',
c11,'-',c12,'-',c13,'-',c14,'-',c15,'-',c16,'-',c17,'-',c18,'-',c19,'-',c20),256))

An index on a hash value is not useful for most queries, since it can’t speed up range scans or queries using an index prefix, but it will work for unique constraints, and you can also use it for single row lookups if you apply the same hash function to the values you are searching for.
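
Here is a sketch of how this might be wired up, using a MySQL 5.7 stored generated column to maintain the hash (hypothetical table and column names, with only three columns shown for brevity):

CREATE TABLE hash_unique_demo (
  c1 VARCHAR(1024),
  c2 VARCHAR(1024),
  c3 VARCHAR(1024),
  -- 32-byte SHA-256 hash of the concatenated columns. Note that CONCAT
  -- returns NULL if any input is NULL, and NULLs bypass unique constraints:
  all_cols_hash BINARY(32) AS
    (UNHEX(SHA2(CONCAT(c1,'-',c2,'-',c3),256))) STORED,
  UNIQUE KEY (all_cols_hash)
);

-- Single-row lookups work by hashing the search values the same way:
SELECT * FROM hash_unique_demo
WHERE all_cols_hash = UNHEX(SHA2(CONCAT('a','-','b','-','c'),256));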

Hat tip to Justin Swanhart for giving me the idea to use hashes for this.

Read on for details and code examples.


Groupwise Most Frequent Value Query in Hive

I am used to writing SQL queries and making small changes to accommodate Hive’s limitations, but sometimes I encounter a query that takes a little more work to implement in Hive. Recently I needed to write a query to calculate the most frequently occurring value by key. MySQL and Hive both lack a MODE() function, but there are several ways to accomplish this in MySQL. I usually execute two identical sub-queries to group by key and value, and use an outer join to exclude all of the key/value pairs that are not the most frequently occurring.
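
Here is a sketch of that MySQL approach (hypothetical table and column names): group by key and value twice, then use the outer join to discard any key/value pair that is outnumbered by another value for the same key.

SELECT t1.k, t1.v AS most_frequent_value, t1.cnt
FROM (
  SELECT k, v, COUNT(*) AS cnt FROM my_table GROUP BY k, v
) t1
LEFT JOIN (
  SELECT k, v, COUNT(*) AS cnt FROM my_table GROUP BY k, v
) t2
  ON t1.k = t2.k AND t2.cnt > t1.cnt
-- Keep only the rows with no higher-count rival; ties return one row
-- per tied value:
WHERE t2.k IS NULL;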

Read on for details on the MySQL query and how I needed to modify it to make it work in Hive.


Adding a Unique Constraint With More Than 16 Columns in MariaDB

When I started writing this post I planned to follow up on my series of posts by creating a unique constraint with more than 16 columns using a MariaDB virtual column the same way I used a MySQL generated column in my most recent post. During my testing I abandoned that plan when I discovered two things:

  1. MariaDB virtual columns impose a 252 character limit on the expression that defines the column. This works for concatenating lots of columns with very short names like I did in my last post, but in the real world it’s easy to find cases where the column names are long enough that a concatenation expression involving more than 16 columns exceeds 252 characters.
  2. MariaDB doesn’t have the same 16 column limit on indexes; instead it imposes a limit of 32 columns. Thus I can add a unique constraint on 17-32 columns in MariaDB without having to do anything special.

So I can’t use MariaDB virtual columns as a workaround to add a unique constraint on more than 16 columns, but it probably doesn’t matter because I actually don’t need that workaround as long as my unique constraint includes no more than 32 columns.
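
For instance, here is a minimal sketch (hypothetical table) of a 17-column unique constraint that MariaDB accepts as-is, with no workaround needed:

CREATE TABLE t17 (
  c1 INT, c2 INT, c3 INT, c4 INT, c5 INT, c6 INT, c7 INT, c8 INT, c9 INT,
  c10 INT, c11 INT, c12 INT, c13 INT, c14 INT, c15 INT, c16 INT, c17 INT,
  -- 17 columns in one index: fails with ERROR 1070 in MySQL, works in MariaDB
  UNIQUE KEY uk_all (c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17)
);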

Read on for code examples.


Using MySQL 5.7 Generated Columns to Avoid ERROR 1070

When I first blogged about a potential workaround for MySQL error 1070, I received a suggestion in the comments that my workaround might be a useful trick with MariaDB virtual columns or MySQL 5.7 generated columns. Although I’ve already blogged about a simplified version of my original workaround, the arrival of the MySQL 5.7.7 release candidate inspired me to try an even simpler implementation using generated columns.
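
As a preview, here is a sketch of the generated column approach (hypothetical names, with only four columns shown for brevity): concatenate the columns into a single stored generated column and put the unique index on that column instead.

CREATE TABLE gc_demo (
  c1 INT, c2 INT, c3 INT, c4 INT,
  -- One indexable column stands in for the full column list:
  all_cols VARCHAR(64) AS (CONCAT(c1,'-',c2,'-',c3,'-',c4)) STORED,
  UNIQUE KEY (all_cols)
);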

Read on for my findings.


Speaking at Percona Live 2015

A week from today I am speaking at Percona Live: MySQL Conference and Expo 2015.

Here is a summary of my talk:

My talk is for anyone who wants to make their life easier using MySQL tools. Rather than doing a deep dive on one or two tools, I will describe the methodology we’ve used at Flite to successfully integrate many different MySQL tools into our day-to-day workflow. The talk will take place Wednesday, April 15 at 3:30 PM in Ballroom E.

I most recently gave this talk at the San Francisco MySQL Meetup in February. If you are interested in my talk but won’t be at Percona Live, you can find the video online.

See you in Santa Clara next week!


Introducing Pt-reanimate

Most of the time I like pt-kill, but sometimes it gets a little overzealous and kills a query that I need. That’s why I’m glad Percona created pt-reanimate so I can bring those important queries back to life. Of course the queries are not exactly the same when they come back, but at least they come back.

Here’s an example of pt-reanimate in action. First I add a column to a large table:

alter table big_table add column new_col int;

Here is my DDL in the processlist table:

+---------+------+----------------+----------------------------------------------+
| command | time | state          | info                                         |
+---------+------+----------------+----------------------------------------------+
| Query   |  146 | altering table | alter table big_table add column new_col int |
+---------+------+----------------+----------------------------------------------+

Since I’m using MySQL 5.6 this is an online operation, and it does not block reads or writes on the table. Unfortunately pt-kill doesn’t know that, and due to the long runtime of my DDL it killed the query, which then disappeared from the processlist.

Luckily I am also running pt-reanimate, and it brought my query back to life! After pt-reanimate recovered my query, this is what it looked like in the processlist table:

+---------+------+--------+----------------------------------------------+
| command | time | state  | info                                         |
+---------+------+--------+----------------------------------------------+
| Zombie  |  187 | brains | alter table big_table add column new_col int |
+---------+------+--------+----------------------------------------------+

The good news is the query eventually completed successfully. The bad news is it tried to eat some of my other queries, but they were all fast-running queries so they escaped unharmed.
