Flite Culture

Faster JSON Parsing Using MySQL JSON UDFs

A while back I blogged about JSON parsing in MySQL using Common_schema. Given the number of page views that post continues to get I think it’s worth posting an update about how I solve that problem today. I now use the json_extract() function from MySQL JSON UDFs, which is available at MySQL Labs. There are multiple warnings on the download page that these UDFs should not be used on a production site, so consider yourself warned. In my case I’m running these reports on a read-only reporting replica of my production DB, so if they crash my server it wouldn’t have direct customer impact. I can live with the risk for now.

Why I made the switch

I made the switch because the common_schema functions I was using were too slow. That fact is well known, it’s even mentioned explicitly in the documentation:

NOTE: this function is CPU intensive. This solution should ideally be implemented through built-in functions, not stored routines.

common_schema was fast enough for my original use case, which only required parsing a couple of values from a few hundred rows of very small JSON objects. Over time I had to parse more values from more rows of larger JSON objects, and some of my reports started taking many hours to run. That’s when I installed the JSON UDFs and did some testing. In some of my tests the json_extract() function is 100 times as fast as common_schema.get_option(), so I’ve started switching my larger reports to use the JSON UDFs. Currently there are 9 MySQL JSON UDFs, but for now I am just using the json_extract() function. I’m looking forward to trying the other functions in the future.

Here’s an example of one of the queries I run (This function supports complex JSON structures, but for now my JSON objects are flat so the use case is pretty simple):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
mysql> select json_extract(f.event_data,'state') as state,
    ->   json_extract(f.event_data,'age') as age,
    ->   json_extract(f.event_data,'gender') as gender,
    ->   sum(f.event_count) as event_count 
    -> from json_event_fact f
    -> group by state, age, gender;

+-------+----------+--------+-------------+
| state | age      | gender | event_count |
+-------+----------+--------+-------------+
| CA    | 18-24    | female |        5384 |
| CA    | 18-24    | male   |         791 |
| CA    | 18-24    | null   |          25 |
| CA    | 25-34    | female |        6731 |
| CA    | 25-34    | male   |        1292 |
| CA    | 25-34    | null   |          38 |
| CA    | 35-44    | female |       10638 |
| CA    | 35-44    | male   |        1822 |
| CA    | 35-44    | null   |          20 |
+-------+----------+--------+-------------+

Read on for details on how to install the MySQL JSON UDFs.

Installation

If you’ve never installed a MySQL plugin on your server the first thing you should do is figure out where your plugin directory is. You can get that info from a MySQL server variable like this:

1
2
3
4
5
6
7
mysql> show variables like 'plugin_dir';
+---------------+--------------------------+
| Variable_name | Value                    |
+---------------+--------------------------+
| plugin_dir    | /usr/lib64/mysql/plugin/ |
+---------------+--------------------------+
1 row in set (0.00 sec)

Now it’s time to download the MySQL JSON UDFs. The current release is 0.3.0. It’s best to visit MySQL Labs first so you can read all of the warnings, but you can also download the tar directly to the server using wget. Here are the commands I ran to download, extract, and copy the pre-built 64-bit Linux version of the library into the plugin directory:

1
2
3
# wget http://downloads.mysql.com/snapshots/pb/mysql-json-udfs-0.3.0/mysql-json-udfs-0.3.0-labs-json-udfs-linux-glibc2.5-x86_64.tar.gz
# tar xvf mysql-json-udfs-0.3.0-labs-json-udfs-linux-glibc2.5-x86_64.tar.gz
# cp -i mysql-json-udfs-0.3.0-labs-json-udfs-linux-glibc2.5-x86_64/libmy_json_udf.so /usr/lib64/mysql/plugin/.

Then create the function(s) you want to use at the MySQL prompt:

1
2
mysql> create function json_extract returns string soname 'libmy_json_udf.so';
Query OK, 0 rows affected (0.01 sec)

And test it out!

1
2
3
4
5
6
7
mysql> select json_extract('{"a":1,"b":2}','b');
+-----------------------------------+
| json_extract('{"a":1,"b":2}','b') |
+-----------------------------------+
| 2                                 |
+-----------------------------------+
1 row in set (0.00 sec)

I couldn’t find any online documentation for these UDFs, but there is a brief description of each function in the README file. For example, here’s what it says about json_extract():

json_extract(doc, keypart1, keypart2, …)

Extracts value of the specified key. Returns value of the key specified, NULL if parsing failed. Warning! This version does not check whole document for validity.

Even though I’m no longer using common_schema for this use case, I still find it tremendously useful for many other cases. In fact, when I was testing json_extract() I used common_schema.query_checksum() to verify that my new queries were giving me exactly the same output as the old queries.

Comments

Comments