Flite Culture

Using Explode and Lateral View in Hive

At Flite we store a lot of JSON formatted log data in Amazon S3, and we use Amazon Elastic MapReduce to run Hive queries on that log data using Amazon’s standard JSON SerDe. Sometimes we store embedded JSON objects, which can be extracted pretty easily using the get_json_object function.
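For a flat embedded object that extraction really is a one-liner. Here is a minimal sketch with a made-up JSON value, just to show the path syntax:

select get_json_object('{"info": {"id": "a1", "v": "2"}}', '$.info.id');
-- returns a1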

Some of those embedded JSON objects contain arrays. Recently I wanted to do a rollup query in Hive over the contents of an embedded array, using the explode UDTF along with the LATERAL VIEW syntax. Converting the JSON array (which is stored as a string) to a Hive array was a bit messy, but I got it to work. Below I show the query and break down the steps I used to build it.

The query

Here’s the final query to count impressions by placement, id, and version for a given partition:

select d.placement_id, d.id, d.version, count(1) as impressions
from
(
  select c.placement_id,
    get_json_object(c.info_object,'$.id') as id,
    get_json_object(c.info_object,'$.v') as version
  from
  (
    select b.placement_id, info_object
    from
    (
      select placement_id,
        split(regexp_replace(regexp_replace(a.info_array_string,'\\}\\,\\{','\\}\\;\\{'),'\\[|\\]',''),'\\;') as info_array
      from
      (
        select placement_id,
          get_json_object(embedded_data, '$.info') as info_array_string
        from ad_impression
        where dt = '2014-04-15-12'
      ) a
    ) b lateral view explode(b.info_array) info_array_exploded as info_object
  ) c
) d
group by d.placement_id, d.id, d.version;
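Before the step-by-step breakdown, here is a small illustration (with a made-up sample value for info_array_string) of what the nested regexp_replace/split calls are doing: they turn the JSON array string into a Hive array that explode can consume.

-- sample input: [{"id":"a1","v":"1"},{"id":"a2","v":"2"}]
select split(
         regexp_replace(
           regexp_replace('[{"id":"a1","v":"1"},{"id":"a2","v":"2"}]',
             '\\}\\,\\{', '\\}\\;\\{'),  -- change the separator between objects from , to ;
           '\\[|\\]', ''),               -- strip the enclosing square brackets
       '\\;');                           -- split on ; into an array of JSON object strings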

Read on for a detailed explanation of how the query is built.


Recovering MySQL Data From the BLACKHOLE Storage Engine

I used to think it was impossible to recover data from the MySQL BLACKHOLE storage engine. As stated in the manual:

The BLACKHOLE storage engine acts as a “black hole” that accepts data but throws it away and does not store it. Retrievals always return an empty result

I’ve always trusted the manual on this, but a recently published article about information and black holes proposes that the information may be recoverable. As stated in the abstract:

…probability conservation is guaranteed because black holes stimulate the emission of radiation in response to infalling matter. This stimulated radiation is non-thermal and contains all of the information about the infalling matter…

The nearest black hole is about 1600 light years away from earth, so I decided to apply this research closer to home by creating a new storage engine to capture the data that I insert into my BLACKHOLE table using stimulated radiation. I should be ready to open source this project soon, but for now here’s a quick demonstration.

First, here’s the classical example of data loss with the BLACKHOLE storage engine:

mysql> CREATE TABLE test(i INT, c CHAR(10)) ENGINE = BLACKHOLE;
Query OK, 0 rows affected (0.02 sec)

mysql> INSERT INTO test VALUES(1,'record one'),(2,'record two');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM test;
Empty set (0.00 sec)

Now I will use my new storage engine (I named it “Adami” after the author of the research paper) to recover the data:

mysql> CREATE TABLE test_recovery as select * from test;
Query OK, 0 rows affected (0.18 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE test_recovery ENGINE = Adami;
Query OK, 2 rows affected (0.03 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM test_recovery;
+------+------------+
| i    | c          |
+------+------------+
|    1 | record one |
|    2 | record two |
+------+------------+
2 rows in set (0.00 sec)

I recovered the data! If you’ve ever lost important data in the BLACKHOLE storage engine, this could be the solution you’ve been looking for.


Flite Hackdays - Q1 2014 Edition

It was time again for Flite Hackday, where we get together for 24 hours, form teams, and build cool stuff! The purpose is to get out of our day-to-day work mode and to try new things or learn something new. It’s a way for us to tap into our creativity and inspire new ideas. The results from this installment of Flite Hackday were pretty impressive and show just how talented the team really is! The following is a summary of the different projects…


Avoiding Data Loss With Older Versions of Pt-table-sync

The pt-table-sync tool uses the MySQL REPLACE command to rewrite row data on a master database in order to ensure consistency between the master and its replicas. The REPLACE command makes a lot of sense for that tool, but it also has the potential to cause problems. I’ve written about the downsides of the REPLACE command in the past, and sure enough, in older versions of pt-table-sync (I was using 2.1.9) it’s possible to accidentally delete child rows because of its use of REPLACE. The fix is to run it with --no-foreign-key-checks, which is the default as of version 2.2.6 according to this bug report.

I’ll take this as a reminder to upgrade Percona Toolkit from 2.1.9 to 2.2.7, and to always test tools like this in a non-production database in order to avoid potential data loss in production.

Here’s a quick example to illustrate the problem. First I create two test tables, and insert one parent row and one child row on my master database:

mysql> create table if not exists parent (
    -> id int unsigned not null auto_increment primary key, 
    -> name varchar(30)
    -> );
Query OK, 0 rows affected (0.04 sec)

mysql> create table if not exists child (
    ->   id int unsigned not null auto_increment primary key, 
    ->   parent_id int unsigned, 
    ->   name varchar(30), 
    ->   foreign key fk_child_to_parent (parent_id) references parent(id) on delete cascade
    -> ) engine = innodb;
Query OK, 0 rows affected (0.04 sec)

mysql> insert into parent values (1,'inserted');
Query OK, 1 row affected (0.01 sec)

mysql> insert into child values (1,1,'inserted');
Query OK, 1 row affected (0.01 sec)

mysql> select * from child;
+----+-----------+----------+
| id | parent_id | name     |
+----+-----------+----------+
|  1 |         1 | inserted |
+----+-----------+----------+
1 row in set (0.00 sec)

Then I update the parent row on the replica to introduce a data inconsistency:

mysql> update parent set name = 'updated' where id = 1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Now I run pt-table-sync on the master to fix the data inconsistency:

$ pt-table-sync --execute \
> --sync-to-master \
> --verbose \
> h=replica1.flite.com,P=3306,F=/etc/pt-table-checksum.cnf,D=test,t=parent
# Syncing D=test,F=/etc/pt-table-checksum.cnf,P=3306,h=replica1.flite.com,t=parent
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       1      0      0 Chunk     17:03:26 17:03:26 2    test.parent

But unfortunately the REPLACE command deletes the parent row before re-inserting it, which triggers the cascading delete from the foreign key and deletes the child row from both the master database and the replica:

mysql> select * from child;
Empty set (0.00 sec)
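The underlying issue is how REPLACE works: when a row with the same primary key already exists, it deletes that row and then inserts the new one, and the delete is what fires the ON DELETE CASCADE. Here is a simplified sketch of what the sync effectively ran (the real chunk SQL that pt-table-sync generates is more involved):

mysql> replace into parent (id, name) values (1,'inserted');
-- behaves like: DELETE FROM parent WHERE id = 1;  (cascades to child)
--               INSERT INTO parent VALUES (1,'inserted');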

If I had run pt-table-sync like this (using --no-foreign-key-checks), it would not have deleted the child row:

$ pt-table-sync --execute \
> --sync-to-master \
> --verbose \
> --no-foreign-key-checks \
> h=replica1.flite.com,P=3306,F=/etc/pt-table-checksum.cnf,D=test,t=parent
# Syncing D=test,F=/etc/pt-table-checksum.cnf,P=3306,h=replica1.flite.com,t=parent
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       1      0      0 Chunk     17:07:50 17:07:50 2    test.parent

Cloning MySQL Users With Common_schema

Today I needed to clone a MySQL user on one of my databases. More precisely, I needed to clone a MySQL grantee in order to allow an existing user to access the database from a different IP address. The process was very easy with the duplicate_grantee() procedure in common_schema. (As an aside: when I first saw this procedure in the docs I thought it was meant to find grantees that are duplicates in order to consolidate them, but it’s actually meant to copy an existing grantee and its privileges. I guess the name reminded me of pt-duplicate-key-checker. In this case “duplicate” is a verb, not an adjective. Now that I know what it does I find it quite useful.)

Here’s the command I ran:

mysql> call common_schema.duplicate_grantee('scott@1.2.3.4', 'scott@5.6.7.8');
Query OK, 0 rows affected, 6 warnings (0.27 sec)

Oops, what’s up with those warnings?

mysql> show warnings;
+---------+------+-----------------------------------+
| Level   | Code | Message                           |
+---------+------+-----------------------------------+
| Warning | 1260 | Row 44 was cut by GROUP_CONCAT()  |
| Warning | 1260 | Row 73 was cut by GROUP_CONCAT()  |
| Warning | 1260 | Row 88 was cut by GROUP_CONCAT()  |
| Warning | 1260 | Row 96 was cut by GROUP_CONCAT()  |
| Warning | 1260 | Row 189 was cut by GROUP_CONCAT() |
| Warning | 1260 | Row 197 was cut by GROUP_CONCAT() |
+---------+------+-----------------------------------+
6 rows in set (0.00 sec)

Ah yes, this DB is running with group_concat_max_len set to the default of 1024 (I’ve never understood why that default is so low). This was a good reminder to increase group_concat_max_len in /etc/my.cnf on this host, but for now I can just increase it dynamically and re-run the proc:

mysql> select @@group_concat_max_len;
+------------------------+
| @@group_concat_max_len |
+------------------------+
|                   1024 |
+------------------------+
1 row in set (0.00 sec)

mysql> set global group_concat_max_len = 1048576;
Query OK, 0 rows affected (0.00 sec)

mysql> set session group_concat_max_len = 1048576;
Query OK, 0 rows affected (0.00 sec)

mysql> call common_schema.duplicate_grantee('scott@1.2.3.4', 'scott@5.6.7.8');
Query OK, 0 rows affected (0.12 sec)

Success!
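To double-check the result, comparing the grants for the two grantees is enough (output omitted; the privilege lists should be identical apart from the host):

mysql> show grants for 'scott'@'1.2.3.4';
mysql> show grants for 'scott'@'5.6.7.8';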

Out of curiosity I did a quick search for other tools to do this. I found a somewhat similar procedure in securich called clone_user(), but it seems targeted towards creating an entirely new user, not just a new grantee. So it was not ideal for my use case today, but I may find it useful in the future.

From the docs:

  • clone_user('sourceusername','sourcehostname','destusername','desthostname','destemailaddress');

    If you have a particular user in a team already set up and you just want to create other users likewise, why not just clone them? The new user will of course have a different password which is supplied to the creator upon creation.

From the tutorial:

mysql> call clone_user('paul' , '10.0.0.2' , 'judas' , '10.0.0.2' , 'judas@domain.com');
+----------------------------------------------------------------------------------------------+
| USER_PASSWORD                                                                                |
+----------------------------------------------------------------------------------------------+
| Password for user -- judas -- contactable at -- judas@domain.com -- is -- eb5186cfa4b1c21 -- | 
+----------------------------------------------------------------------------------------------+
1 row in set (0.05 sec)

Query OK, 1 row affected, 3 warnings (0.14 sec)

The mysqluserclone utility also provides similar functionality to clone_user() in securich.


The Difference Between UDFs and Stored Functions in MySQL

I’ve seen some confusion about the difference between user-defined functions (UDFs) and stored functions in MySQL. In my last post I talked about UDFs, so I figured now is a good time to write about the difference between the two types of functions you can create in MySQL. Both are created using CREATE FUNCTION, and they share the same namespace, but they are implemented completely differently.

Stored functions

Stored functions are simply stored routines that return a value. Stored functions are easy to create if you know SQL, and they can be invoked just like built-in functions, but they are much slower than built-in functions. To illustrate that point, look at how re-implementing the built-in CONCAT() function as a stored function makes it more than 100 times slower:

mysql> delimiter $$
mysql> drop function if exists my_concat $$
Query OK, 0 rows affected (0.01 sec)
mysql> create function my_concat (p_str1 longtext, p_str2 longtext) returns longtext
    -> begin 
    ->   return concat(p_str1,p_str2);
    -> end $$
Query OK, 0 rows affected (0.00 sec)
mysql> delimiter ;

mysql> select benchmark(10000000,concat('foo','bar'));
+-----------------------------------------+
| benchmark(10000000,concat('foo','bar')) |
+-----------------------------------------+
|                                       0 |
+-----------------------------------------+
1 row in set (0.44 sec)

mysql> select benchmark(10000000,my_concat('foo','bar'));
+--------------------------------------------+
| benchmark(10000000,my_concat('foo','bar')) |
+--------------------------------------------+
|                                          0 |
+--------------------------------------------+
1 row in set (50.30 sec)

Running the built-in CONCAT() 10 million times took less than half a second, and running the stored function version 10 million times took over 50 seconds.

UDFs

User-defined functions are written in C or C++. The process is well documented in the manual. Since they are written in C/C++, they are not as easy for many SQL developers to implement, but they are much faster than stored functions. MySQL source distributions include a file called udf_example.c that defines some example UDFs.
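As a rough sketch of the workflow (using metaphon, one of the functions defined in udf_example.c): once the file is compiled into a shared library and copied to the server's plugin directory, the UDF is registered and called like any other function. The library name below assumes the default build output.

mysql> CREATE FUNCTION metaphon RETURNS STRING SONAME 'udf_example.so';
mysql> SELECT metaphon('example');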


Faster JSON Parsing Using MySQL JSON UDFs

A while back I blogged about JSON parsing in MySQL using Common_schema. Given the number of page views that post continues to get I think it’s worth posting an update about how I solve that problem today. I now use the json_extract() function from MySQL JSON UDFs, which is available at MySQL Labs. There are multiple warnings on the download page that these UDFs should not be used on a production site, so consider yourself warned. In my case I’m running these reports on a read-only reporting replica of my production DB, so if they crash my server it wouldn’t have direct customer impact. I can live with the risk for now.

Why I made the switch

I made the switch because the common_schema functions I was using were too slow. That fact is well known; it’s even mentioned explicitly in the documentation:

NOTE: this function is CPU intensive. This solution should ideally be implemented through built-in functions, not stored routines.

common_schema was fast enough for my original use case, which only required parsing a couple of values from a few hundred rows of very small JSON objects. Over time I had to parse more values from more rows of larger JSON objects, and some of my reports started taking many hours to run. That’s when I installed the JSON UDFs and did some testing. In some of my tests the json_extract() function is 100 times as fast as common_schema.get_option(), so I’ve started switching my larger reports to use the JSON UDFs. Currently there are 9 MySQL JSON UDFs, but for now I am just using the json_extract() function. I’m looking forward to trying the other functions in the future.
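For a sense of the syntax difference, here is the same flat lookup done both ways on a made-up JSON string (get_option is the common_schema routine I had been using; json_extract is the UDF):

mysql> select common_schema.get_option('{"state": "CA", "age": "18-24"}', 'state');
mysql> select json_extract('{"state": "CA", "age": "18-24"}', 'state');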

Here's an example of one of the queries I run (this function supports complex JSON structures, but for now my JSON objects are flat, so the use case is pretty simple):

mysql> select json_extract(f.event_data,'state') as state,
    ->   json_extract(f.event_data,'age') as age,
    ->   json_extract(f.event_data,'gender') as gender,
    ->   sum(f.event_count) as event_count 
    -> from json_event_fact f
    -> group by state, age, gender;

+-------+----------+--------+-------------+
| state | age      | gender | event_count |
+-------+----------+--------+-------------+
| CA    | 18-24    | female |        5384 |
| CA    | 18-24    | male   |         791 |
| CA    | 18-24    | null   |          25 |
| CA    | 25-34    | female |        6731 |
| CA    | 25-34    | male   |        1292 |
| CA    | 25-34    | null   |          38 |
| CA    | 35-44    | female |       10638 |
| CA    | 35-44    | male   |        1822 |
| CA    | 35-44    | null   |          20 |
+-------+----------+--------+-------------+

Read on for details on how to install the MySQL JSON UDFs.


Fast-forwarding a Delayed MySQL Replica

Last week I needed to fast-forward a delayed MySQL replica in order to recover some data. I use pt-slave-delay to keep this replica X days behind the master, and I needed to fast-forward it to be about X-2 days behind the master instead.

I wanted to catch the replica up to a precise moment in time, and it was important that it not go past that point until I was done recovering the data I needed.

I knew there was a command to do this, but I couldn’t remember the syntax. Since I figured pt-slave-delay was implemented using this command, I just opened the perl script and searched around until I found the syntax I was looking for. Here it is:

START SLAVE [SQL_THREAD] UNTIL
    MASTER_LOG_FILE = 'log_name', MASTER_LOG_POS = log_pos

Once I got the right syntax I needed to know which log_name and log_pos values to use.

I got the log_name by SSHing to the master DB, listing the binary log directory, and finding the oldest log with an mtime greater than the time I was looking for. (If there is a way to get a listing of binary logs based on time range in the MySQL CLI, please let me know in the comments!)

Once I had log_name, I used the mysqlbinlog tool to find the log_pos I was looking for. In my experience start-datetime is inclusive, and stop-datetime is exclusive, so to get all of the binary logging from 2014-02-04 16:18:30 I use that as start-datetime and add 1 second to stop-datetime. For example:

sudo mysqlbinlog bin-log.000386 \
  --start-datetime="2014-02-04 16:18:30" \
  --stop-datetime="2014-02-04 16:18:31" 

Then I sorted through the output of that until I found the most recent SQL statement I wanted to execute on my delayed replica. One of the comments preceding that statement has an end_log_pos value, which I used as my log_pos.

In order to fast-forward the replica I needed to stop running pt-slave-delay. Otherwise it would not let replication proceed past X days behind the master.

Once I had those values, here’s the process I followed to fast-forward the replica:

  • Disable monit for pt-slave-delay so it wouldn’t restart automatically
  • Stop pt-slave-delay
  • Run STOP SLAVE on the delayed replica
  • Run START SLAVE UNTIL MASTER_LOG_FILE = 'bin-log.000386', MASTER_LOG_POS = 950592556 on the delayed replica
  • Wait for replication to catch up to that point (see the check just after this list)
  • Recover the data I need
  • Run STOP SLAVE on the delayed replica (just to be safe)
  • Enable monit for pt-slave-delay
  • Start pt-slave-delay (or just let monit start it)
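For the "wait for replication to catch up" step, one way to tell that the replica has reached the UNTIL position is to watch SHOW SLAVE STATUS; the SQL thread stops on its own once the target position is reached:

mysql> show slave status\G
-- watch Exec_Master_Log_Pos (along with the Until_Log_File / Until_Log_Pos fields);
-- when Exec_Master_Log_Pos reaches 950592556 the SQL thread stops automatically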

Once pt-slave-delay is running again, the replica will stay in that state for about 2 days, at which point pt-slave-delay will again enforce the pre-defined replication delay of X days.


Overriding MySQL Current_timestamp Default on Update

The MySQL TIMESTAMP data type supports automatic initialization and updating to the current date and time. This can be a convenient way to track when rows are created and modified, but it’s easy to forget about those columns and have them update when you don’t want them to. In those cases you need to explicitly set the column value to avoid the automatic behavior. On an update, that means setting the column to its current value. Quoting the manual:

If the column is auto-updated, it is automatically updated to the current timestamp when the value of any other column in the row is changed from its current value. The column remains unchanged if all other columns are set to their current values. To prevent the column from updating when other columns change, explicitly set it to its current value. To update the column even when other columns do not change, explicitly set it to the value it should have (for example, set it to CURRENT_TIMESTAMP).

For example, say you have a table with a column called mtime that you use to track the last time your application modified each row. If you do a bulk update to that table outside of the application, the mtime column will still be updated even if you don’t want that:

create table ad (
  id int unsigned not null auto_increment,
  name varchar(255) not null,
  friendly_name varchar(255) not null,
  ctime timestamp not null default current_timestamp,
  mtime timestamp not null default current_timestamp on update current_timestamp,
  primary key (id)
);

mysql> insert into ad (name,friendly_name) values ('hello world','hello world');
Query OK, 1 row affected (0.00 sec)

mysql> select * from ad;
+----+-------------+---------------+---------------------+---------------------+
| id | name        | friendly_name | ctime               | mtime               |
+----+-------------+---------------+---------------------+---------------------+
|  1 | hello world | hello world   | 2014-02-11 11:13:13 | 2014-02-11 11:13:13 |
+----+-------------+---------------+---------------------+---------------------+
1 row in set (0.00 sec)

mysql> do sleep(5);
Query OK, 0 rows affected (5.01 sec)

mysql> update ad set friendly_name = replace(name,' ','_');
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from ad;
+----+-------------+---------------+---------------------+---------------------+
| id | name        | friendly_name | ctime               | mtime               |
+----+-------------+---------------+---------------------+---------------------+
|  1 | hello world | hello_world   | 2014-02-11 11:13:13 | 2014-02-11 11:13:18 |
+----+-------------+---------------+---------------------+---------------------+
1 row in set (0.00 sec)

In the above example the value of mtime changed from 2014-02-11 11:13:13 to 2014-02-11 11:13:18 when I did not want it to change.

The solution is to explicitly set the mtime column to its existing value:

mysql> insert into ad (name,friendly_name) values ('hello world','hello world');
Query OK, 1 row affected (0.00 sec)

mysql> select * from ad;
+----+-------------+---------------+---------------------+---------------------+
| id | name        | friendly_name | ctime               | mtime               |
+----+-------------+---------------+---------------------+---------------------+
|  1 | hello world | hello world   | 2014-02-11 11:18:03 | 2014-02-11 11:18:03 |
+----+-------------+---------------+---------------------+---------------------+
1 row in set (0.00 sec)

mysql> do sleep(5);
Query OK, 0 rows affected (5.00 sec)

mysql> update ad set friendly_name = replace(name,' ','_'), 
    ->   mtime = mtime;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from ad;
+----+-------------+---------------+---------------------+---------------------+
| id | name        | friendly_name | ctime               | mtime               |
+----+-------------+---------------+---------------------+---------------------+
|  1 | hello world | hello_world   | 2014-02-11 11:18:03 | 2014-02-11 11:18:03 |
+----+-------------+---------------+---------------------+---------------------+
1 row in set (0.00 sec)

In the above example the value of mtime did not change.
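As an aside, if you want to find all of the columns that have this auto-update behavior, information_schema exposes the ON UPDATE clause in the EXTRA column (a quick sketch, assuming a reasonably recent MySQL version):

mysql> select table_schema, table_name, column_name, extra
    ->   from information_schema.columns
    ->   where extra like '%on update CURRENT_TIMESTAMP%';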


Speaking at Percona Live 2014

This April I am speaking at Percona Live: MySQL Conference and Expo 2014. I have always enjoyed this conference in the past, and am looking forward to participating as a speaker for the first time.

I am presenting two talks:

My first talk is for anyone who wants to make their life easier using MySQL tools. This would be a good companion talk to Daniel Nichter’s two talks on Percona Toolkit, and Shlomi Noach’s talk on common_schema. Or if you can’t make it to those talks, hopefully mine could be a useful fallback choice.

My second talk is geared towards people who are trying to decide whether to use performance_schema, especially in MySQL 5.6 and beyond where it is enabled by default. I’ll cover the overhead and benefits of performance_schema as I see them, as well as the fine-grained controls you can use to define what is instrumented and how much detail is captured.

Overall I see a lot of sessions that I would like to attend at this year’s conference. In fact there are several time slots where I already know I’m going to have a hard time picking which talk to attend. See you in Santa Clara in April!
