Flite Culture

MySQL Binary Log Parsing With Pt-query-digest

I frequently use pt-query-digest to parse the MySQL slow query log. This morning I wanted to parse the MySQL binary log to find out how frequently various tables in my database are updated. A quick check of the pt-query-digest documentation confirmed that pt-query-digest does support parsing the binary log:

–type

type: Array; default: slowlog

The type of input to parse. The permitted types are

binlog

   Parse a binary log file.

However, the docs don’t have any further details or examples. Given the lack of documentation my first instinct was to run pt-query-digest on my binary log using defaults for all of the other settings, like this:

1
pt-query-digest --type binlog /var/log/mysql-bin/bin-log.000625

That won’t work. Why not? Apparently when the documentaiton says “Parse a binary log file” it doesn’t mean a MySQL binary log in binary format. It means a binary log in text format. So first I need to convert my binary log to text using mysqlbinlog like this:

1
mysqlbinlog /var/log/mysql-bin/bin-log.000625 > /tmp/bin-log.000625.sql

Then I can parse it using pt-query-digest. The defaults work pretty well:

1
2
pt-query-digest --type binlog \
  /tmp/bin-log.000625.sql

That gives me the top 95% of queries based on total response time, grouped by their fingerprint, over the entire file. So that gives me a pretty good idea of which tables are being updated the most frequently.

I can also look at a specific time window. Say I only want to look at one hour of data:

1
2
3
4
pt-query-digest --type binlog \
  --since "2013-11-04 12:00:00" \
  --until "2013-11-04 13:00:00" \
  /tmp/bin-log.000625.sql

In my specific case I want to see all of the queries, not just the top 95%. I want to sort them by occurences instead of response time. I only care about the profile, not the other sections. I also like to explicitly name some of the defaults for the sake of readability. This is the final command I came up with:

1
2
3
4
5
6
7
pt-query-digest --type binlog \
  --group-by fingerprint \
  --limit "100%" \
  --order-by "Query_time:cnt" \
  --output report \
  --report-format profile \
  /tmp/bin-log.000012.sql

I hope you find these examples useful. There are lots of other options in the docs, so you should experiment to see what works best for your use case.


Comments