Flite Culture

Why I Use ONLY_FULL_GROUP_BY

MySQL uses the concept of SQL_MODE to “define what SQL syntax MySQL should support and what kind of data validation checks it should perform”. This post is about one of those modes, ONLY_FULL_GROUP_BY, and why I use it.

Roland Bouman wrote a great post a few years ago that debunks some myths about using GROUP BY in MySQL. His post has a lot of detail and examples, and does a very good job detailing the way GROUP BY works in MySQL with and without ONLY_FULL_GROUP_BY enabled. I recommend that you go and read that post now. Among other things, Roland points out one case where query performance is improved by not using a full GROUP BY. The post is several years old, but the performance difference is still present today in MySQL 5.6.

That post doesn’t make any strong recommendation on using ONLY_FULL_GROUP_BY, so why do I use it? For me, it makes it safer to run dynamically generated report queries. I execute a variety of dynamic reports that select one or more metrics for a specific time period over a specific set of dimensions. In order for the report to return accurate data, all of the dimensional columns must be in both the SELECT clause and the GROUP BY clause. I trust myself to write good SQL by hand, but since I have code generating dynamic SQL I value the extra protection provided by this SQL mode. If MySQL throws an error due to a partial GROUP BY I can catch it with a functional test, rather than trying to catch a more subtle error, namely incorrect report data.

To illustrate the problem I am trying to avoid, I will execute a reporting query on the sakila database. Let’s say I want to see the 10 most popular language/category combinations for the films in the sakila database. I could use a query like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
mysql> select l.name,c.name,count(*)
    -> from sakila.film f
    -> left outer join sakila.film_category fc on fc.film_id = f.film_id
    -> left outer join sakila.category c on c.category_id = fc.category_id
    -> left outer join sakila.language l on l.language_id = f.language_id
    -> group by c.name
    -> order by count(*) desc
    -> limit 10;
+---------+-------------+----------+
| name    | name        | count(*) |
+---------+-------------+----------+
| English | Sports      |       74 |
| English | Foreign     |       73 |
| English | Family      |       69 |
| English | Documentary |       68 |
| English | Animation   |       66 |
| English | Action      |       64 |
| English | New         |       63 |
| English | Drama       |       62 |
| English | Games       |       61 |
| English | Sci-Fi      |       61 |
+---------+-------------+----------+
10 rows in set (0.00 sec)

That looks pretty good. Now imagine my query was written by a query builder in code, and due to a bug the category column was not added to the GROUP BY clause. By default MySQL will still execute the query and give me a result, but the result is misleading:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
mysql> select l.name,c.name,count(*)
    -> from sakila.film f
    -> left outer join sakila.film_category fc on fc.film_id = f.film_id
    -> left outer join sakila.category c on c.category_id = fc.category_id
    -> left outer join sakila.language l on l.language_id = f.language_id
    -> group by l.name
    -> order by count(*) desc
    -> limit 10;
+---------+-------------+----------+
| name    | name        | count(*) |
+---------+-------------+----------+
| English | Documentary |     1000 |
+---------+-------------+----------+
1 row in set (0.01 sec)

That makes it look like every film in the sakila database is a Documentary, which is obviously not accurate.

To avoid this I will set the SQL_MODE. When setting the SQL_MODE you need to be careful not to remove any existing SQL_MODE values. In my case I will append ONLY_FULL_GROUP_BY to the 2 SQL_MODE values I am already using:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
mysql> select @@sql_mode;
+--------------------------------------------+
| @@sql_mode                                 |
+--------------------------------------------+
| STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION |
+--------------------------------------------+
1 row in set (0.00 sec)

mysql> set session sql_mode = 'STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION,ONLY_FULL_GROUP_BY';
Query OK, 0 rows affected (0.00 sec)

mysql> select @@sql_mode;
+---------------------------------------------------------------+
| @@sql_mode                                                    |
+---------------------------------------------------------------+
| ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION |
+---------------------------------------------------------------+
1 row in set (0.00 sec)

Now that I’ve set the SQL_MODE, I will try my query again:

1
2
3
4
5
6
7
8
9
10
mysql> select l.name,c.name,count(*)
    -> from sakila.film f
    -> left outer join sakila.film_category fc on fc.film_id = f.film_id
    -> left outer join sakila.category c on c.category_id = fc.category_id
    -> left outer join sakila.language l on l.language_id = f.language_id
    -> group by l.name
    -> order by count(*) desc
    -> limit 10;
ERROR 1055 (42000): 'sakila.c.name' isn't in GROUP BY
mysql> 

That’s much better. In my case I would rather get an error than get bad data, so this is the result I want.

Comments

Comments