Flite Culture

Counting Tokens With Common_schema

There are several approaches to count the occurrences of a substring inside a larger string in MySQL. Some people use replace() and length() (I would use char_length() instead) to do it. Others use stored functions. Recently I had to count the occurrences of a certain key across many rows of JSON. Instead of writing my own query using one of the approaches mentioned above I decided to use common_schema.

In the past I’ve discussed JSON parsing in MySQL using common_schema. In this case I didn’t need to parse the JSON, just to count keys, so I used the get_num_tokens() function instead. That function is designed to count the number of tokens in delimited text. It does so essentially by counting the number of delimiters and then adding one. Since I’m looking for a certain key and treating that key as the delimiter, I just need to subtract one from the result of get_num_tokens() to get the key/delimtier count instead of the token count.

Here’s an example:

mysql> set @haystack := '{"needles":["needle":{"type":"sewing","is_sharp":"yes"},"needle":{"type":"knitting","is_sharp":"no"},"needle":{"type":"acupunture","is_sharp":"yes"}]}';
Query OK, 0 rows affected (0.00 sec)

mysql> select @haystack, common_schema.get_num_tokens(@haystack, '"needle"')-1 as needles\G
*************************** 1. row ***************************
@haystack: {"needles":["needle":{"type":"sewing","is_sharp":"yes"},"needle":{"type":"knitting","is_sharp":"no"},"needle":{"type":"acupunture","is_sharp":"yes"}]}
  needles: 3