Recently I found some
NULL values in a Hive table in a numeric column that I expected to be
NOT NULL. The query that populates the table extracts a numeric value from a json-formatted string using the
get_json_object() UDF like this:
The intent of the
COALESCE() call is to replace all
NULL values of
sub_object.key with 0, but when I looked at the actual values I saw both 0 and
NULL present in the table. Why are some of the
NULL values not being replaced with 0? Because there are some non-numeric values in
sub_object.key, and those ended up being inserted as
NULL due to implicit type conversion.
get_json_object() returns a
string, which is then being implicitly converted to an
int, so a value like
123abc would not be overidden by
COALESCE(). Instead the implicit type conversion tries to cast
123abc as an
int and it ends up as
The solution? Use explicit type conversion instead of relying on implicit type conversion.
Specifically I updated my query to cast the
string as an
int, like this:
With that change in place my table no longer has any NULL values in that column, and all missing values are being treated equally and stored as 0.