Flite Culture

Capturing Errors and Warnings From LOAD DATA INFILE

MySQL makes it easy to bulk load multiple rows of data from a flat file into a MySQL table using the LOAD DATA INFILE command, but that command can quickly get you into trouble if you are not careful about capturing the warnings and errors it produces.

Running LOAD DATA INFILE commands at the mysql prompt gives you pretty good output, but if you run the same command at the terminal or in a shell script you have to do a little bit of extra work to capture the errors and warnings.

Here are a few techniques I use when I run LOAD DATA INFILE at the terminal or in a shell script:

  1. Use double-verbose mode (--v) to capture the high level counts of Records, Deleted, Skipped, Warnings, etc.
  2. Use --init-command to set session variables (sql_mode, sql_log_bin, foreign_key_checks, etc)
  3. Use --show warnings to get all of the warnings, even if they exceed the value of max_error_count
  4. Use user-defined variables and SET statements to explicitly handle NULL values and defaults.

Below I will provide more detail on each of those techniques, using the sakila database for my code examples:

1. Use double-verbose mode (--v) to capture the high level counts of Records, Deleted, Skipped, Warnings, etc.

For my first example, I’ll dump all of the rows from a table with a primary key into a flat file on disk, and then try to load that flat file into the table again and see what happens.

If I do a straightforward LOAD DATA INFILE I get an error:

1
2
3
4
5
mysql> select * into outfile '/tmp/rental.txt' from sakila.rental;
Query OK, 16044 rows affected (0.02 sec)

mysql> load data infile '/tmp/rental.txt' into table sakila.rental;
ERROR 1062 (23000): Duplicate entry '1' for key 'PRIMARY'

If I add the IGNORE keyword I can avoid that error, and MySQL tells me that all 16044 rows were skipped as duplicates:

1
2
3
mysql> load data infile '/tmp/rental.txt' ignore into table sakila.rental;
Query OK, 0 rows affected (0.23 sec)
Records: 16044  Deleted: 0  Skipped: 16044  Warnings: 0

The IGNORE keyword is also useful if your file contains some duplicates and some new rows. To illustrate this I delete 1000 random rows from the table and then re-run the same command:

1
2
3
4
5
6
mysql> delete from sakila.rental where rental_id >= 8000 and rental_id < 9000;
Query OK, 1000 rows affected (0.11 sec)

mysql> load data infile '/tmp/rental.txt' ignore into table sakila.rental;
Query OK, 1000 rows affected (0.74 sec)
Records: 16044  Deleted: 0  Skipped: 15044  Warnings: 0

The 1000 rows I deleted were successfully loaded back into the table, and the other 15044 rows were rejected as duplicates.

By default, running LOAD DATA INFILE from a shell script will not give you any output unless there is an error.

To illustrate what happens in a shell script I’ll run the same commands from the terminal.

Error conditions are the same:

1
2
3
4
$ mysql --database=sakila \
>   -e "load data infile '/tmp/rental.txt' into table sakila.rental;"
ERROR 1062 (23000) at line 1: Duplicate entry '1' for key 'PRIMARY'
$ 

If I add the IGNORE keyword, by default I don’t get the information about how many rows were skipped:

1
2
3
$ mysql --database=sakila -e \
>   "load data infile '/tmp/rental.txt' ignore into table sakila.rental;"
$ 

I can capture that information by passing in the --verbose flag, but I need to pass it twice, and I get some extra information, specifically the SQL command I execute is echoed back to me:

1
2
3
4
5
6
7
8
9
10
11
$ mysql --database=sakila \
>   -vve "load data infile '/tmp/rental.txt' ignore into table sakila.rental;"
--------------
load data infile '/tmp/rental.txt' ignore into table sakila.rental
--------------

Query OK, 0 rows affected (0.43 sec)
Records: 16044  Deleted: 0  Skipped: 16044  Warnings: 0

Bye
$ 

2. Use –init-command to set session variables (sql_mode, sql_log_bin, foreign_key_checks, etc)

--init-command is a handy way to set your session up the way you want it when running mysql commands from the terminal or in a shell script. For example, if your server uses a relaxed global sql_mode, you may enforce a stricter sql_mode for your session like this:

1
mysql --init-command="set sql_mode = 'STRICT_ALL_TABLES';"

You can set multiple variables in a single init-command. For example you can set the sql_mode, disable binary logging for your session, and disable foreign key checks like this:

1
mysql --init-command="set sql_mode = 'STRICT_ALL_TABLES'; set sql_log_bin = 0; set foreign_key_checks = 0;"

To further illustrate this point, I’ll use the sakila.rental table again in several different error scenarios that benefit from the use of --init-command.

First I’ll truncate the table before I load the data back in:

1
2
3
$ mysql --database=sakila \
>   -e "truncate table sakila.rental;"
ERROR 1701 (42000) at line 1: Cannot truncate a table referenced in a foreign key constraint (`sakila`.`payment`, CONSTRAINT `fk_payment_rental` FOREIGN KEY (`rental_id`) REFERENCES `sakila`.`rental` (`rental_id`))

Oops, I can’t truncate the rental table because the payment table has a foreign key to it. Since I’m planning to load the data back into the rental table I want to just ignore the foreign key. I can do that with --init-command:

1
2
3
4
$ mysql --database=sakila \
>   --init-command="set foreign_key_checks=0;" \
>   -e "truncate table sakila.rental;"
$ 

That’s better. Now I’ll change return_date from NULL to NOT NULL and see what happens when I try to load the flat file back in, knowing that it has 183 rows with NULL values in that column:

1
2
3
4
5
6
7
8
9
10
11
12
13
$ mysql --database=sakila \
>   -e "alter table sakila.rental modify column return_date datetime not null;"
$ mysql --database=sakila \
>   -vve "load data infile '/tmp/rental.txt' into table sakila.rental;"
--------------
load data infile '/tmp/rental.txt' into table sakila.rental
--------------

Query OK, 16044 rows affected, 183 warnings (0.82 sec)
Records: 16044  Deleted: 0  Skipped: 0  Warnings: 183

Bye
$ 

All of the rows loaded successfully, but I got 183 warnings. Unfortunately I don’t see any of the warnings, which brings me to:

3. Use --show warnings to get all of the warnings, even if they exceed the value of max_error_count

I’ll try that truncate and load again using --show-warnings so I can see all of the warnings.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
$ mysql --database=sakila \
>   --init-command="set foreign_key_checks=0;" \
>   -e "truncate table sakila.rental;"
$ mysql --database=sakila \
>   --show-warnings \
>   -vve "load data infile '/tmp/rental.txt' into table sakila.rental;"
--------------
load data infile '/tmp/rental.txt' into table sakila.rental
--------------

Query OK, 16044 rows affected, 183 warnings (0.82 sec)
Records: 16044  Deleted: 0  Skipped: 0  Warnings: 183

Warning (Code 1263): Column set to default value; NULL supplied to NOT NULL column 'return_date' at row 11492
Warning (Code 1263): Column set to default value; NULL supplied to NOT NULL column 'return_date' at row 11537
Warning (Code 1263): Column set to default value; NULL supplied to NOT NULL column 'return_date' at row 11559
Warning (Code 1263): Column set to default value; NULL supplied to NOT NULL column 'return_date' at row 11573
Warning (Code 1263): Column set to default value; NULL supplied to NOT NULL column 'return_date' at row 11589
Warning (Code 1263): Column set to default value; NULL supplied to NOT NULL column 'return_date' at row 11607
Warning (Code 1263): Column set to default value; NULL supplied to NOT NULL column 'return_date' at row 11642
...
Bye
$ 

I snipped the list of warnings for brevity. It was 183 occurrences of the same warning.

This begs the question, what default value did MySQL use?

1
2
3
4
5
6
$ mysql --database=sakila -e "select min(return_date) from sakila.rental;"
+---------------------+
| min(return_date)    |
+---------------------+
| 0000-00-00 00:00:00 |
+---------------------+

That’s not ideal. I’ll try again with a stricter sql_mode that disallows zero dates:

1
2
3
4
5
6
7
8
9
10
11
12
13
$ mysql --database=sakila \
>   --init-command="set foreign_key_checks=0;" \
>   -e "truncate table sakila.rental;"
$ mysql --database=sakila \
>   --show-warnings \
>   --init-command="set sql_mode = 'STRICT_ALL_TABLES';" \
>   -vve "load data infile '/tmp/rental.txt' into table sakila.rental;"
--------------
load data infile '/tmp/rental.txt' into table sakila.rental
--------------

ERROR 1263 (22004) at line 1: Column set to default value; NULL supplied to NOT NULL column 'return_date' at row 11492
Bye

Okay, that’s better. I don’t get any zero dates, but instead none of the data is loaded. Which brings me to:

4. Use user-defined variables and SET statements to explicitly handle NULL values and defaults

Let’s say I don’t want zero dates, but I do want to enforce a default value for rows without a date in the specific context of my LOAD DATA INFILE statement. For the purposes of this example I’ll use the current date and time as returned by the MySQL NOW() function.

I can capture the relevant column values from the flat file in a user-defined variable, and then use a SET statement within LOAD DATA INFILE to explictly set return_date=now() if a NULL is in the file:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
$ mysql --database=sakila \
>   --show-warnings \
>   --init-command="set sql_mode = 'STRICT_ALL_TABLES';" \
>   -vve "load data infile '/tmp/rental.txt' into table sakila.rental \
>   (rental_id,rental_date,inventory_id,customer_id,@return_date,staff_id,last_update) \
>   set return_date = coalesce(@return_date,now());"
--------------
load data infile '/tmp/rental.txt' into table sakila.rental   (rental_id,rental_date,inventory_id,customer_id,@return_date,staff_id,last_update)   set return_date = coalesce(@return_date,now())
--------------

Query OK, 16044 rows affected (0.99 sec)
Records: 16044  Deleted: 0  Skipped: 0  Warnings: 0

Bye
$ 
Comments

Comments