Flite Careers

Loading Large Flat Files Into MySQL With LOAD DATA INFILE and Pt-fifo-split

It’s easy to load multiple rows of data from a file into a MySQL table with a single LOAD DATA INFILE command, but if the file is over a certain size you may want to load it in multiple chunks. When I have a file with millions of lines I typically use pt-fifo-split to separate the file into multiple chunks in order to limit the size of each transaction. Unlike most tools in Percona Toolkit, pt-fifo-split doesn’t actually interact with your database, but this is one MySQL use case where it’s very helpful.

Here’s a quick example in bash that illustrates how I can take a large tab-delimited file, break it into multiple 100,000 line chunks, and load the chunks into a table:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
FLAT_FILE="/tmp/big_file.txt"
FIFO_PATH="${FLAT_FILE}.fifo"
LOAD_FILE="${FLAT_FILE}.load"
CHUNK_SIZE=100000

# Split the file
pt-fifo-split --force --lines ${CHUNK_SIZE} ${FLAT_FILE} --fifo ${FIFO_PATH} &
# Sleep 10 seconds to assure ${FIFO_PATH} exists before entering loop
sleep 10
while [ -e ${FIFO_PATH} ]
do
  # Write chunk to disk
  cat ${FIFO_PATH} > ${LOAD_FILE}
  # Load chunk into table
  mysql --database=test \
   --show-warnings \
   -vve "load data infile '${LOAD_FILE}' into table my_table;"
done

YMMV, so you should do some testing to determine the optimal chunk size based on your schema, storage engine, replication setup, complexity of LOAD DATA INFILE statement, etc.


Comments