Bash Script to split a single CSV file into multiple files with headers
A Bash Script implementing the UNIX head, tail, split and cat commands can be used to split a large Comma Separated Values (CSV) or other text file into smaller chunks containing the original file’s header.
In this example, I have a ‘data.csv‘ file with the following content:
name,value item1,20 item2,23 item3,22 item4,12 item5,65 item6,31 item7,43 item8,12 item9,43 item10,12 item11,11 item12,33 item13,33 item14,22 item15,75
Create a new script file (in this example I’ve named it ‘splitcsv.sh‘) containing the code below in the above mentioned file’s directory using a plain text editor such as nano, vim or TextEdit:
#!/bin/bash # check if an input filename was passed as a command # line argument: if [ ! $# == 1 ]; then echo "Please specify the name of a file to split!" exit fi # create a directory to store the output: mkdir output # create a temporary file containing the header without # the content: head -n 1 $1 > header.csv # create a temporary file containing the content without # the header: tail +2 $1 > content.csv # split the content file into multiple files of 5 lines each: split -l 5 content.csv output/data_ # loop through the new split files, adding the header # and a '.csv' extension: for f in output/*; do cat header.csv $f > $f.csv; rm $f; done; # remove the temporary files: rm header.csv rm content.csv
Browse to the directory containing the script with a Terminal window and enter (replacing ‘scriptfile‘ with the name you gave the script file and ‘datafile‘ with your CSV file’s name):
sh ./{scriptfile} {datafile}
The split command can also break apart files based on a number of bytes or a specific pattern. For more information, see its man page:
man split
Related posts:
- Bash script to generate MD5 hashes
- Sending Tweets from the command line using a Bash script
- Mac OS X Quick Tip: Launch the screen saver with a bash script using AppleScript
- Enabling Terminal’s directory and file color highlighting in Mac OS X
- Displaying a text-based file and folder tree on the command line



27 Feb 2009 








author
That’s very nice. I was using a separate script to add the header on, like this:
The split command doesn’t work for me, though, as my records have new lines in some of the fields: the splits often come in the middle of a record. Is there an easy way around that?
Thanks!
J.
When you export the initial CSV file, you could see if the exporter lets you specify a row delimiter (“~” in the example below – a tilde sign) as well as a field delimiter (which is usually “,” – a comma).
If your data file looked like this:
You could run a command like this on it (in the example below I’m assuming your data file is named “data.txt”) to replace all non-row linebreaks with another character (in this case, “^”), :
Explanation of the command:
- Use the “tr” command to read the contents of data.txt and replace all linebreaks with a “^” character
-Pass the results to the “sed” command and replace all “~^” pairs with “~”
-Pass the results to the “tr” command and replace all “~” with a linebreak
-Store the final results to the “data2.txt” file
After running the split script from this page’s Post on the second data file:
… go into the “output” directory and run this command to replace all “^” markers in the output files with linebreaks again:
There’s a bug in your tail line.
it should be
tail -n +1 $1 > content.csv
Hey Joseph
With the tail command, “tail -n +1″ means “read all lines from the first onwards”, whereas “tail -n +2″ means “read all lines from the second onwards”, so the second command is the one that you’d need to use for this script.
If you used “tail -n +1″, your output would contain a double header line in the first file.
Even though it works on my Mac OS X machines as detailed in this post, a reader noted that on his Ubuntu box, the
command didn’t work unless he added a “-n” in front of the “+2″:
Thank you very much. I was about to write this from scratch. It saved me a lot of time.
Thank you.
Sure! Let me know if you have any questions about it.
Thanks (Obrigado!), this script work help me!
From Brazil !
You saved me so much time with this script and inspired me to start learning more about Bash! Thanks so much!
I was trying to write it from scratch… it saved me a lot of time
)
Much appreciated!!!
alberto