Bash Script to split a single CSV file into multiple files with headers

A Bash Script implementing the UNIX head, tail, split and cat commands can be used to split a large Comma Separated Values (CSV) or other text file into smaller chunks containing the original file’s header.

In this example, I have a ‘data.csv‘ file with the following content:

name,value
item1,20
item2,23
item3,22
item4,12
item5,65
item6,31
item7,43
item8,12
item9,43
item10,12
item11,11
item12,33
item13,33
item14,22
item15,75

Create a new script file (in this example I’ve named it ‘splitcsv.sh) containing the code below in the above mentioned file’s directory using a plain text editor such as nano, vim or TextEdit:

#!/bin/bash
 
# check if an input filename was passed as a command
# line argument:
if [ ! $# == 1 ]; then
  echo "Please specify the name of a file to split!"
  exit
fi
 
# create a directory to store the output:
mkdir output
 
# create a temporary file containing the header without
# the content:
head -n 1 $1 > header.csv
 
# create a temporary file containing the content without
# the header:
tail +2 $1 > content.csv
 
# split the content file into multiple files of 5 lines each:
split -l 5 content.csv output/data_
 
# loop through the new split files, adding the header
# and a '.csv' extension:
for f in output/*; do cat header.csv $f > $f.csv; rm $f; done;
 
# remove the temporary files:
rm header.csv
rm content.csv

Browse to the directory containing the script with a Terminal window and enter (replacing ‘scriptfile‘ with the name you gave the script file and ‘datafile‘ with your CSV file’s name):

sh ./{scriptfile} {datafile}

The split command can also break apart files based on a number of bytes or a specific pattern. For more information, see its man page:

man split

 

Related posts:

  1. Bash script to generate MD5 hashes
  2. Sending Tweets from the command line using a Bash script
  3. Mac OS X Quick Tip: Launch the screen saver with a bash script using AppleScript
  4. Enabling Terminal’s directory and file color highlighting in Mac OS X
  5. Displaying a text-based file and folder tree on the command line
Twitter Digg Delicious Stumbleupon Technorati Facebook Email

11 Responses to “Bash Script to split a single CSV file into multiple files with headers”

  1. That’s very nice. I was using a separate script to add the header on, like this:

    #!/bin/bash
     
    for var in "$@"
    do
        cat ../column_headers.csv "$var" > "$var_done.csv"
    done

    The split command doesn’t work for me, though, as my records have new lines in some of the fields: the splits often come in the middle of a record. Is there an easy way around that?

    Thanks!
    J.

  2. When you export the initial CSV file, you could see if the exporter lets you specify a row delimiter (“~” in the example below – a tilde sign) as well as a field delimiter (which is usually “,” – a comma).

    If your data file looked like this:

    name,value~
    item1,20~
    item2,23 21
    22~
    item3,22~
    item4,12
    2~
    item5,65~
    item6,31~
    item7,43~
    item8,12~
    item9,43~
    item10,12~
    item11,11~
    item12,33~
    item13,33~
    item14,22~
    item15,75~

    You could run a command like this on it (in the example below I’m assuming your data file is named “data.txt”) to replace all non-row linebreaks with another character (in this case, “^”), :

    tr "\n" "^" < data.txt | sed "s/~^/~/g" | tr "~" "\n" > data2.txt

    Explanation of the command:
    - Use the “tr” command to read the contents of data.txt and replace all linebreaks with a “^” character
    -Pass the results to the “sed” command and replace all “~^” pairs with “~”
    -Pass the results to the “tr” command and replace all “~” with a linebreak
    -Store the final results to the “data2.txt” file

    After running the split script from this page’s Post on the second data file:

    sh ./script.sh data2.txt

    … go into the “output” directory and run this command to replace all “^” markers in the output files with linebreaks again:

    for f in *.csv; do tr "^" "\n" < $f > converted-$f; done
  3. There’s a bug in your tail line.

    it should be

    tail -n +1 $1 > content.csv

  4. Hey Joseph

    With the tail command, “tail -n +1″ means “read all lines from the first onwards”, whereas “tail -n +2″ means “read all lines from the second onwards”, so the second command is the one that you’d need to use for this script.

    If you used “tail -n +1″, your output would contain a double header line in the first file. :)

  5. Even though it works on my Mac OS X machines as detailed in this post, a reader noted that on his Ubuntu box, the

    tail +2 $1 > content.csv

    command didn’t work unless he added a “-n” in front of the “+2″:

    tail -n +2 $1 > content.csv
  6. Thank you very much. I was about to write this from scratch. It saved me a lot of time.

    Thank you.

  7. Sure! Let me know if you have any questions about it. :)

  8. Thanks (Obrigado!), this script work help me!
    From Brazil !

  9. You saved me so much time with this script and inspired me to start learning more about Bash! Thanks so much!

  10. I was trying to write it from scratch… it saved me a lot of time :-) )
    Much appreciated!!!

    alberto

Trackbacks/Pingbacks

  1. i.nt.ro » Splitting a csv file using bash in linux - 03 Jul 2009

    [...] This is a typo update on a post i’ve seen on this blog post http://www.geekology.co.za/blog/2009/02/bash-script-to-split-single-csv-file-into-multiple-files-wit... [...]

Afrigator