Shell script: copying columns by header in a csv file to another csv file

Question

I have a csv file which I'll be using as input with a format looking like this:

xValue,value1-avg,value1-median,value2-avg,value3-avg,value3-median
1,3,4,20,14,20

The key attributes of the input file are that each "value" will have a variable number of statistics, but the statistic type and "value" will always be separated by a "-". I then want to output the statistics of all the "values" to separate csv files.

The output would then look something like this:

value1.csv

xValue,value1-avg,value1-median
1,3,4

value2.csv

xValue,value2-avg
1,20

I've tried finding solutions to this, but all I can find are ways to copy by the column number, not the header name. I need to be able to use the header names to append the associated statistics to each of the output csv files.

Any help is greatly appreciated!

P.S. The output files may already have been written to during previous runs of this script, so the code should append to them.

Ed Morton · Accepted Answer · 2013-09-04T15:22:18

Untested but should be close:

awk -F, '
NR==1 {
    for (i=2;i<=NF;i++) {
        outfile = $i
        sub(/-.*/,".csv",outfile)
        outfiles[i] = outfile
    }
}
{
    delete(outstr)
    for (i=2;i<=NF;i++) {
        outfile = outfiles[i]
        outstr[outfile] = outstr[outfile] FS $i
    }
    for (outfile in outstr)
        print $1 outstr[outfile] >> outfile
}
' inFile.csv

Note that deleting a whole array with delete(outstr) originated as a gawk extension and is not available in every awk. With other awks you can use split("",outstr) to get the same effect.
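
For reference, here is a minimal sketch of that portable variant: the same script with split("",outstr) in place of the delete. The scratch directory and the sample inFile.csv (taken from the question) are only there so the example can be run on its own; they are assumptions for illustration, not part of the script.

```shell
#!/bin/sh
# Work in a scratch directory so the demo does not touch real files
# (an assumption made for this example only).
dir=$(mktemp -d) && cd "$dir" || exit 1

# Sample input taken from the question.
printf '%s\n' \
  'xValue,value1-avg,value1-median,value2-avg,value3-avg,value3-median' \
  '1,3,4,20,14,20' > inFile.csv

awk -F, '
NR==1 {
    # Map each column number to its output file: "value1-avg" -> "value1.csv"
    for (i=2;i<=NF;i++) {
        outfile = $i
        sub(/-.*/,".csv",outfile)
        outfiles[i] = outfile
    }
}
{
    # Portable way to empty an array: split an empty string into it.
    split("",outstr)
    for (i=2;i<=NF;i++) {
        outfile = outfiles[i]
        outstr[outfile] = outstr[outfile] FS $i
    }
    # Append the first column plus the collected columns to each file.
    for (outfile in outstr)
        print $1 outstr[outfile] >> outfile
}
' inFile.csv

cat value1.csv
```

Run against the sample input, value1.csv should end up holding the xValue column followed by the two value1 columns.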

Note that this appends the output you wanted to existing files, BUT that means you'll get the header line repeated on every execution. If that's an issue, tell us how to know when to generate the header line. The solution I THINK you'll want skips the header when the output file already has content, and would look something like this:

awk -F, '
NR==1 {
    for (i=2;i<=NF;i++) {
        outfile = $i
        sub(/-.*/,".csv",outfile)
        outfiles[i] = outfile
    }
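    # outfiles is indexed by column number, so look the file name up
    # and key exists[] by that name (it is tested by name further down).
    for (i in outfiles) {
        outfile = outfiles[i]
        exists[outfile] = ( ((getline tmp < outfile) > 0) && (tmp != "") )
        close(outfile)
    }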
}
{
    delete(outstr)
    for (i=2;i<=NF;i++) {
        outfile = outfiles[i]
        outstr[outfile] = outstr[outfile] FS $i
    }
    for (outfile in outstr)
        if ( (NR > 1) || !exists[outfile] )
            print $1 outstr[outfile] >> outfile
}
' inFile.csv
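
To see the header handling in action, here is a hedged sketch that runs the script twice on the question's sample data. The wrapper name split_by_header, the scratch directory, and the sample file are hypothetical and exist only for this demo. The sketch looks each file name up through outfiles[i] so that exists[] is keyed by file name, and it uses split("",outstr) for portability.

```shell
#!/bin/sh
# Scratch directory and sample data are assumptions for this demo only.
dir=$(mktemp -d) && cd "$dir" || exit 1

printf '%s\n' \
  'xValue,value1-avg,value1-median,value2-avg,value3-avg,value3-median' \
  '1,3,4,20,14,20' > inFile.csv

# split_by_header is a hypothetical wrapper name, not part of the answer.
split_by_header() {
    awk -F, '
    NR==1 {
        for (i=2;i<=NF;i++) {
            outfile = $i
            sub(/-.*/,".csv",outfile)
            outfiles[i] = outfile
        }
        # Record, per file name, whether the file already has content.
        for (i in outfiles) {
            outfile = outfiles[i]
            exists[outfile] = ( ((getline tmp < outfile) > 0) && (tmp != "") )
            close(outfile)
        }
    }
    {
        split("",outstr)
        for (i=2;i<=NF;i++) {
            outfile = outfiles[i]
            outstr[outfile] = outstr[outfile] FS $i
        }
        # Data rows are always appended; the header row only for new files.
        for (outfile in outstr)
            if ( (NR > 1) || !exists[outfile] )
                print $1 outstr[outfile] >> outfile
    }
    ' "$1"
}

split_by_header inFile.csv   # first run: header plus data row
split_by_header inFile.csv   # second run: data row only
cat value1.csv
```

After the two runs, value1.csv should contain the header line once, followed by the data row 1,3,4 twice.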
