1
votes

I've got three .csv files:

File1.csv

Header1
data1;data2;data3

File2.csv

Header2
data1;data2;data3
data1;data2;data3

File3.csv

Header3
data1;data2;data3

I want to join them like this:

Header1;     ;     ;Header2;     ;     ;Header3;     ;
data1  ;data2;data3;data1  ;data2;data3;data1  ;data2;data3
       ;     ;     ;data1  ;data2;data3;       ;     ;

I've been trying to join the files with paste, however the columns keep mixing up. I think this happens because sometimes one of the files has no data (only the header) or File2.csv has more data than File1.csv. In this case it should look like this:

Header1;     ;     ;Header2;     ;     ;Header3;     ;
       ;     ;     ;data1  ;data2;data3;data1  ;data2;data3
       ;     ;     ;data1  ;data2;data3;       ;     ;

Is paste the best way to accomplish this?

ps. I can change the way the .csv files are created. I'm now using semicolons as delimiter but this can be changed if it makes it easier.

1
I'd probably write something Perl/Python/Ruby. I think a simple command-line utility such as paste (or join?) won't handle your specific scenarioBrian Agnew
Are the number of columns fixed?, If they are not fixed, an empty file might always result in only one column. In the example given, you have three columns for an empty file (file only with header)Naks
Yes, number of columns are fixed. I can change the header to be followed by two empty cells if that would help (eg. header;;).Matter

1 Answers

1
votes

Python solution:

merge_csv.py script:

import sys
with open(sys.argv[1], 'r') as f1, open(sys.argv[2], 'r') as f2, open(sys.argv[3], 'r') as f3:

    f1_lines, f2_lines, f3_lines = f1.read().splitlines(), f2.read().splitlines(), f3.read().splitlines()
    max_lines = max(len(i) for i in (f1_lines, f2_lines, f3_lines))
    fmt = '{:7s};' * 8 + '{:7s}'

    for i in range(max_lines):
        if i == 0:
            print(fmt.format(f1_lines[i],'','',f2_lines[i],'','',f3_lines[i],'',''))
        else:
            f1_args = f1_lines[i].split(';') if i < len(f1_lines) else ['', '', '']
            f2_args = f2_lines[i].split(';') if i < len(f2_lines) else ['', '', '']
            f3_args = f3_lines[i].split(';') if i < len(f3_lines) else ['', '', '']
            print(fmt.format(*(f1_args + f2_args + f3_args)))

Usage:

python merge_csv.py File1.csv File2.csv File3.csv

The output:

Header1;       ;       ;Header2;       ;       ;Header3;       ;       
data1  ;data2  ;data3  ;data1  ;data2  ;data3  ;data1  ;data2  ;data3  
       ;       ;       ;data1  ;data2  ;data3  ;       ;       ;