1
votes

I am trying to merge the contents of multiple files on a matching key with awk. I have seen solutions for two input files, but not for more. The input files look like this:

file1

1#a1
2#b1
3#c1
4#d1
6#f1

file2

1#a2
2#b2
3#c2
5#e2
6#f2

file3

1#a3#extra_field_1
2#b3#extra_field_2
3#c3#extra_field_3
4#d3#extra_field_4
5#e3#extra_field_5

The desired output is the following:

output

a1;a2;a3;extra_field_1
b1;b2;b3;extra_field_2
c1;c2;c3;extra_field_3
d1;;d3;extra_field_4
;e2;e3;extra_field_5

For this, I am using a bash script based on an awk command like the following:

$ awk -v OFS=';' -F '#' 'FNR==NR{a[$1]=$2;next} FNR!=NR{b[$1]=$2;next} NF==3{print a[$1],b[$1],$2,$3}' file1 file2 file3 > output

However, it seems to skip some of the inputs, because it doesn't produce any output. Any ideas?

Thanks.

3 Answers

2
votes

You could do that using just the join command:

join -t\# file1 file2 -j 1 |\
    join -t\# - file3 -j 1 |\
    cut -d\# --output-delimiter=\; -f2-5

Output (only the keys present in all three files are kept):

a1;a2;a3;extra_field_1
b1;b2;b3;extra_field_2
c1;c2;c3;extra_field_3
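
If the rows with missing entries from the question's desired output are needed as well, join can probably still do it with its -a and -o options. The following is only a sketch: it assumes the inputs are sorted on the key (as the sample files are), that file3 decides which keys are printed, and that the implementation accepts -a1 and -a2 together (GNU join does). The sample files are recreated only to keep the snippet self-contained, and joined_output is an arbitrary name.

```shell
# Recreate the sample inputs from the question (illustrative only).
printf '%s\n' '1#a1' '2#b1' '3#c1' '4#d1' '6#f1' > file1
printf '%s\n' '1#a2' '2#b2' '3#c2' '5#e2' '6#f2' > file2
printf '%s\n' '1#a3#extra_field_1' '2#b3#extra_field_2' \
              '3#c3#extra_field_3' '4#d3#extra_field_4' \
              '5#e3#extra_field_5' > file3

# Step 1: full outer join of file1 and file2 (-a1 -a2 keep unpaired lines).
#         -o 0,1.2,2.2 prints the key, then field 2 of each file;
#         a field that is missing is printed as an empty string.
# Step 2: join with file3, keeping every file3 line (-a2) and dropping
#         keys that appear only on the left side (key 6 here).
# Step 3: turn the '#' separators into ';'.
join -t'#' -a1 -a2 -o 0,1.2,2.2 file1 file2 |
    join -t'#' -a2 -o 1.2,1.3,2.2,2.3 - file3 |
    tr '#' ';' > joined_output
```

With the sample data, joined_output should then also contain the d1;;d3;extra_field_4 and ;e2;e3;extra_field_5 rows.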

1
votes

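One more way using paste and awk. Note that paste lines the files up by line number, not by key, so this only gives correct results when all files contain the same keys in the same order: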

paste -d"#" file1 file2 file3 | awk -F"#" '{print $2,$4,$6,$7}' OFS=";"

0
votes

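Here's one in awk. It doesn't take missing data into consideration, as the question did not state how it should be handled. It collects all the data into an array keyed on the first field and prints it in the END block (in no particular order, since the traversal order of for (i in a) is unspecified):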

$ awk '
BEGIN { FS="#"; OFS=";" }
{
    for(i=2;i<=NF;i++)
        a[$1]=a[$1] (a[$1]==""?"":OFS) $i
}
END {
    for(i in a)
        print a[i]
}' f1 f2 f3
a1;a2;a3;extra_field_1
b1;b2;b3;extra_field_2
c1;c2;c3;extra_field_3
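
If missing entries should show up as empty fields, as in the desired output of the question, one possible approach is to let the last file drive the output. This is a minimal sketch, assuming that file3 contains every key that should be printed and that none of the input files is empty; the counter name nfile and the recreated sample files are only there for illustration.

```shell
# Recreate the sample inputs from the question (illustrative only).
printf '%s\n' '1#a1' '2#b1' '3#c1' '4#d1' '6#f1' > file1
printf '%s\n' '1#a2' '2#b2' '3#c2' '5#e2' '6#f2' > file2
printf '%s\n' '1#a3#extra_field_1' '2#b3#extra_field_2' \
              '3#c3#extra_field_3' '4#d3#extra_field_4' \
              '5#e3#extra_field_5' > file3

awk -F'#' -v OFS=';' '
    FNR == 1   { nfile++ }              # a new input file starts here
    nfile == 1 { a[$1] = $2; next }     # file1: remember value by key
    nfile == 2 { b[$1] = $2; next }     # file2: remember value by key
    { print a[$1], b[$1], $2, $3 }      # file3: unknown keys print as ""
' file1 file2 file3 > output
```

Counting files through FNR == 1 avoids relying on FNR==NR alone, which can only tell the first file apart from all the others. In the command from the question, the FNR!=NR block matches every line of both file2 and file3 and ends with next, so the print block is never reached.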