Compare multiple Columns and Append the result into another file

Question

I have two files file1 and file2, Both the files have 5 columns. I want to compare first 4 columns of file1 with file2.

If they are equal, need to compare the 5th column. If 5th column values are different, need to print the file1's 5th column as file2's 6th column.

I have used below awk to compare two columns in two different files, but how to compare multiple columns and append the particular column in another file if matches found?

awk -F, 'NR==FNR{_1[$1]++;next}!_1[$1]'

file1:

111,item1,garde1,wing1,maingroup
123,item3,grade5,wing10,topcat
132,item2,grade3,wing7,middlecat
134,item2,grade3,wing7,middlecat
177,item8,gradeA,wing11,lowcat

file2:

111,item1,garde1,wing1,maingroup
123,item3,grade5,wing10,lowcat
132,item3,grade3,wing7,middlecat
126,item2,grade3,wing7,maingroup
177,item8,gradeA,wing11,lowcat

Desired output:

123,item3,grade5,wing10,lowcat,topcat

When writing software, it's generally accepted that it's a good idea to give variables meaningful names. _1 would not qualify as such! — Ed Morton

jas jas · Accepted Answer · 2015-01-20T10:34:31

Awk can simulate multidimensional arrays by sequencing the indices. Underneath the indices are concatenated using the built-in SUBSEP variable as a separator:

 $ awk -F, -v OFS=, 'NR==FNR { a[$1,$2,$3,$4]=$5; next } a[$1,$2,$3,$4] && a[$1,$2,$3,$4] != $5 { print $0,a[$1,$2,$3,$4] }' file1.txt file2.txt

123,item3,grade5,wing10,lowcat,topcat

awk -F, -v OFS=,

Set both input and output separators to ,

NR==FNR { a[$1,$2,$3,$4]=$5; next }

Create an associative array from the first file relating the first four fields of each line to the fifth. When using a comma-separated list of values as an index, awk actually concatenates them using the value of the built-in SUBSEP variable as a separator. This is awk's way of simulating multidimensional arrays with a single subscript. You can set SUBSEP to any value you like but the default, which is a non-printing character unlikely to appear in the data, is usually fine. (You can also just do the trick yourself, something like a[$1 "|" $2 "|" $3 "|" $4], assuming you know that your data contains no vertical bars.)

a[$1,$2,$3,$4] && a[$1,$2,$3,$4] != $5 { print $0,a[$1,$2,$3,$4] }

Arriving here, we know we are looking at the second file. If the first four fields were found in the first file, and the $5 from the first file is different than the $5 in the second, print the line from the second file followed by the $5 from the first. (I am assuming here that no $5 from the first file will have a value that evaluates to false, such as 0 or empty.)

Compare multiple Columns and Append the result into another file

2 Answers