1
votes

Could someone please help me compare two files? I used the command below, but it did not give the result I need:

awk -F, 'NR == FNR {a[$1,$2]; next} (($1,$2) in a )' temp1.dat temp2.dat

Here is what I need: compare the first two fields of the two .dat files below and, for lines where both fields match, write a merged line to file3 (first field, second field, third field of temp1.dat, third field of temp2.dat).

File1:temp1.dat

A, AB,100
B,BB,200
C,CC,300

File2:temp2.dat

A,AB,10
C,CC,30
D,DF, 4

File3 :output

A, AB,100,10
C,CC,300,30
2
why not just diff file1 file2? - user529758
i need file3 only matching column H2CO3. - jcrshankar
Use the join command. - Amit Naidu
For future readers and @AmitNaidu -- the join command is insufficient because the criteria state that two columns must match. Of course sed could be used to combine the key columns first, then, after sorting each file on the new combined key column, join would be sufficient (and a final sed filter could separate the joined columns again). Perhaps join would be more efficient for larger files, especially if they are already sorted on both columns. - Greg A. Woods
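
The sed/sort/join route described in the comment above could look roughly like the sketch below. It is only an illustration under assumptions not stated in the thread: the data contains no "|" character (used here as a temporary glue for the two key columns), and the scratch directory and file names are hypothetical.

```shell
# Hypothetical scratch directory and sample data, for illustration only.
tmp=$(mktemp -d)
printf '%s\n' 'A,AB,100' 'B,BB,200' 'C,CC,300' > "$tmp/file1"
printf '%s\n' 'A,AB,10' 'C,CC,30' 'D,DF,4' > "$tmp/file2"

# 1. Fuse the two key columns by turning the first comma into "|",
#    then sort on the fused key (join expects sorted input).
sed 's/,/|/' "$tmp/file1" | LC_ALL=C sort -t, -k1,1 > "$tmp/f1.keyed"
sed 's/,/|/' "$tmp/file2" | LC_ALL=C sort -t, -k1,1 > "$tmp/f2.keyed"

# 2. Join on the fused key, then turn "|" back into a comma.
joined=$(LC_ALL=C join -t, "$tmp/f1.keyed" "$tmp/f2.keyed" | sed 's/|/,/')
printf '%s\n' "$joined"
rm -r "$tmp"
```

With the sample data this prints A,AB,100,10 and C,CC,300,30. The same locale is forced for sort and join so that both agree on the collation order.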

2 Answers

3
votes

awk -F, 'BEGIN{OFS=","}FNR==NR{a[$1$2]=$3;next}($1$2 in a && $3=$3","a[$1$2])' file2 file1

Tested below:

> cat file1
A,AB,100
B,BB,200
C,CC,300
> cat file2
A,AB,10
C,CC,30
D,DF,4
> awk -F, 'BEGIN{OFS=","}FNR==NR{a[$1$2]=$3;next}($1$2 in a && $3=$3","a[$1$2])' file2 file1
A,AB,100,10
C,CC,300,30
> 
  • FNR==NR{a[$1$2]=$3;next} runs only for the first file on the command line, file2.
  • The condition FNR==NR is true only while awk is reading the first file, so the block executes for those lines and next skips the rest of the program.
  • FNR is the line number within the current file.
  • NR is the running count of lines read across all files.
  • Once file2 has been read, the associative array a holds each $3 of file2, indexed by $1$2.
  • ($1$2 in a && $3=$3","a[$1$2]) is then evaluated for the lines of file1, where FNR!=NR. It first checks whether $1$2 exists as an index in the array; if so, the second part of the condition appends the stored value to the third field ($3=$3","a[$1$2]). $0 is rebuilt with the changed third field, and because the condition is true and has no action, awk prints the line.
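
To see how NR and FNR behave across two files, a small sketch like the following may help. The scratch directory and the shortened sample files are assumptions made for illustration; only the awk variables themselves come from the answer.

```shell
# Hypothetical scratch directory with shortened sample files.
tmp=$(mktemp -d)
printf '%s\n' 'A,AB,100' 'B,BB,200' > "$tmp/file1"
printf '%s\n' 'A,AB,10' > "$tmp/file2"

# Print both counters for every input line: file2 is read first, then file1.
counters=$(cd "$tmp" && awk '{ print FILENAME, "NR=" NR, "FNR=" FNR, (FNR == NR ? "first file" : "later file") }' file2 file1)
printf '%s\n' "$counters"
rm -r "$tmp"
```

FNR restarts at 1 for file1 while NR keeps counting, so FNR==NR holds only for the single line of file2.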

Similar logic would have to be written to handle four files as well.
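
One way the same idea might extend to more than two files is sketched below. This is not the answer's code: the array names (order, row, hits), the third sample file, and the choice to keep the line order of the first file are all assumptions made for illustration. It also assumes that no input file is empty and that keys are unique within each file.

```shell
# Hypothetical scratch directory; file3 is invented sample data.
tmp=$(mktemp -d)
printf '%s\n' 'A,AB,100' 'B,BB,200' 'C,CC,300' > "$tmp/file1"
printf '%s\n' 'A,AB,10' 'C,CC,30' 'D,DF,4' > "$tmp/file2"
printf '%s\n' 'A,AB,1' 'C,CC,3' 'B,BB,2' > "$tmp/file3"

merged=$(cd "$tmp" && awk -F, -v OFS=, '
FNR == 1 { nfiles++ }                 # a new input file starts
{
    key = $1 SUBSEP $2
    if (nfiles == 1) {                # first file: remember order, start the row
        order[++n] = key
        row[key] = $1 OFS $2 OFS $3
        hits[key] = 1
    } else if ((key in row) && hits[key] == nfiles - 1) {
        row[key] = row[key] OFS $3    # later file: extend rows matched so far
        hits[key] = nfiles
    }
}
END {                                 # keep only keys seen in every file
    for (i = 1; i <= n; i++)
        if (hits[order[i]] == nfiles) print row[order[i]]
}' file1 file2 file3)
printf '%s\n' "$merged"
rm -r "$tmp"
```

A key survives only if every file contains it, so B,BB is dropped here because file2 lacks it. Adding a fourth file to the command line needs no change to the program.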

1
votes

Try:

awk -F, -v OFS=, '{i=$1 SUBSEP $2} NR==FNR{A[i]=$3; next} i in A{print $0,A[i]}' file2 file1
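
The SUBSEP in the key matters when two different field pairs concatenate to the same string. The sketch below uses two invented rows, chosen so that $1$2 is "ABC" for both, and counts how many distinct keys each style produces; the array names plain and safe are hypothetical.

```shell
# Hypothetical rows: "A"+"BC" and "AB"+"C" both concatenate to "ABC".
collide=$(printf '%s\n' 'A,BC,1' 'AB,C,2' | awk -F, '
    { plain[$1 $2]; safe[$1 SUBSEP $2] }
    END {
        np = 0; for (k in plain) np++
        ns = 0; for (k in safe) ns++
        print np, ns
    }')
printf '%s\n' "$collide"
```

This prints "1 2": plain concatenation merges the two rows into one key, while the SUBSEP key keeps them apart. Writing the subscript as safe[$1,$2] is equivalent, since awk inserts SUBSEP between comma-separated subscripts.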