Relatively new to AWK here. I want to compare two files. The first two columns must match before the 3rd column is compared. The 3rd column in the second file needs to be 100 larger than in the first file in order to print that line from the second file. Some data may exist in one file but not in the other. I don't think it matters to AWK, but the spacing between columns isn't very consistent. Here is a small snippet.

File1

USTL_WR_DATA      MCASYNC@L      -104      -102      -43      -46
USTL_WR_DATA         SMC@L      171      166       67       65
TC_MCA_GCKN     SMC@L   -100    -100    0   0
WDF_ARRAY_DW0(0)        DCDC@L      297      297      101      105
WDF_ARRAY_DW0(0)    MCASYNC@L   300 300 50  50
WDF_ARRAY_DW0(0)        MCMC@L       12       11       34       31

File2

TC_MCA_GCKN     SMC@L   200 200 0   0
WDF_ARRAY_DW0(0)        DCDC@L      842      867      271      270
WDF_ARRAY_DW0(0)    MCASYNC@L   300 300 50  50
WDF_ARRAY_DW0(1)    SMCw@L  300 300 50  50
WDF_ARRAY_DW0(2)        DCDC@L      896      927      279      286
WDF_ARRAY_DW0(2)    MCASYNC@L   300 300 50  50

Desired output

TC_MCA_GCKN     SMC@L   200 200 0   0
WDF_ARRAY_DW0(0)        DCDC@L      842      867      271      270

Here is my code. It isn't working, and I'm not sure why.

awk 'NR==FNR{a[$1,$2];b[$3];next} (($1,$2) in a) && ($3> (b[$1]+100))' File1 File2

NR==FNR{a[$1,$2];b[$3];next} builds two arrays from the first file (I had trouble making it one). The first two columns go in a to confirm we're comparing the same thing, and the third column goes in b. I'm comparing on the third column because late mode high seems like a reasonable value to compare.

(($1,$2) in a) makes sure first two columns in second file are the ones we're comparing to.

&& ($3> (b[$1]+100)) I think this is what's causing the issue. It's supposed to check whether the second file's column 3 is 100 or more greater than the first file's column 3 (the first and only column in array b).

Good first question! You have input and output files, and tried code with your reasoning... - karakfa

1 Answer

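You need to key the value with the same ($1,$2) combination. In your version, b[$3] only creates an empty entry whose key is the third column's value, and b[$1] then looks up a key that was never stored, so it evaluates to 0 and the test becomes $3 > 100. Since a isn't used for anything else, just store the value there.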

$ awk 'NR==FNR {a[$1,$2]=$3; next} 
       ($1,$2) in a && $3>a[$1,$2]+100' file1 file2

TC_MCA_GCKN     SMC@L   200 200 0   0
WDF_ARRAY_DW0(0)        DCDC@L      842      867      271      270
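
If you want to try this without touching your real data, here is a minimal self-contained sketch that recreates the sample files from the question in a scratch directory and runs the same lookup. Two things in it are my assumptions rather than part of the command above: the `dir` and `out` variable names are just illustrative, and the test is written as `>= 100` because the question says "100 or more greater" (the command above uses a strict `>`, which gives the same result on this sample).

```shell
#!/bin/sh
# Scratch directory for the sample data (the name "dir" is illustrative).
dir=$(mktemp -d)

cat > "$dir/File1" <<'EOF'
USTL_WR_DATA      MCASYNC@L      -104      -102      -43      -46
USTL_WR_DATA         SMC@L      171      166       67       65
TC_MCA_GCKN     SMC@L   -100    -100    0   0
WDF_ARRAY_DW0(0)        DCDC@L      297      297      101      105
WDF_ARRAY_DW0(0)    MCASYNC@L   300 300 50  50
WDF_ARRAY_DW0(0)        MCMC@L       12       11       34       31
EOF

cat > "$dir/File2" <<'EOF'
TC_MCA_GCKN     SMC@L   200 200 0   0
WDF_ARRAY_DW0(0)        DCDC@L      842      867      271      270
WDF_ARRAY_DW0(0)    MCASYNC@L   300 300 50  50
WDF_ARRAY_DW0(1)    SMCw@L  300 300 50  50
WDF_ARRAY_DW0(2)        DCDC@L      896      927      279      286
WDF_ARRAY_DW0(2)    MCASYNC@L   300 300 50  50
EOF

out=$(awk '
    # First file: remember column 3 under the (col1, col2) key.
    NR == FNR { a[$1,$2] = $3; next }
    # Second file: print lines whose key was seen in the first file and
    # whose column 3 is at least 100 larger (assumption: ">=", not ">").
    (($1,$2) in a) && ($3 - a[$1,$2] >= 100)
' "$dir/File1" "$dir/File2")

printf '%s\n' "$out"
rm -rf "$dir"
```

Default field splitting in awk treats any run of spaces or tabs as one separator, so the inconsistent spacing in the files should not matter here.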