
I have two files like below:

file1.txt

2018-03-14 13:23:00 CID [72883359]
2018-03-14 13:23:00 CID [275507537]
2018-03-14 13:23:00 CID [275507539]
2018-03-14 13:23:00 CID [207101094]
2018-03-14 13:23:00 CID [141289821]

and file2.txt

2018-03-14 13:23:00 CID [207101072]
2018-03-14 13:23:00 CID [275507524]
2018-03-14 13:23:00 CID [141289788]
2018-03-14 13:23:00 CID [72883352]
2018-03-14 13:23:01 CID [72883359]
2018-03-14 13:23:00 CID [275507532]

I need to compare the 4th column of the first file with the 4th column of the second file. I am using the command below:

awk 'FNR==NR{a[$4]=$1" "$2" "$3; next} ($4 in a) {print a[$4],$4,$1,$2}' file1.txt file2.txt>file3.txt 

Its output looks like this:

2018-03-14 13:23:00 CID [72883359] 2018-03-14 13:23:01

The above command works properly, but the problem is that file1 and file2 are huge, around 20k lines each, so the command takes a long time.

I want it so that if a match is found, it should skip the remaining column and go for the next one, i.e. some kind of break statement. Please help.

Below is my script.

#!/bin/sh

cron=1;

for((j = $cron; j >= 1; j--))
do
    d1=`date -d "$date1  $j min ago" +%Y-%m-%d`
    d2=`date -d 'tomorrow' '+%Y-%m-%d'`
    
    t1=`date -d "$date1  2 min ago" +%R`
    t2=`date -d "$date1  1 min ago" +%R`
    t3=`date --date="0min" +%R`
done


cat /prd/firewall/logs/lwsg_event.log | egrep "$d1|$d2" | egrep "$t1|$t2|$t3" |  grep 'SRIR' | awk -F ' ' '{print $1,$2,$4,$5}'>file1.txt


cat /prd/firewall/logs/lwsg_event.log | egrep "$d1|$d2" | egrep "$t1|$t2|$t3" | grep 'SRIC' | awk -F ' ' '{print $1,$2,$4,$5}'>file2.txt


awk 'FNR==NR{a[$4]=$1" "$2" "$3; next} ($4 in a) {print a[$4],$4,$1,$2}' file1.txt file2.txt>file3.txt

cat file3.txt | while read LINE
do
    f1=`echo $LINE | cut -f 1 -d " "`
    f2=`echo $LINE | cut -f 2 -d " "`
    
    String1=$f1" "$f2
    
    f3=`echo $LINE | cut -f 5 -d " "`
    f4=`echo $LINE | cut -f 6 -d " "`
    
    String2=$f3" "$f4
    
    
    f5=`echo $LINE | cut -f 3 -d " "`
    f6=`echo $LINE | cut -f 4 -d " "`
    
    String3=$f5" "$f6
    
    StartDate=$(date -u -d "$String1" +"%s")
    FinalDate=$(date -u -d "$String2" +"%s")
    echo "Diff for $String3 :" `date -u -d "0 $FinalDate sec - $StartDate sec" +"%H:%M:%S"` >final_output.txt
done

final_output.txt will be:

Diff for CID [142298410] : 00:00:01
Diff for CID [273089511] : 00:00:00
Diff for CID [273089515] : 00:00:00
Diff for CID [138871787] : 00:00:00
Diff for CID [273089521] : 00:00:00
Diff for CID [208877371] : 00:00:00
Diff for CID [138871793] : 00:00:00
Diff for CID [138871803] : 00:00:00
Diff for CID [273089526] : 00:00:00
Diff for CID [273089545] : 00:00:00
Diff for CID [208877406] : 00:00:02
Diff for CID [208877409] : 00:00:01
Diff for CID [138871826] : 00:00:00
Diff for CID [74659680] : 00:00:00
Could you explain "skip the remaining column and go for next"? It is not clear... Regarding the command you've tried, it looks like the best possible for this case. – Sundeep
Skip means: if a match is found, then there is no need to traverse the whole column of file2. Once a match is found, read the next value from the 4th column of file1 and look for it in the 4th column of file2, so that the script can finish its work faster. – sunil.tanwar
Sorry, I still don't understand... the script is reading each line of both files only once. – Sundeep

3 Answers


Could you please try the following awk and let me know if this helps you.

awk 'FNR==NR{a[$4]=$0;next} ($4 in a){print a[$4],$1,$2}' file1.txt  file2.txt
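If the "break" you are after means that each CID from file1.txt should be matched at most once, a small variant of the above (a sketch, untested on your real data) deletes the key after its first match, so any later duplicates in file2.txt are skipped cheaply:

awk 'FNR==NR{a[$4]=$0; next} ($4 in a){print a[$4], $1, $2; delete a[$4]}' file1.txt file2.txt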

Have you considered the join command? Not many people seem to know about join.

NAME
       join - join lines of two files on a common field

SYNOPSIS
       join [OPTION]... FILE1 FILE2
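For this case it might look something like the following (a sketch, untested; join needs both inputs sorted on the join field, and the <( ) process substitution requires bash rather than plain sh):

join -1 4 -2 4 -o 1.1,1.2,1.3,1.4,2.1,2.2 \
    <(sort -k4,4 file1.txt) <(sort -k4,4 file2.txt) > file3.txt

Note that sorting means the output will no longer be in the original log order.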

Your overall script reads the same file multiple times and contains a large number of other inefficiencies.

Without proper input to test with, it's hard to verify this, but here is a refactoring which should hopefully at least suggest a good direction for further exploration.

#!/bin/bash
# (the for ((...)) arithmetic loop below is a bashism, so this needs bash, not sh)

cron=1;

for((j = $cron; j >= 1; j--))
do
    # Replace obsolescent `backticks` with $(modern command substitution) syntax
    d1=$(date -d "$date1  $j min ago" +%Y-%m-%d)
    d2=$(date -d 'tomorrow' '+%Y-%m-%d')
    
    t1=$(date -d "$date1  2 min ago" +%R)
    t2=$(date -d "$date1  1 min ago" +%R)
    t3=$(date --date="0min" +%R)
done

# Avoid useless cat and useless grep, fold everything into one Awk script
# See also http://www.iki.fi/era/unix/award.html
awk -v d="$d1|$d2" -v t="$t1|$t2|$t3" '
    $0 !~ d { next }
    $0 !~ t { next }
    { o = "" }
    /SRIR/ { o = "file1.txt" }
    /SRIC/ { o = "file2.txt" }
    o { print $1, $2, $4, $5 > o }' /prd/firewall/logs/lwsg_event.log

awk 'FNR==NR{a[$4]=$1" "$2" "$3; next} ($4 in a) {print a[$4],$4,$1,$2}' file1.txt file2.txt>file3.txt

# Avoid uppercase for private variables
# Use read -r always
# Let read split the line
while read -r f1 f2 f5 f6 f3 f4 
do
    String1=$f1" "$f2
    String2=$f3" "$f4
    String3=$f5" "$f6
    
    StartDate=$(date -u -d "$String1" +"%s")
    FinalDate=$(date -u -d "$String2" +"%s")
    echo "Diff for $String3 :" $(date -u -d "0 $FinalDate sec - $StartDate sec" +"%H:%M:%S")
done <file3.txt >final_output.txt

I would guess that the main bottleneck was that you processed the log file multiple times, not so much the small Awk fragment you run on the results, which is what you asked for help with. Note also that redirecting the whole loop with "done <file3.txt >final_output.txt" fixes a bug in your original script: the ">final_output.txt" inside the loop truncated the file on every iteration, so only the last line survived.

This could still probably be refactored into a single Awk script. If you have GNU Awk, you should be able to do the date calculations in Awk, too.
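For instance, here is a rough sketch of the date arithmetic in GNU Awk (untested; it assumes the six-field layout of file3.txt shown above, and mktime/strftime with a UTC flag are gawk extensions):

gawk '{
    d1 = $1; t1 = $2; d2 = $5; t2 = $6
    gsub(/-/, " ", d1); gsub(/:/, " ", t1)    # "2018-03-14" -> "2018 03 14"
    gsub(/-/, " ", d2); gsub(/:/, " ", t2)    # "13:23:01"   -> "13 23 01"
    start = mktime(d1 " " t1)                 # seconds since the epoch
    end = mktime(d2 " " t2)
    # format the difference as HH:MM:SS; the third strftime argument requests UTC
    printf "Diff for %s %s : %s\n", $3, $4, strftime("%H:%M:%S", end - start, 1)
}' file3.txt > final_output.txt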