I have two files like below:
file1.txt
2018-03-14 13:23:00 CID [72883359]
2018-03-14 13:23:00 CID [275507537]
2018-03-14 13:23:00 CID [275507539]
2018-03-14 13:23:00 CID [207101094]
2018-03-14 13:23:00 CID [141289821]
and file2.txt
2018-03-14 13:23:00 CID [207101072]
2018-03-14 13:23:00 CID [275507524]
2018-03-14 13:23:00 CID [141289788]
2018-03-14 13:23:00 CID [72883352]
2018-03-14 13:23:01 CID [72883359]
2018-03-14 13:23:00 CID [275507532]
I need to compare the 4th column of the first file with the 4th column of the second file. I am using the command below:
awk 'FNR==NR{a[$4]=$1" "$2" "$3; next} ($4 in a) {print a[$4],$4,$1,$2}' file1.txt file2.txt>file3.txt
Its output looks like this:
2018-03-14 13:23:00 CID [72883359] 2018-03-14 13:23:01
The command above works correctly, but file1 and file2 are large (about 20k lines each), so it takes a long time to run.
When a match is found, I want it to skip the remaining column and go on to the next one, i.e. some kind of break statement. Please help.
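One possible reading of the "break" idea is sketched below: once a CID from file1.txt has matched, it is removed from the array so later lines in file2.txt with the same CID are ignored. This is only a sketch under assumptions: the `delete a[$4]` is not part of the command above, the temporary directory is used purely for illustration, and the third file2.txt row (13:23:05) is a made-up duplicate added to show the effect. Since `($4 in a)` is already a hash lookup rather than a scan, this may not change the run time much.

```shell
tmp=$(mktemp -d)

# Two rows taken from file1.txt above.
cat > "$tmp/file1.txt" <<'EOF'
2018-03-14 13:23:00 CID [72883359]
2018-03-14 13:23:00 CID [275507537]
EOF

# Two rows taken from file2.txt above, plus one hypothetical duplicate CID.
cat > "$tmp/file2.txt" <<'EOF'
2018-03-14 13:23:00 CID [207101072]
2018-03-14 13:23:01 CID [72883359]
2018-03-14 13:23:05 CID [72883359]
EOF

# Same join as the command above; the delete drops a CID after its first match.
awk 'FNR==NR { a[$4] = $1 " " $2 " " $3; next }
     ($4 in a) { print a[$4], $4, $1, $2; delete a[$4] }' \
    "$tmp/file1.txt" "$tmp/file2.txt" > "$tmp/file3.txt"

cat "$tmp/file3.txt"
# prints: 2018-03-14 13:23:00 CID [72883359] 2018-03-14 13:23:01
```

Without the `delete`, the hypothetical 13:23:05 row would produce a second output line for the same CID.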
Below is my script.
#!/bin/bash
cron=1;
for((j = $cron; j >= 1; j--))
do
d1=`date -d "$date1 $j min ago" +%Y-%m-%d`
d2=`date -d 'tomorrow' '+%Y-%m-%d'`
t1=`date -d "$date1 2 min ago" +%R`
t2=`date -d "$date1 1 min ago" +%R`
t3=`date --date="0min" +%R`
done
cat /prd/firewall/logs/lwsg_event.log | egrep "$d1|$d2" | egrep "$t1|$t2|$t3" | grep 'SRIR' | awk -F ' ' '{print $1,$2,$4,$5}'>file1.txt
cat /prd/firewall/logs/lwsg_event.log | egrep "$d1|$d2" | egrep "$t1|$t2|$t3" | grep 'SRIC' | awk -F ' ' '{print $1,$2,$4,$5}'>file2.txt
awk 'FNR==NR{a[$4]=$1" "$2" "$3; next} ($4 in a) {print a[$4],$4,$1,$2}' file1.txt file2.txt>file3.txt
cat file3.txt | while read LINE
do
f1=`echo $LINE | cut -f 1 -d " "`
f2=`echo $LINE | cut -f 2 -d " "`
String1=$f1" "$f2
f3=`echo $LINE | cut -f 5 -d " "`
f4=`echo $LINE | cut -f 6 -d " "`
String2=$f3" "$f4
f5=`echo $LINE | cut -f 3 -d " "`
f6=`echo $LINE | cut -f 4 -d " "`
String3=$f5" "$f6
StartDate=$(date -u -d "$String1" +"%s")
FinalDate=$(date -u -d "$String2" +"%s")
echo "Diff for $String3 :" `date -u -d "0 $FinalDate sec - $StartDate sec" +"%H:%M:%S"`
done >final_output.txt
final_output.txt will be:
Diff for CID [142298410] : 00:00:01
Diff for CID [273089511] : 00:00:00
Diff for CID [273089515] : 00:00:00
Diff for CID [138871787] : 00:00:00
Diff for CID [273089521] : 00:00:00
Diff for CID [208877371] : 00:00:00
Diff for CID [138871793] : 00:00:00
Diff for CID [138871803] : 00:00:00
Diff for CID [273089526] : 00:00:00
Diff for CID [273089545] : 00:00:00
Diff for CID [208877406] : 00:00:02
Diff for CID [208877409] : 00:00:01
Diff for CID [138871826] : 00:00:00
Diff for CID [74659680] : 00:00:00
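For comparison, the per-line loop in the script starts several `cut` and `date` processes for every line of file3.txt, which may account for part of the run time. The sketch below produces the same "Diff for ..." line in a single awk pass. It rests on assumptions not stated in the script: the two timestamps are less than 24 hours apart, the times are always in HH:MM:SS form, and the helper name `secs` and the temporary directory are hypothetical.

```shell
tmp=$(mktemp -d)

# One line in the file3.txt layout: start date, start time, "CID", id, end date, end time.
cat > "$tmp/file3.txt" <<'EOF'
2018-03-14 13:23:00 CID [72883359] 2018-03-14 13:23:01
EOF

awk '
# Convert HH:MM:SS to seconds since midnight.
function secs(t,   p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
{
    d = secs($6) - secs($2)
    if (d < 0) d += 86400   # assumption: the end time falls on the following day
    printf "Diff for %s %s : %02d:%02d:%02d\n", $3, $4, int(d / 3600), int(d % 3600 / 60), d % 60
}' "$tmp/file3.txt" > "$tmp/final_output.txt"

cat "$tmp/final_output.txt"
# prints: Diff for CID [72883359] : 00:00:01
```

Only the time-of-day fields are used here, so the date columns are ignored; differences of a day or more would need real date arithmetic.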
"skip the remaining column and go for next"? It is not clear... Regarding the command you've tried, it looks like the best possible for this case. – Sundeep