Compare columns in two files and print the match values in specific columns

Question

In the following case, I would like to find the values in columns 8 and 9 of file1 that match columns 2 and 3 of file2.

If the values are exactly the same in both files, print them in the format of the desired output shown below.

file1

31429,36689,313212.5,2334362.5,31429,36679,31308,302412.50 2316512.50
31429,36701,313362.5,2334362.5,31429,36681,31311,2334363,31429
31429,36713,313512.5,2334362.5,31429,36719,31358,303312.50 2316512.50
31429,36749,313962.5,2334362.5,31429,36751,31398,2334362,31429
31429,36809,314712.5,2334362.5,31429,36803,31463,2334361,31429
31429,36821,314862.5,2334362.5,31429,36817,31481,2334363,31429

file2

3000135825 302412.50 2316512.50
3000135837 302562.50 2316512.50
3000135849 302712.50 2316512.50
3000135861 302862.50 2316512.50
3000135873 303012.50 2316512.50
3000135885 303162.50 2316512.50
3000135897 303312.50 2316512.50
3000135909 303462.50 2316512.50
3000135921 303612.50 2316512.50
3000135933 303762.50 2316512.50
3000135945 303912.50 2316512.50

output desired

3000135825 302412.50 2316512.50 3667931308 302412.50 2316512.50
3000135897 303312.50 2316512.50 3671931358 303312.50 2316512.50

I tried the commands below and got the expected results, but they take a long time because file2 has 3 million lines. To use them, I first create a temporary file named tmp1 with columns 5, 6, 8 and 9 from file1:

awk -F, '{print($5$6,$8,$9)}' file1 > tmp1 

awk 'FNR==NR{a[$2$3]=$0;next}{print $0,a[$2$3]?a[$2$3]:"NA"}' file2 tmp1

What is the length of file1? If it is much shorter than file2, you can cache the contents of file1 instead. - karakfa

karakfa · Accepted Answer · 2019-02-17T20:55:57

If file1 is much shorter than file2, you can cache the contents of file1 instead.

Something like this (not tested):

$ awk -F, 'NR==FNR      {a[$8,$9]=$6$7; next}    # is $6$7 the key you want to print?
           ($2,$3) in a {print $1,$2,$3,a[$2,$3]}' file1 FS=' ' file2

Since the values must match, there is no need to print them again. I'm not sure what the fourth value in the desired output is, but if it comes from file1, just replace $6$7 with that field.
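For anyone who wants to try this against the data in the question, here is a self-contained sketch of the same idea. It rests on a few assumptions that are not confirmed by the question: the last two values in the matching rows of file1 are separated by a space, as in the sample as posted, so the field separator for file1 is set to `[, ]` rather than a plain comma; the fourth output column is `$6$7`, which is what the desired output appears to show; and the matched values are printed a second time only to reproduce that desired output. The array name `seen` and the temporary directory are just for illustration.

```shell
# Work in a scratch directory so the sample files do not clobber real ones.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample rows copied from the question (note the space before the last value
# in the first and third rows of file1).
cat > file1 <<'EOF'
31429,36689,313212.5,2334362.5,31429,36679,31308,302412.50 2316512.50
31429,36701,313362.5,2334362.5,31429,36681,31311,2334363,31429
31429,36713,313512.5,2334362.5,31429,36719,31358,303312.50 2316512.50
EOF

cat > file2 <<'EOF'
3000135825 302412.50 2316512.50
3000135837 302562.50 2316512.50
3000135897 303312.50 2316512.50
EOF

# Pass 1 (file1, the small file): cache $6$7 under the key ($8,$9).
# Pass 2 (file2, the large file): print only rows whose ($2,$3) was cached.
# FS=' ' between the file names switches to whitespace splitting for file2.
result=$(awk -F'[, ]' '
    NR == FNR        { seen[$8, $9] = $6 $7; next }
    ($2, $3) in seen { print $1, $2, $3, seen[$2, $3], $2, $3 }
' file1 FS=' ' file2)

printf '%s\n' "$result"
```

With this arrangement only file1 is held in memory and file2 is streamed once, so the 3 million lines never need to be stored in an array. If file1 is in fact comma-separated throughout, plain `-F,` as in the answer should be enough.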
