3
votes

I want to compare the second columns from two files. If there is a match, print the corresponding value of the third column from the second file to the first file. If no match is found, fill with "NA"

File 1

1      rs1    AA    10
1      rs2    DD    20
1      rs3    EE    30
1      rs4    RR    40


File 2

1      rs1    Pascal
1      rs4    Albinoni


Desired output

1      rs1    AA    10    Pascal
1      rs2    DD    20    NA
1      rs3    EE    30    NA
1      rs4    RR    40    Albinoni

I have used this code, but it only output the matches:

awk 'NR==FNR { a[$1]=$2; next} $1 in a {print $0, a[$1]}' File2 File1

Thank you

3

3 Answers

3
votes
awk 'FNR==NR{a[$2]=$3;next}{print $0,a[$2]?a[$2]:"NA"}' file2 file1

tested below:

> cat temp1
1      rs1    AA    10
1      rs2    DD    20
1      rs3    EE    30
1      rs4    RR    40
> cat temp2
1      rs1    Pascal
1      rs4    Albinoni
> awk 'FNR==NR{a[$2]=$3;next}{print $0,a[$2]?a[$2]:"NA"}' temp2 temp1
1      rs1    AA    10 Pascal
1      rs2    DD    20 NA
1      rs3    EE    30 NA
1      rs4    RR    40 Albinoni
> 
3
votes

No exactly what you want, but using small tools is always nice. You can use join:

$ join -1 2 -2 2 -a1 file1 file2
rs1 1 AA 10 1 Pascal
rs2 1 DD 20
rs3 1 EE 30
rs4 1 RR 40 1 Albinoni

Explanation:

  • -1 2: use second field of file1 as join key
  • -2 2: use seceond field of file2 as join key
  • -a1: left outer join on file1.
2
votes
awk 'NR==FNR{a[$2]=$3;next;}{print $0 "    " ($2 in a ? a[$2] : "NA")}' file2 file1

Output:

1      rs1    AA    10    Pascal
1      rs2    DD    20    NA
1      rs3    EE    30    NA
1      rs4    RR    40    Albinoni