1
votes

I need to match columns 2 and 5 from file 1 against columns 2 and 4 from the second file. If match, return a new column in file 2 containing values from the 4th column of file 1.

file_1

10  43572716    43572935    22  AMP
10  43572716    43572935    26  DEL
18  48578898    48579056    71  DEL

File_2

10  43572716    43572935    DEL 1   RET -0,5    0,51    218,164 0,58    255,89  0,73    0,79    0,85
18  48578898    48579056    DEL 1   SMAD4   -0,9    0,23    8,70886 0,02    16,665  0,05    0,4 0,52

File_2 contains many more columns

I've tried the code below but it returns the whole file

awk 'FNR==NR{a[$2 FS $5]=$0;next} (($2 FS $4) in a) {print a[$2 FS $4],$0}' file_1.bed FILE_2.bed

Expected results are

10  43572716    43572935    DEL 1   RET -0,5    0,51    218,164 0,58    255,89  0,73    0,79    0,85    26
18  48578898    48579056    DEL 1   SMAD4   -0,9    0,23    8,70886 0,02    16,665  0,05    0,4 0,52 71

Can anyone help, please?

1

1 Answers

1
votes

You were close, could you please try following.

awk 'FNR==NR{a[$2,$5]=$4;next} ($2,$4) in a{print $0,a[$2,$4]}' Input_file1 Input_file2

Adding a non-one liner form of solution as follows:

awk '
FNR==NR{
  a[$2,$5]=$4
  next
}
($2,$4) in a{
  print $0,a[$2,$4]
}
' Input_file1  Input_file2


Why OP's code didn't work: OP was close in terms of logic of getting this done, only thing was while reading Input_file1 in spite of having a[$2 FS $5]=$0 it should be a[$2 FS $5]=$4 and while printing it on 2nd Input_fie reading it should be print $0,a[$2 FS $4]. Which I fixed it in my code above.