Merge two files by one column AWK

Question

I would like to merge the 4th column of file1 with the 1st column of file2 using awk, and print the 2nd column from file1. If there is more than one match (there could be more than 100), print them separated by commas.

FILE1:

alo descrip 1  PAPA
alo descrip 2  LOPA
alo descrip 3  REP
alo descrip 4  SEPO
dlo sapro   31 REP
dlo sapro   35 PAPA

FILE2:

PAPA klob trop
PAPA kopo topo
HOJ  sasa laso
REP  deso rez
SEPO raz  ghul
REP  kok  loko

OUTPUT:

PAPA klob trop descrip,sapro
PAPA kopo topo descrip,sapro
HOJ  sasa laso NA
REP  deso rez  descrip,sapro
SEPO raz  ghul descrip
REP  kok  loko descrip,sapro

I tried:

awk -v FILE_A="FILE1" -v OFS="\t" 'BEGIN { while ( ( getline < FILE_A ) > 0 ) { VAL = $0 ; sub( /^[^ ]+ /, "", VAL ) ; DICT[ $1 ] = VAL } } { print $0, DICT[ $4 ] }' FILE2

but it doesn't work.

Based on what you are asking, I think this might be what you need. — Aditya Vartak

RavinderSingh13 · Accepted Answer · 2020-02-21T09:27:54

Could you please try the following.

awk '
FNR==NR{
  a[$NF]=(a[$NF]?a[$NF] ",":"")$2
  next
}
{
  printf("%s %s\n",$0,($1 in a)?a[$1]:"NA")
}
'  Input_file1  Input_file2
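
To check it against the sample data from the question, here is a small self-contained sketch. The function name merge_by_key and the temporary directory are only illustrative assumptions, not part of the original command; the awk program itself is unchanged.

```shell
#!/bin/sh
# Hypothetical wrapper (the name merge_by_key is an assumption for this sketch).
# $1 = file whose last column is the key, $2 = file whose first column is the key.
merge_by_key() {
  awk '
  FNR==NR{
    a[$NF]=(a[$NF]?a[$NF] ",":"")$2
    next
  }
  {
    printf("%s %s\n",$0,($1 in a)?a[$1]:"NA")
  }
  ' "$1" "$2"
}

# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)

cat > "$tmpdir/FILE1" <<'EOF'
alo descrip 1  PAPA
alo descrip 2  LOPA
alo descrip 3  REP
alo descrip 4  SEPO
dlo sapro   31 REP
dlo sapro   35 PAPA
EOF

cat > "$tmpdir/FILE2" <<'EOF'
PAPA klob trop
PAPA kopo topo
HOJ  sasa laso
REP  deso rez
SEPO raz  ghul
REP  kok  loko
EOF

merge_by_key "$tmpdir/FILE1" "$tmpdir/FILE2"
```

With the sample files this prints each FILE2 line followed by the collected values, for example "PAPA klob trop descrip,sapro" for a key that matches twice and "HOJ  sasa laso NA" for a key with no match. The appended value is separated by a single space, so the columns are not padded the way they are in the desired OUTPUT listing.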

Explanation: a detailed explanation of the above code.

awk '                                          ##Start the awk program here.
FNR==NR{                                       ##FNR==NR is TRUE only while Input_file1 is being read.
  a[$NF]=(a[$NF]?a[$NF] ",":"")$2              ##Build array a indexed by $NF (the last field); append $2 of the current line to any existing value, separated by a comma.
  next                                         ##next skips the remaining statements for this line and moves on to the next line.
}
{
  printf("%s %s\n",$0,($1 in a)?a[$1]:"NA")    ##Print the current line followed by the array value for $1, or NA if $1 is not an index of the array.
}
'  Input_file1 Input_file2                     ##Mention the Input_file names here.
