I would like to look up each key from column 1 of File1 inside column 2 of File2 (the key appears there as a substring), then print the corresponding second column of File1 next to each matching File2 line:
File1 (two columns tab-separated):
APBW lung
APCA non virulent
ABKM lung
APBX lung
KK020 -
APBZ non virulent
AOSU lung
APBY non virulent
APBV joint; lung; CNS
CP001321 virulent
APBT virulent
APBU non-virulent
APCB moderadamente virulenta (nose)
CP005384 -
File2 (two columns tab-separated):
HS372_00243 gi|219690483|gb|CP001321.1|
HS372_00436 gi|529264994|gb|APBX01000055.1|
HS372_00445 gi|529256455|gb|APBT01000061.1|
HS372_00544 gi|529259149|gb|APBV01000035.1|
HS372_00545 gi|529259149|gb|APBV01000035.1|
HS372_00546 gi|529259149|gb|APBV01000035.1|
Desired output (three columns tab-separated):
HS372_00243 gi|219690483|gb|CP001321.1| virulent
HS372_00436 gi|529264994|gb|APBX01000055.1| lung
HS372_00445 gi|529256455|gb|APBT01000061.1| virulent
HS372_00544 gi|529259149|gb|APBV01000035.1| joint; lung; CNS
HS372_00545 gi|529259149|gb|APBV01000035.1| joint; lung; CNS
HS372_00546 gi|529259149|gb|APBV01000035.1| joint; lung; CNS
My provisional bash attempt (not working; I am open to other languages):
while read vl; do grep "$vl" File2 ; done < File1
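The loop most likely prints nothing because read vl stores the whole File1 line, key and label together, so grep searches File2 for text such as "APBW<TAB>lung", which never occurs there. A minimal sketch of a repaired loop follows, assuming both files are tab-separated and that each key appears in File2 directly after a | character. The function name annotate_loop is hypothetical, chosen only for illustration.

```shell
# annotate_loop FILE1 FILE2
# For every key/label pair in FILE1, print the FILE2 lines that contain
# "|key", with the label appended as an extra tab-separated column.
annotate_loop() {
    tab=$(printf '\t')
    # Split each FILE1 line on tabs only, so labels keep their spaces.
    while IFS="$tab" read -r key label; do
        [ -n "$key" ] || continue            # skip blank lines
        # -F: treat the key as a fixed string, not a regular expression.
        grep -F -- "|$key" "$2" | while IFS= read -r line; do
            printf '%s\t%s\n' "$line" "$label"
        done
    done < "$1"
}
```

Note that this variant emits lines in File1 order rather than File2 order, and it rereads File2 once per key, so it is only reasonable for small inputs.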
I also tried awk. It does not work, apparently because it looks for an exact match between the first columns, whereas my key sits in the second column of File2, surrounded by other text:
awk 'BEGIN { FS = OFS = "\t" } FNR==NR{a[$1]=$0;next}($1 in a){print a[$1],$2,$3}' File1 File2
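One possible way around the exact-match problem, as a minimal sketch rather than a definitive answer: load File1 into an array, take the accession from the fourth |-separated piece of File2's second column, and test each File1 key as a prefix of it. The function name annotate and the array name label are made up for illustration. The sketch assumes both files really are tab-separated and that the accession always sits in that fourth position.

```shell
# annotate FILE1 FILE2
# Append to each FILE2 line the FILE1 label whose key is a prefix of the
# accession found in FILE2's second column (e.g. gi|...|gb|CP001321.1|).
annotate() {
    awk 'BEGIN { FS = OFS = "\t" }
         # First file: remember the label for every non-empty key.
         FNR == NR { if ($1 != "") label[$1] = $2; next }
         # Second file: extract the accession and look for a prefix match.
         {
             split($2, part, "[|]")      # gi, number, gb, accession, ""
             acc = part[4]
             out = ""
             for (key in label)
                 if (index(acc, key) == 1) { out = label[key]; break }
             print $0, out
         }' "$1" "$2"
}
```

Usage would be something like annotate File1 File2 > File3. The label is copied verbatim from File1, File2 order is preserved, and File2 lines with no matching key get an empty third column.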
Thanks, Bernardo
APBX01000055 is found in file 2, because APBX01000055 is not in file1. - fedorqui 'SO stop harming'
CP001321? - fedorqui 'SO stop harming'