I have two txt files (tab delimited) I'm trying to compare with awk. I want to compare column 1 of file1 to column 5 of file2 and any time a string from file1 column 1 matches file2 column 5 I want to print (eventually to a new txt file) the entire row of file2. Theres about 14000 lines in file 1 with just one column and 40000 rows with 6 columns in file2. Below are some test files I've been using to try to figure out how to use awk to get my desired output.
file1
AAGAB
AAK1
AAMDC
AAMP
AAR2
AARD
AARS
file2
chr22 38035683 38052050 NM_018957 AAMDC +
chrX 47001614 47004609 NM_019056 AAMP -
chr9 21994789 22077889 NR_047539 AAR2 +
chr16 20370491 20416033 NM_174924 AARD -
chr1 210111518 210337633 NM_001146262 AARS +
chrX 30233674 30238206 NM_002364 MAGEB2 +
chrX 30261847 30270155 NM_002363 MAGEB1 +
I've tried adopting some other awk questions with similar desired output from the abundance of similar awk questions on SO:
awk -F '\t' 'NR==FNR{c[$1]++;next}c[$5]' file1 file2
But I only get one match for the AARS row, even though there are other matches in the test files:
chr1 210111518 210337633 NM_001146262 AARS +
The output I am looking for is:
chr14 94463615 94473898 NR_024182 AAGAB +
chr10 74033676 74035797 NM_019058 AAK1 +
chr22 38035683 38052050 NM_018957 AAMDC +
chrX 47001614 47004609 NM_019056 AAMP -
chr9 21994789 22077889 NR_047539 AAR2 +
chr16 20370491 20416033 NM_174924 AARD -
chr1 210111518 210337633 NM_001146262 AARS +
I've tried adopting a couple other posts on stackoverflow but keep getting the same output of just 1 match even though multiple others exist:
awk to match file1 with file2 and output matches
awk -F '\t' 'NR==FNR{c[$1]; next} ($5 in c)' file1 file2
Find rows with the same value in a column in two files
awk -F '\t' 'NR==FNR{A[$1];next}$5 in A' file1 file2
Comparing two columns in two files using awk
awk -F '\t' 'FNR==NR {a[$1];next} {for (i in a) if ($5~i) print}' file1 file2
Can someone help me understand where my awk line is going wrong?
I'm currently using GNU Awk 4.2.1, API: 2.0
-F'\t'
– karakfa