I'm trying to compare column 1 from file1 and column 3 from file 2, if they match then print the first column from file1 and the two first columns from file2.
here's a sample from each file:
file1
Cre01.g000100
Cre01.g000500
Cre01.g000650
file2
chromosome_1 71569 |655|Cre01.g000500|protein_coding|CODING|PAC:26902937|1|1)
chromosome_1 93952 |765|Cre01.g000650|protein_coding|CODING|PAC:26903448|11|1)
chromosome_1 99034 |1027|Cre01.g000100 |protein_coding|CODING|PAC:26903318|9|1)
desired output
Cre01.g000100 chromosome_1 99034
Cre01.g000500 chromosome_1 71569
Cre01.g000650 chromosome_1 93952
I've been looking at various threads that are somewhat similar, but I can't seem to get it to print the columns from both files. Here are some links that are somewhat related:
awk compare 2 files, 2 fields different order in the file, print or merge match and non match lines
Obtain patterns from a file, compare to a column of another file, print matching lines, using awk
awk compare columns from two files, impute values of another column
Obtain patterns in one file from another using ack or awk or better way than grep?
Awk - combine the data from 2 files and print to 3rd file if keys matched
I feel like I should have been able to figure it out based on these threads, but it's been two days that I've been trying different variations of the codes and I haven't gotten anywhere. Here is some code that I've tried using on my files:
awk 'FNR==NR{a[$3]=$1;next;}{print $0 ($3 in a ? a[$3]:"NA")}' file1 file2
awk 'NR==FNR{ a[$1]; next} ($3 in a) {print $1 $2 a[$1]}' file1 file2
awk 'FNR==NR{a[$1]=$0; next}{print a[$1] $0}' file1 file2
I know i have to create a temp matrix that contains the first column of file1 (or the 3rd column of file2) then compare it to the other file. If there is a match, then print first column from file1 and column 1 and column 2 from file 2.
Thanks for the help!