I have two files:
File1:
chr1 100736194 46 0.731 + 100735713 100736636
chr1 100736194 49 0.879 + 100735723 100736646
chr1 100736196 54 0.952 + 100735753 100736666
File2:
chr1 100735713 100736636 + BMCHAS
chr1 100735723 100736646 + ATCGSG
chr1 100735753 100736666 + BDUSUS
I want to join file 1 and 2 based on
- file1's column 6 and 7, and
- file2's column 2 and 3
to output something like:
chr1 100736194 46 0.731 + 100735713 100736636 BMCHAS
chr1 100736194 49 0.879 + 100735723 100736646 ATCGSG
chr1 100736196 54 0.952 + 100735753 100736666 BDUSUS
I don't know how to use join for more than a single column and I looked up an awk solution and have been trying to implement that but I think I am not right somewhere, the command that I am using is: awk 'NR==FNR{a[$1,$2]=$5;next}{$3=a[$7,$8];print}' OFS='\t' file2 file1,
Also, it is important to match not one, but two columns from each file, because the single columns are not unique enough. This means I cannot use single-column matching solutions such as only matching file1's column 6 with file2's column 2 only, it must use the multiple specified columns above.
Thanks in advance.