I have two files, say file1.txt and file2.txt.
file1.txt:
2017042100000000010000000943678177700000000819900000026572
2017042100000000010000000943678177700000003500000000026581
2017042100000000010000000943678177700000013450000000026591
2017042100000000010000000943678177700000011500000000026601
2017042100000000010000000943678177700000010000000000026611
2017042100000000010000000943678177700000010000000000026622
2017042100000000010000000943678177700000012855000000026632
file2.txt:
20170421,0000000001,00000009436781777,000000008199,0000002657,F,3,img_F1_1.tiff
20170421,0000000001,00000009436781777,000000008199,0000002657,B,3,img_F1_1.tiff
20170421,0000000001,00000009436781777,000000035000,0000002658,F,8,img_F1_2.tiff
20170421,0000000001,00000009436781777,000000134500,0000002659,F,1,img_F1_3.tiff
20170421,0000000001,00000009436781777,000000115000,0000002660,F,2,img_F1_4.tiff
20170421,0000000001,00000009436781777,000000100000,0000002661,F,1,img_F1_5.tiff
20170421,0000000001,00000009436781777,000000100000,0000002662,F,8,img_F1_6.tiff
I have to compare each entry of file1.txt (minus its last character) with the first 5 columns of file2.txt. If it matches, I have to store the file2.txt entries in another file, say matched.txt. If it doesn't, I have to store the file1.txt entry in another file, say unmatched.txt. The matching part works for me with the command below:
awk -F',' 'FILENAME=="file1.txt" {A[substr($1, 1, length($1)-1)]; next} ($1$2$3$4$5 in A)' file1.txt file2.txt > matched.txt
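The command above only writes matched.txt. A minimal sketch that also writes unmatched.txt in the same awk run, assuming "unmatched" means a file1.txt line whose key (all but the last character) has no row at all in file2.txt. It keeps the file1.txt lines in order so the unmatched ones can be printed at the end:

```shell
# Work in a scratch directory so the sample files don't overwrite real ones.
cd "$(mktemp -d)" || exit 1

# Sample data copied from the question.
cat > file1.txt <<'EOF'
2017042100000000010000000943678177700000000819900000026572
2017042100000000010000000943678177700000003500000000026581
2017042100000000010000000943678177700000013450000000026591
2017042100000000010000000943678177700000011500000000026601
2017042100000000010000000943678177700000010000000000026611
2017042100000000010000000943678177700000010000000000026622
2017042100000000010000000943678177700000012855000000026632
EOF
cat > file2.txt <<'EOF'
20170421,0000000001,00000009436781777,000000008199,0000002657,F,3,img_F1_1.tiff
20170421,0000000001,00000009436781777,000000008199,0000002657,B,3,img_F1_1.tiff
20170421,0000000001,00000009436781777,000000035000,0000002658,F,8,img_F1_2.tiff
20170421,0000000001,00000009436781777,000000134500,0000002659,F,1,img_F1_3.tiff
20170421,0000000001,00000009436781777,000000115000,0000002660,F,2,img_F1_4.tiff
20170421,0000000001,00000009436781777,000000100000,0000002661,F,1,img_F1_5.tiff
20170421,0000000001,00000009436781777,000000100000,0000002662,F,8,img_F1_6.tiff
EOF

awk -F',' '
  NR == FNR {                              # first file: file1.txt
    key = substr($0, 1, length($0) - 1)    # drop the trailing count digit
    want[key]                              # remember the key
    line[++n] = $0                         # keep file1.txt order for END
    lkey[n] = key
    next
  }
  {                                        # second file: file2.txt
    key = $1 $2 $3 $4 $5
    if (key in want) {
      print > "matched.txt"                # file2.txt row with a known key
      found[key]                           # mark this key as matched
    }
  }
  END {                                    # file1.txt lines never matched
    for (i = 1; i <= n; i++)
      if (!(lkey[i] in found))
        print line[i] > "unmatched.txt"
  }
' file1.txt file2.txt
```

On the sample data all seven file2.txt rows go to matched.txt and only the ...2663 line goes to unmatched.txt. Note that awk only creates an output file when it first writes to it, so if every line matches, unmatched.txt is not created at all; running `: > unmatched.txt` before the awk call would guarantee an empty file instead.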
Now I have another problem:
If an entry of file1.txt (minus its last character) matches the first 5 columns of file2.txt, the last character of that file1.txt entry must also be checked; it is either 1 or 2. If it is 2, file2.txt must contain two entries with the same first 5 columns, where the 6th column is 'F' for the first entry and 'B' for the second. For example, in file1.txt:
2017042100000000010000000943678177700000000819900000026572
Here the last digit is 2, so we must find two entries in file2.txt:
20170421,0000000001,00000009436781777,000000008199,0000002657,F,3,img_F1_1.tiff
20170421,0000000001,00000009436781777,000000008199,0000002657,B,3,img_F1_1.tiff
These contain both the 'F' and the 'B' entry.
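To see how a file1.txt line lines up with file2.txt, the record can be cut into five fixed-width fields followed by the count digit. A small sketch, assuming the widths 8, 10, 17, 12 and 10 inferred from the sample rows (the variable names `record` and `decoded` are only illustrative):

```shell
# Split one file1.txt record into the five fixed-width fields that
# correspond to file2.txt columns 1-5, plus the trailing count digit.
# Assumption: the widths 8 10 17 12 10 are inferred from the sample data.
record=2017042100000000010000000943678177700000000819900000026572
decoded=$(printf '%s\n' "$record" | awk '{
  n = split("8 10 17 12 10", w, " ")   # widths of columns 1-5
  pos = 1
  key = ""
  for (i = 1; i <= n; i++) {
    sep = (i > 1) ? "," : ""           # comma between fields, none before the first
    key = key sep substr($0, pos, w[i])
    pos += w[i]
  }
  print key, substr($0, pos, 1)        # comma-joined key, then the count digit
}')
echo "$decoded"
```

For the example line this prints the same first five columns as the two file2.txt rows above, followed by the count digit 2.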
If fewer than 2 entries are found, the missing entries have to be stored in another file, say missing.txt. My command works when 2 entries or 0 entries are found, but not when only one entry is found.
Expected output (missing.txt):
2017042100000000010000000943678177700000010000000000026622 'B'
2017042100000000010000000943678177700000012855000000026632 'F'
2017042100000000010000000943678177700000012855000000026632 'B'
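One way to handle the single-entry case is to record which sides ('F', 'B') exist for each key in file2.txt and then test each required side on its own, instead of counting rows. A minimal sketch under two assumptions: a trailing 1 means only an 'F' row is required (the question only spells out the rule for 2), and missing.txt uses the format of the expected output above:

```shell
# Work in a scratch directory so the sample files don't overwrite real ones.
cd "$(mktemp -d)" || exit 1

# Sample data copied from the question.
cat > file1.txt <<'EOF'
2017042100000000010000000943678177700000000819900000026572
2017042100000000010000000943678177700000003500000000026581
2017042100000000010000000943678177700000013450000000026591
2017042100000000010000000943678177700000011500000000026601
2017042100000000010000000943678177700000010000000000026611
2017042100000000010000000943678177700000010000000000026622
2017042100000000010000000943678177700000012855000000026632
EOF
cat > file2.txt <<'EOF'
20170421,0000000001,00000009436781777,000000008199,0000002657,F,3,img_F1_1.tiff
20170421,0000000001,00000009436781777,000000008199,0000002657,B,3,img_F1_1.tiff
20170421,0000000001,00000009436781777,000000035000,0000002658,F,8,img_F1_2.tiff
20170421,0000000001,00000009436781777,000000134500,0000002659,F,1,img_F1_3.tiff
20170421,0000000001,00000009436781777,000000115000,0000002660,F,2,img_F1_4.tiff
20170421,0000000001,00000009436781777,000000100000,0000002661,F,1,img_F1_5.tiff
20170421,0000000001,00000009436781777,000000100000,0000002662,F,8,img_F1_6.tiff
EOF

# q holds a single quote so the output can read: <line> 'F'
awk -F',' -v q="'" '
  NR == FNR {                              # first file: file2.txt
    have[$1 $2 $3 $4 $5, $6]               # note which sides exist per key
    next
  }
  {                                        # second file: file1.txt
    key = substr($0, 1, length($0) - 1)    # the five columns, concatenated
    cnt = substr($0, length($0), 1)        # trailing count digit: 1 or 2
    # Assumption: every line needs an F row; a count of 2 also needs a B row.
    if (!((key, "F") in have))
      print $0, q "F" q
    if (cnt == "2" && !((key, "B") in have))
      print $0, q "B" q
  }
' file2.txt file1.txt > missing.txt
```

On the sample data this writes exactly the three lines shown above. Because each side is looked up separately, a key with only an 'F' row (the ...2662 line) still reports its missing 'B', and a key with no rows at all (the ...2663 line) reports both.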