How to find matching patterns between two text files and output to another file?

Question

I have two text files that are organized differently. Both files contain a few identical patterns (numbers) in the text. I'd like to find which patterns (numbers) are present in both files and write them to an output file.

file1.txt:

blablabla_25947.bkwjcnwelkcnwelckme

blablabla_111.bkwjcnwelkcnwelckme

blablabla_65155.bkwjcnwelkcnwelckme

blablabla_56412.bkwjcnwelkcnwelckme

file2.txt:

blablabla_647728.bkwjcnwelkcnwelck
kjwdhcwkejcwmekcjwhemckwejhcmwekch

blablabla_6387.bkwjcnwelkcnwelckme
wexkwhenqlciwuehnqweiugfnwekfiugew
wedhwnejchwenckhwqecmwequhcnkwjehc
owichjwmelcwqhemclekcelmkjcelkwejc

blablabla_59148.bkwjcnwelkcnwelckme
ecmwequhcnkwjehcowichjwmelcwqhemcle
kcelmkjcelkwejcwecawecwacewwAWWAXEG

blablabla_111.bkwjcnwelkcnwelckm
WESETRBRVSSCQEsfdveradassefwaefawecc

output_file.txt:

111

This is computationally a very hard problem, since you need to match "every possible length of string from every possible starting point". In the above, how many matches of b, bl, bla, blab, etc. would you expect to get? Is there a "minimum length of match" you would want? I think there is no built-in command that does exactly what you want; if you had the search strings on a line by themselves, it would be easy (using grep -F -f file1 file2), but you don't... Can you do anything to bound the problem better? — Floris
I had not appreciated that you only wanted to match the numbers... that does make it somewhat easier. — Floris

Chris Seymour · Accepted Answer · 2013-02-11T15:14:47

How about:

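$ grep -Eo '_[0-9]+\.' file1.txt | grep -oFf - file2.txt | tr -d '_.'
111

# Redirect to a new file
$ grep -Eo '_[0-9]+\.' file1.txt | grep -oFf - file2.txt | tr -d '_.' > output_file.txt

The first grep extracts every digit string that is preceded by _ and followed by . in file1.txt. That list is passed on standard input (-f -) as the pattern list for the second grep, which prints each occurrence it finds in file2.txt. The -F flag makes the second grep treat the patterns as fixed strings, so the trailing . matches only a literal dot rather than any character (otherwise _111. would also match _1115). Finally, tr strips the _ and . delimiters.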
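If all you need is the set of numbers common to both files, with each number reported once, another technique is to extract the numbers from each file separately and intersect the two lists. The sketch below does that in POSIX shell with sort and uniq -d, instead of feeding one file's matches to grep as patterns. It assumes, as in the sample files, that every number of interest is written as _<digits>. (underscore before, dot after); the function names extract_numbers and common_numbers are hypothetical.

```shell
#!/bin/sh
# Sketch: intersect the sets of numbers found in two files.
# Assumption: each number of interest appears as _<digits>. (as in the samples).
# extract_numbers and common_numbers are hypothetical helper names.

# Print every _<digits>. token in file "$1" without its delimiters,
# one number per line, sorted and de-duplicated.
extract_numbers() {
    grep -Eo '_[0-9]+\.' "$1" | tr -d '_.' | sort -u
}

# Print the numbers that occur in both "$1" and "$2".
# Each list is already de-duplicated, so a number that shows up twice
# in the combined stream must come from both files; uniq -d keeps those.
common_numbers() {
    { extract_numbers "$1"; extract_numbers "$2"; } | sort | uniq -d
}
```

Usage: common_numbers file1.txt file2.txt > output_file.txt. On the sample files this prints only 111. Because whole numbers are compared line by line, 111 in one file cannot match 1115 in the other, and a number repeated within one file is still reported at most once.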
