0
votes

I'd like to compare two files and delete lines in file1 if they contain a pattern found anywhere in file2. I did some searching and the closest answers I've been able to find were how to delete lines that appear in another file.

I'd like a simple grep, awk, or sed one-liner if possible. I'm matching on IP addresses, as shown below.

file1

10.10.50.1 00:00:10:23 0000.0012.3456 Vlan1
10.10.50.2 00:00:12:34 1234.56AB.CDEF Vlan2
10.10.50.3 00:00:23:10 ABCD.EF12.345 Vlan3billion

file2

these-are some_words 10.10.50.2 andmaybe some-other words
theseare somewords 10.10.50.99 and-maybe some_other words

Expected output:

10.10.50.1 00:00:10:23 0000.0012.3456 Vlan1
10.10.50.3 00:00:23:10 ABCD.EF12.345 Vlan3billion
3
What's a pattern? A fixed string (a sequence of non-blanks taken literally), a shell pattern (and then, which shell), a regex? This is rather important. - AlexP
Define "pattern found anywhere". Are "e som" or ".2 andma" possible patterns? Or could you normalize file2 to one token per line? (In which case the rest should be trivial.) - tripleee
@AlexP I'm completely new to this so I don't know the correct answer to that question. In my case, it's an IP address. Should I reword the question? - Rhythmic Undulations
So your real question then is "how do I reduce file2 to just IP addresses, one per line" and you can take it from there? - tripleee
@tripleee I need to remove lines from file1 if they contain IP addresses in file2. - Rhythmic Undulations

3 Answers

1
votes

If I understand correctly, you want to exclude from the first file lines that would match any IP address in the second file.

This simple and admittedly a bit lazy solution might be good enough for your purpose:

grep -vFwf <(awk '{ print $3 }' file2) file1

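awk extracts the third column, which holds the IP addresses. grep reads them as a pattern list (-f), treats each one as a fixed string rather than a regex (-F), matches only complete words (-w), and prints the lines that do not match (-v). The <(...) process substitution needs bash, zsh, or ksh.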
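To see why -F and -w matter for IP addresses, here is a small sketch. The three test lines are made up for illustration and are not from the question's files.

```shell
# Made-up sample lines (assumption: not taken from the question's data).
lines='10.10.50.2 exact
10.10.50.20 longer-last-octet
10x10y50z2 dots-as-wildcards'

# Plain regex match: "." matches any character and substrings count,
# so all three lines match.
plain=$(printf '%s\n' "$lines" | grep -c '10.10.50.2')

# -F takes the pattern literally; -w requires the match to be bounded by
# non-word characters, so only the exact address matches.
strict=$(printf '%s\n' "$lines" | grep -cFw '10.10.50.2')

echo "plain=$plain strict=$strict"
```

Without both flags, 10.10.50.2 in file2 would also knock out a 10.10.50.20 line in file1.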

If the IP address is not always the 3rd column, then you could extract them by using pattern matching with grep, as @tripleee suggested:

grep -vFwf <(grep -owE '[0-9]{1,3}(\.[0-9]{1,3}){3}' file2) file1
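For anyone who wants to try this end to end, the following sketch recreates the question's sample files in a scratch directory and runs the two steps separately. It writes the extracted addresses to a temporary file (the name ips is my own choice) instead of using process substitution, so it should also run under a plain POSIX sh, assuming a grep that supports -o and -w (GNU and BSD grep both do).

```shell
# Recreate the question's sample files in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
10.10.50.1 00:00:10:23 0000.0012.3456 Vlan1
10.10.50.2 00:00:12:34 1234.56AB.CDEF Vlan2
10.10.50.3 00:00:23:10 ABCD.EF12.345 Vlan3billion
EOF
cat > "$tmp/file2" <<'EOF'
these-are some_words 10.10.50.2 andmaybe some-other words
theseare somewords 10.10.50.99 and-maybe some_other words
EOF

# Step 1: pull every dotted quad out of file2, one per line.
grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}' "$tmp/file2" > "$tmp/ips"

# Step 2: drop the lines of file1 that contain any of those addresses.
result=$(grep -vFwf "$tmp/ips" "$tmp/file1")
printf '%s\n' "$result"

rm -rf "$tmp"
```

This prints the two lines from the question's expected output.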
0
votes

awk to the rescue!

$ awk 'NR==FNR{a[$3];next} !($1 in a)' file2 file1

10.10.50.1 00:00:10:23 0000.0012.3456 Vlan1
10.10.50.3 00:00:23:10 ABCD.EF12.345 Vlan3billion
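The one-liner relies on the common NR==FNR two-file idiom. Here it is spread out with comments, as a sketch on tiny made-up input; the array name seen and the sample lines are mine, not part of the answer.

```shell
tmp=$(mktemp -d)
# Made-up data: the IP sits in field 3 of file2 and field 1 of file1.
printf '%s\n' 'a b 10.10.50.2 c' > "$tmp/file2"
printf '%s\n' '10.10.50.1 keep' '10.10.50.2 drop' > "$tmp/file1"

kept=$(awk '
    # NR counts lines across all inputs, FNR restarts for each file,
    # so NR == FNR is true only while reading the first file (file2).
    NR == FNR { seen[$3]; next }   # remember field 3, skip to next line
    # Second file (file1): print the line unless field 1 was remembered.
    !($1 in seen)
' "$tmp/file2" "$tmp/file1")
echo "$kept"

rm -rf "$tmp"
```

One caveat: if file2 is empty, NR == FNR stays true while file1 is read, so nothing is printed.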
0
votes

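More awk, with the core snaffled from karafka. This one finds the IP wherever it sits on the line, but gensub is a GNU awk extension, so it needs gawk: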

$ awk 'NR==FNR{a[gensub(/^.* (([0-9]{1,3}\.){3}[0-9]{1,3}) .*$/,"\\1",1,$0)];next} !($1 in a)' file2 file1
10.10.50.1 00:00:10:23 0000.0012.3456 Vlan1
10.10.50.3 00:00:23:10 ABCD.EF12.345 Vlan3billion
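If gawk is not available, a similar effect can be sketched with match(), RSTART and RLENGTH, which any POSIX awk provides. This is a variation, not the answer's code: it collects every dotted quad on a file2 line, including ones at the very start or end of the line (the gensub regex above requires a space on both sides). It uses [0-9]+ rather than {1,3} because some older awks lack interval expressions. The sample lines are made up.

```shell
tmp=$(mktemp -d)
# Made-up data: IPs at the end, the start, and the middle of file2 lines.
printf '%s\n' 'link down on 10.10.50.2' \
              '10.10.50.99 unreachable, peer 10.10.50.3 also' > "$tmp/file2"
printf '%s\n' '10.10.50.1 Vlan1' '10.10.50.2 Vlan2' '10.10.50.3 Vlan3' > "$tmp/file1"

result=$(awk '
    NR == FNR {
        line = $0
        # Collect every dotted quad on the line, wherever it appears.
        while (match(line, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)) {
            seen[substr(line, RSTART, RLENGTH)]
            line = substr(line, RSTART + RLENGTH)
        }
        next
    }
    # file1: keep the line only if its first field was not collected.
    !($1 in seen)
' "$tmp/file2" "$tmp/file1")
echo "$result"

rm -rf "$tmp"
```

Only the 10.10.50.1 line survives here, since both 10.10.50.2 and 10.10.50.3 appear in file2.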