Find the same value in two different columns from different files

1

votes

I have 2 files:

file1.txt:

1 A bla 9232
1 B tesfs 3049
1 C blof 4054
2 D dkeeez 3049
2 E eor 4042
3 F foaer 4024

file2.txt:

A
B
E

Expected output, file3.txt:

1 A bla 9232
1 B tesfs 3049
2 E eor 4042

The output is simply every line from file1.txt whose column 2 value also appears in file2.txt.

In file2.txt, each line is unique, but one value can be a prefix of another, for example:

A
AA
AAee
B
...

I tried grep -Ff file2.txt file1.txt, but file3.txt still contains lines whose column 2 value does not exist in file2.txt. The solution can be a one-liner or a shell script. I tried awk and a shell script, without success.

bash shell awk scripting grep

4

votes

You can use awk:

awk 'FNR==NR{a[$1]; next} $2 in a' file2.txt file1.txt

In the first pass (FNR==NR is true only while reading the first file) we store every value from file2.txt as a key of the array a. In the second pass, while iterating over file1.txt, we check whether column 2 is a key of a and, if so, print the line.

Output:

1 A bla 9232
1 B tesfs 3049
2 E eor 4042
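
For readers who find the one-liner terse, the same logic can be written out with comments. This is only a sketch: the function name filter_by_col2 and the array name keys are made up for illustration and are not part of the answer above.

```shell
# Hypothetical helper: print the lines of the data file whose 2nd column
# appears as the 1st column of the keys file.
filter_by_col2() {
    keys_file=$1
    data_file=$2
    awk '
        # FNR == NR holds only while awk reads the first file (the keys).
        FNR == NR { keys[$1]; next }   # remember the key, go to the next line
        # Second file: print the line if column 2 is one of the stored keys.
        $2 in keys
    ' "$keys_file" "$data_file"
}
```

It would be called as filter_by_col2 file2.txt file1.txt > file3.txt. Because the lookup is an exact match on the whole field, a key A does not match a column 2 value of AA.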

1

votes

This is one thing that join is good for, provided your inputs are sorted (on field 2 for file1.txt, and on field 1 for file2.txt - your example shows sorted inputs, but if your real inputs aren't, you'll have to fix that before join will work):

join -1 2 -2 1 -o 1.1,1.2,1.3,1.4 file1.txt file2.txt
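
If the real inputs are not sorted, one possible way to fix that first is sketched below. The function name join_on_col2 and the temporary files are assumptions for illustration; LC_ALL=C is set so that sort and join agree on the ordering.

```shell
# Hypothetical helper: sort both inputs on their join field, then join.
join_on_col2() {
    keys_file=$1
    data_file=$2
    sorted_keys=$(mktemp)
    sorted_data=$(mktemp)
    # join needs the keys sorted on field 1 and the data sorted on field 2.
    LC_ALL=C sort -k1,1 "$keys_file" > "$sorted_keys"
    LC_ALL=C sort -k2,2 "$data_file" > "$sorted_data"
    # Print the four fields of the data file for every matching key.
    LC_ALL=C join -1 2 -2 1 -o 1.1,1.2,1.3,1.4 "$sorted_data" "$sorted_keys"
    rm -f "$sorted_keys" "$sorted_data"
}
```

Note that the output comes out ordered by the join key rather than in the original order of file1.txt, and that -o 1.1,1.2,1.3,1.4 prints only the first four fields of each line.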

0

votes

I love the awk solution from anubbhava. Here is an alternate solution, using grep:

# Add word anchors before and after each word in file2.txt
sed 's/^/\\b/;s/$/\\b/' file2.txt > temp.txt  

grep -f temp.txt file1.txt
rm temp.txt

The file temp.txt would look like this:

\bA\b
\bB\b
\bE\b

Next, we use temp.txt as the list of search patterns and get the desired result. Note that the \b anchors match the word in any column, not only in column 2.
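
A similar effect can probably be had without the temporary file by combining the -F and -f options from the question with -w (match whole words only), which GNU and BSD grep support. The function name grep_whole_words is made up for this sketch.

```shell
# Hypothetical helper: treat every line of the keys file as a fixed string
# and print the lines of the data file that contain it as a whole word.
grep_whole_words() {
    keys_file=$1
    data_file=$2
    grep -w -F -f "$keys_file" "$data_file"
}
```

It would be called as grep_whole_words file2.txt file1.txt. Like the \b version, this looks at the whole line, so a key that happens to appear as a word in column 3 or 4 would also match.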

0

votes

grep + awk version:

# Use each value in the first column of file2.txt as a pattern and search file1.txt for it.
# The patterns match anywhere in a line, not only in column 2.

grep "$(awk '{print $1}' file2.txt)" file1.txt

1 A bla 9232
1 B tesfs 3049
2 E eor 4042

grep + cut version:

# Same idea, using cut to extract the first space-separated column of file2.txt.

grep "$(cut -d' ' -f1 file2.txt)" file1.txt

1 A bla 9232
1 B tesfs 3049
2 E eor 4042
