5
votes

I'm trying to read two separate files with awk and write a filtered copy of the second one to an output file.

file1 contains numbers:

1
2
5
7
10

file2 contains header lines (fewer than 3 fields each) followed by data rows of 25 columns:

_rlnNrOfSignificantSamples #24 
_rlnMaxValueProbDistribution #25 
300.000000 25425.970703 25000.669922     6.050000     2.000000    56.000000     0.277790     79096.000000     0.100000 000001@Particles/Micrographs/006_particles.mrcs   453.000000   604.000000     1.000000     0.859382 Micrographs/006.mrc            1    -3.469177     -3.469177     0.000000     0.000000   -82.345885           23  9475.876495            1     0.988689
300.000000 25425.970703 25000.669922     6.050000     2.000000    56.000000     0.277790 79096.000000     0.100000 000002@Particles/Micrographs/006_particles.mrcs   431.000000   428.000000     1.000000     0.806442 Micrographs/006.mrc            1    -1.469177    -3.469177     0.000000     0.000000    87.654115           22  9412.959278            1     1.000000

I want to read the numbers from file1 into an array, then:

  1. print the header from file2
  2. print the lines from file2 whose value in field $22 is NOT in the array (in the example above, those values are 23 and 22)

After one day of struggling I came up with the following:

#!/bin/bash    
FieldNum=22

awk -v f=$FieldNum 'FNR==NR{num[$1]; next}
    {
        # print the header of file2
        if(NF < 3) {print > "output"}
        # check lines after header  
        else {if (f in num) {} else {print >> "output"}}
    }' $file1 $file2 

But it prints all the lines from file2, so the array check doesn't work. Could you please spot my mistake?

1 Answer

13
votes

This one-liner should do what you want:

 awk 'NR==FNR{a[$0];next}NF<3||!($22 in a)' file1 file2
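To make the one-liner easier to follow, here is a rough expanded sketch of the same logic, run against made-up inputs. The temporary directory, the `row` helper and the filler fields `x` and `y` are assumptions for illustration only; just the two header lines and the numbers 1, 2, 5, 7, 10 come from the question.

```shell
#!/bin/bash
# Hypothetical sample inputs, invented for illustration.
tmp=$(mktemp -d)
printf '%s\n' 1 2 5 7 10 > "$tmp/file1"

# Hypothetical helper: build a 25-field data row whose 22nd field is "$1".
row() { printf 'x %.0s' {1..21}; echo "$1 y y y"; }
{
  echo '_rlnNrOfSignificantSamples #24'
  echo '_rlnMaxValueProbDistribution #25'
  row 23; row 5; row 22; row 10
} > "$tmp/file2"

kept=$(awk '
    NR == FNR { a[$0]; next }   # first file: store every line as an array key
    NF < 3                      # second file: print header lines (fewer than 3 fields)
    NF >= 3 && !($22 in a)      # print data rows whose field 22 is not a stored key
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$kept"
rm -rf "$tmp"
```

With these inputs the two header lines and the rows carrying 23 and 22 are kept, while the rows carrying 5 and 10 are dropped. A pattern without an action prints the line, which is why the one-liner needs no explicit `print`.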

Your problem is the variable f. It holds a number, which I guess is the index of the column to test.

But in your code you use f as a value: you check whether f itself is in the array, instead of checking $f, the content of field number f.

That is, with f=22 you check, for every line in file2, whether the constant 22 is in the array. So the output is either all lines of file2 or only its headers, depending on whether the constant 22 appears in your file1. :)
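The difference between testing f and testing $f can be seen in a small side-by-side sketch. The file names `nums` and `data` and their contents are hypothetical, and field 2 stands in for field 22 to keep the rows short.

```shell
#!/bin/bash
# Hypothetical inputs, invented for this demonstration.
dir=$(mktemp -d)
printf '%s\n' 1 2 5 > "$dir/nums"
printf '%s\n' 'x 5' 'y 9' > "$dir/data"

# "f in a" tests the constant 2, which is listed in nums, so no line is printed.
by_value=$(awk -v f=2 'NR==FNR{a[$1];next} !(f in a)' "$dir/nums" "$dir/data")

# "$f in a" tests the content of field 2, so only the "y 9" line is printed.
by_field=$(awk -v f=2 'NR==FNR{a[$1];next} !($f in a)' "$dir/nums" "$dir/data")

echo "f in a:  [$by_value]"
echo "\$f in a: [$by_field]"
rm -rf "$dir"
```

If 2 were removed from `nums`, the first command would print every line of `data` instead, which matches the behaviour seen in the question.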