2
votes

I have a tab separated file with 5 columns. I want to print the entire line when the value is maximum in column 3 for entries listed in column 4. Example input:

Col1 Col2 Col3 Col4 Col5
123  456   22  NM01  +  
675  564   21  NM01  +  
342  789   12  NM01  +  
231  452   56  NM02  +  
456  783   178 NM02  +

Expected outcome:

123  456   22  NM01  +  
456  783   178 NM02  +

Basically, print out the line when Col3 has the max value for entries in col4.

I am trying to use unix commands. I am new to this and not able to find examples of how to look up the max value in col3 for each case of col4 (ex: NM01, NM02..)

1

1 Answers

3
votes

You can do this using two maps: 1 to track the maximum values per field $4, and one to track the complete lines having the maximum values per field $4:

awk '$3 > max[$4] { max[$4] = $3; m[$4] = $0 }
     END { for (i in m) { print m[i] } }' file.txt

This will output:

Col1 Col2 Col3 Col4 Col5
123  456   22  NM01  +
456  783   178 NM02  +

If you don't want the header line, you can add a NR > 1 condition.