I have a CSV file with multiple columns. Some rows might have duplicate values in the 4th column (col4).
I need to delete the duplicate rows and keep only one row per col4 value. The row to keep is the one with the highest value in col1.
Below is an example:
col1,col2,col3,col4
1,x,a,123
2,y,b,123
3,y,b,123
1,z,c,999
Rows 1, 2, and 3 are duplicates of each other (all have col4 = 123). Only the third row should be kept, because col1(row3) > col1(row2) > col1(row1).
For now, this code deletes duplicates in col4 without looking at col1:
awk -F',' '!seen[$4]++' myfile.csv
I would like to add a condition that checks col1 within each group of duplicates, deletes the rows with the lower values in col1, and keeps the row with the highest value in col1.
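One idea I have been considering, as a rough sketch only: sort the data rows so that the highest col1 comes first within each col4 group, then keep the first row seen per col4. The function name dedupe_sorted is made up for illustration. It assumes the first line is a header, col1 is numeric, and no field contains a quoted comma. It also reorders the rows by col4, which may not be acceptable.

```shell
# Hypothetical helper: $1 is the path to a CSV file with a header line.
dedupe_sorted() {
    # Print the header unchanged.
    head -n 1 "$1"
    # Sort data rows by col4, then by col1 descending (numeric),
    # and keep only the first row for each col4 value.
    tail -n +2 "$1" | sort -t',' -k4,4 -k1,1nr | awk -F',' '!seen[$4]++'
}
```

It would be called as dedupe_sorted myfile.csv.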
Output should be:
col1,col2,col3,col4
3,y,b,123
1,z,c,999
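To make the requirement concrete, here is a minimal single-pass awk sketch of the logic I am after. It is not a definitive solution. The function name dedupe_by_col4 and the array names are hypothetical. It assumes the first line is a header, col1 is numeric, fields never contain quoted commas, and that on a tie in col1 the first row seen is kept. Rows are printed in the order their col4 value first appeared.

```shell
# Hypothetical helper: $1 is the path to a CSV file with a header line.
dedupe_by_col4() {
    awk -F',' '
    NR == 1 { print; next }              # pass the header through unchanged
    {
        key = $4
        if (!(key in best)) {            # first time this col4 value is seen
            order[++n] = key             # remember first-seen order of keys
            max[key] = $1 + 0
            best[key] = $0
        } else if ($1 + 0 > max[key]) {  # numerically larger col1 replaces the stored row
            max[key] = $1 + 0
            best[key] = $0
        }
    }
    END {
        # Emit one stored row per col4 value, in first-seen order.
        for (i = 1; i <= n; i++) print best[order[i]]
    }' "$1"
}
```

Called as dedupe_by_col4 myfile.csv, this keeps the whole file's winning rows in memory until the end, which should be fine unless the number of distinct col4 values is very large.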
Thank you!