0
votes

I have a data frame like this:

df
col1    col2     col3    col4
 A       B         C       12
 A       B         C       8
 A       B         C       10
 P       Q         R       12
 P       Q         R       11
 K       L         S       1
 K       L         S       15
 U       V         R       20 

I want to get those rows where col4 value is maximum for col3 values for each col1 and col2 combinations

for example the result I am looking for is

col1    col2    col3    col4
 A        B       C       12
 P        Q       R       12
 K        L       S       15
 U        V       R       20

how to do it in most efficient way using pandas ?

3
df.groupby(['col1', 'col2', 'col3']).max()Simon Rogers
Unless you are sure that col1 and col2 are unique for each col3, the robust way is to group by all other columns you want to considere: df.groupby(['col1', 'col2', 'col3']).max().Serge Ballesta

3 Answers

0
votes

You can us the groupby function with max() :

df = pd.DataFrame({'col1' : ['A','A','A','P','P'], 'col2' : ['B','B','B','Q','Q'],
                   'col3':['C','C','C','R','R'], 'col4':[12,8,10,12,11]})
df.groupby(['col1', 'col2']).max()

Out :

col1    col2    col3    col4    
A   B   C   12
P   Q   R   12
0
votes

Try this:

>>> import pandas as pd
>>> df = pd.read_csv("t.csv")
>>> df
  col1 col2  col3  col4
0    A    B     C    12
1    A    B     C     8
2    A    B     C    10
3    P    Q     R    12
4    P    Q     R    11
5    K    L     S     1
6    K    L     S    15
7    U    V     R    20
>>> df.groupby(['col1']).max()
     col2  col3  col4
col1
A       B     C    12
K       L     S    15
P       Q     R    12
U       V     R    20
0
votes

You need to use groupby:

import pandas as pd

# setup test data
data = {'col1': ['A', 'A', 'A', 'P', 'P', 'K', 'K', 'U'], 'col2': ['B', 'B', 'B', 'Q', 'Q', 'L', 'L', 'V'],
    'col3': ['C', 'C', 'C', 'R', 'R', 'S', 'S', 'R'], 'col4': [12, 8, 10, 12,11,1,15,20]}
data = pd.DataFrame(data=data)

# get max values
out_data = data.groupby(['col1', 'col2', 'col3']).max()

Output:

col1 col2 col3  col4    
A    B    C       12
K    L    S       15
P    Q    R       12
U    V    R       20