1
votes

I have two tables for mean and std, I would like to calculate the maximum over some columns for the mean table, And I would like to get corresponding rows for the std table.

mean_df = pd.read_csv(r'./csvs/mean.csv')

std_df = pd.read_csv(r'./csvs/std.csv')

#mean_df, std_df are of same size

grouped_df = mean_df.groupby(['alpha', 'beta'])

columns = ['val']
max_df = grouped_df[columns].agg(['max'])

# Here i want the corresponding std_max_df table for the max_df. i.e., for every max calculated from mean, i want the std of that max in a new table.

For example:

input mean_df is

alpha beta gamma val

1 2 3 100

4 6 8 200

1 2 9 400

4 6 7 500

3 5 8 600


input std_df is

alpha beta gamma val

1 2 3 300

4 6 8 500

1 2 9 100

4 6 7 700

3 5 8 900


output will be

alpha beta gamma max_mean_val corresp_std_val

1 2 9 400 100

4 6 7 500 700

3 5 8 600 900


1
What is your question exactly? Please read How do I ask a good question?. Particularly, the part with "Introduce the problem before you post any code". - darcamo
Can you look into it now? I have two tables basically, I calculate max over some columns in table1, i want corresponding rows in table 2 - charanReddy

1 Answers

1
votes

First merge DataFrames together by inner or left join and then get rows with maximal index values by DataFrameGroupBy.idxmax:

df = mean_df.merge(std_df, on=['alpha', 'beta','gamma'], 
                           how='left', 
                           suffixes=('_mean','_std'))

df = df.loc[df.groupby(['alpha', 'beta'])['val_mean'].idxmax()]
print (df)
   alpha  beta  gamma  val_mean  val_std
2      1     2      9       400      100
4      3     5      8       600      900
3      4     6      7       500      700

For multiple columns one possible idea:

print (mean_df)

   alpha  beta  gamma  val  val1
0      1     2      3  100     5
1      4     6      8  200     9
2      1     2      9  400     3
3      4     6      7  500     5
4      3     5      8  600     2

print (std_df)
   alpha  beta  gamma  val  val1
0      1     2      3  300     5
1      4     6      8  500     7
2      1     2      9  100     9
3      4     6      7  700     2
4      3     5      8  900     1


df = mean_df.merge(std_df, on=['alpha', 'beta','gamma'], 
                           how='left', 
                           suffixes=('_mean','_std'))

cols = ['val','val1']

dfs = [df.loc[df.groupby(['alpha', 'beta'])[f'{col}_mean'].idxmax(), 
              [f'{col}_mean', f'{col}_std']].reset_index(drop=True) for col in cols]

df0 = df[['alpha', 'beta','gamma']].drop_duplicates(['alpha','beta']).reset_index(drop=True)
df = pd.concat([df0] + dfs, axis=1)
print (df)
   alpha  beta  gamma  val_mean  val_std  val1_mean  val1_std
0      1     2      3       400      100          5         5
1      4     6      8       600      900          2         1
2      3     5      8       500      700          9         7