multiple operations on two dataframes using pandas

Question

This is an extension of my previous question.

I have two dataframes, df1 and df2, of different lengths, with two columns (id_col1 and id_col2) serving as key columns. I would like to perform two operations on these dataframes:

1. Replace only the blank (NA) cells in df1 with the corresponding values from df2, matched on the key columns.
2. For each key-column pair, report the cells whose values contradict each other across the two dataframes in a new dataframe.

df1

id_col1   id_col2   name    age    sex
---------------------------------------
101         1M              21  
101         3M              21      M
102         1M      Mark    25

df2

id_col1    id_col2    name     age     sex
-------------------------------------------
101          1M       Steve             M
101          2M                         M
101          3M       Steve    25   
102          1M       Ria      25       M
102          2M       Anie     22       F

After performing operation 1, i.e. replacing NA's in df1 with the corresponding values from df2, I should get the following:

result_1

id_col1    id_col2    name     age     sex
-------------------------------------------
101         1M        Steve    21      M
101         3M        Steve    25      M
102         1M        Mark     25      M

After performing operation 2, i.e. reporting the cells that conflict between df1 and df2 for the same key columns, I should get the following:

result_2

id_col1    id_col2    name     age     sex
-------------------------------------------
101          3M                21   
101          3M                25   
102          1M        Mark     
102          1M        Ria

Can anyone help in solving these?

wwnde · Accepted Answer · 2020-05-05T21:10:15

Using df1 and df2 as posted in the question.

Merge

# assumes both frames are indexed by the key columns (id_col1, id_col2)
df3 = df2.merge(df1, left_index=True, right_index=True, suffixes=('_left', ''), how='left')
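
The merge above only pairs rows by key if the key columns are the index of both frames. Here is a minimal sketch of that setup, with the sample data transcribed from the question. The `keys` list and the `set_index` step are my assumptions and are not shown in the original answer:

```python
import numpy as np
import pandas as pd

keys = ["id_col1", "id_col2"]  # assumed name for the key-column list

# Sample data transcribed from the question; blank cells become NaN.
df1 = pd.DataFrame({
    "id_col1": [101, 101, 102],
    "id_col2": ["1M", "3M", "1M"],
    "name": [np.nan, np.nan, "Mark"],
    "age": [21, 21, 25],
    "sex": [np.nan, "M", np.nan],
}).set_index(keys)

df2 = pd.DataFrame({
    "id_col1": [101, 101, 101, 102, 102],
    "id_col2": ["1M", "2M", "3M", "1M", "2M"],
    "name": ["Steve", np.nan, "Steve", "Ria", "Anie"],
    "age": [np.nan, np.nan, 25, 25, 22],
    "sex": ["M", "M", np.nan, "M", "F"],
}).set_index(keys)

# Index-on-index merge: df2 columns get the _left suffix, df1 columns keep their names.
df3 = df2.merge(df1, left_index=True, right_index=True,
                suffixes=("_left", ""), how="left")
print(df3)
```

With a default integer index the same call would pair rows by position instead of by key, so the `set_index` step matters here.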

Solution 1: use np.where to fill the blanks in df1 from df2, then drop the rows that are not required

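import numpy as np
# columns suffixed _left come from df2; unsuffixed columns come from df1
df3['name'] = np.where(df3['name'].isna(), df3['name_left'], df3['name'])
df3['sex'] = np.where(df3['sex'].isna(), df3['sex_left'], df3['sex'])
# keep only the keys present in df1 and the last three (df1-side) columns
df4 = df3[df3.index.isin(df1.index)].iloc[:, -3:]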
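
As an alternative to the column-by-column np.where calls, `combine_first` may do the whole fill in one step. This is a swapped-in technique, not the method used in this answer, and the sketch assumes both frames are indexed by the key columns:

```python
import numpy as np
import pandas as pd

keys = ["id_col1", "id_col2"]  # assumed name for the key-column list

# Sample data transcribed from the question; blank cells become NaN.
df1 = pd.DataFrame({
    "id_col1": [101, 101, 102],
    "id_col2": ["1M", "3M", "1M"],
    "name": [np.nan, np.nan, "Mark"],
    "age": [21, 21, 25],
    "sex": [np.nan, "M", np.nan],
}).set_index(keys)

df2 = pd.DataFrame({
    "id_col1": [101, 101, 101, 102, 102],
    "id_col2": ["1M", "2M", "3M", "1M", "2M"],
    "name": ["Steve", np.nan, "Steve", "Ria", "Anie"],
    "age": [np.nan, np.nan, 25, 25, 22],
    "sex": ["M", "M", np.nan, "M", "F"],
}).set_index(keys)

# combine_first keeps df1's values and takes df2's only where df1 is NaN.
# It returns the union of both indexes, so reindex back to df1's rows and columns.
result_1 = df1.combine_first(df2).reindex(index=df1.index, columns=df1.columns)
print(result_1.reset_index())
```

Because only blanks are filled, key 101/3M keeps df1's age of 21 here; the 21 versus 25 disagreement is the kind of cell that operation 2 is meant to report.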

Outcome

You were not entirely clear about the conflicts, so I assumed conflicts on name and age. For that reason, I drop the NaNs in age, because they remain wherever I did not fill them.

df3 = df3.dropna(subset=['age', 'age_left'])

Derive the conflicts dataframe with a boolean selection

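# each comparison needs its own parentheses (& and | bind tighter than !=);
# a row conflicts if name or age differs; assign the result, since inplace on a selection changes nothing
conflicts = df3[(df3['name_left'] != df3['name']) | (df3['age_left'] != df3['age'])].dropna(thresh=1)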
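
If the report should look like result_2 in the question, with one row per source and only the conflicting cells filled, something along these lines might work. It is a sketch under my own assumptions: a cell counts as a conflict only when both frames hold a non-blank value and the values differ, and the names `a`, `b`, `conflict` and `source` are illustrative:

```python
import numpy as np
import pandas as pd

keys = ["id_col1", "id_col2"]  # assumed name for the key-column list

# Sample data transcribed from the question; blank cells become NaN.
df1 = pd.DataFrame({
    "id_col1": [101, 101, 102],
    "id_col2": ["1M", "3M", "1M"],
    "name": [np.nan, np.nan, "Mark"],
    "age": [21, 21, 25],
    "sex": [np.nan, "M", np.nan],
}).set_index(keys)

df2 = pd.DataFrame({
    "id_col1": [101, 101, 101, 102, 102],
    "id_col2": ["1M", "2M", "3M", "1M", "2M"],
    "name": ["Steve", np.nan, "Steve", "Ria", "Anie"],
    "age": [np.nan, np.nan, 25, 25, 22],
    "sex": ["M", "M", np.nan, "M", "F"],
}).set_index(keys)

# Align df2 to df1's keys and columns so the frames compare cell by cell.
a = df1
b = df2.reindex(index=df1.index, columns=df1.columns)

# A conflict: both cells are filled and the values differ.
conflict = a.notna() & b.notna() & a.ne(b)
rows = conflict.any(axis=1)

# Blank out non-conflicting cells, tag each row with its source, and interleave by key.
from_df1 = a.where(conflict)[rows].assign(source="df1")
from_df2 = b.where(conflict)[rows].assign(source="df2")
result_2 = (pd.concat([from_df1, from_df2])
              .set_index("source", append=True)
              .sort_index())
print(result_2)
```

Comparing the original frames rather than the filled ones avoids reporting a cell as a conflict just because it was filled in during operation 1.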
