2
votes

I have two dataframes as follows:

df1

Name    Id   c1  c2  c3  c4
---------------------------
asd     101  a   b   c   d
cdf     231  e   ?   1  
zxs     342  f   o      
ygg     521  g   k   p  
mlk     432  h   m       z
abc     343  c   x   q  
xyz     254  1   d   2  
fgg     165  c   z   d   mm
mnd     766  2   d   v  

df2

df2_Name    df2_Id  df2_c2  df2_c4
----------------------------------
asd          101      h      d2
ygg          521      x      cd
fgg          165      o      cm

I want to match the "Name" and "id" from df1 with "df2_Name" and "df2_id" of df2. Wherever a match is found, the values of "c2" and "c4" in df1 are replaced by the values in "df2_c2" and "df2_c4" from df2.

Desired output

Name    Id    c1    c2  c3  c4
-------------------------------
asd     101    a    h   c   d2
cdf     231    e    ?   1   
zxs     342    f    o       
ygg     521    g    x   p   cd
mlk     432    h    m       z
abc     343    c    x   q   
xyz     254    1    d   2   
fgg     165    c    o   d   cm
mnd     766    2    d   v   

Tried solution 1

df1[df1.set_index(['Name', 'id']).index.isin(df2.set_index(['df2_Name','df2_id']).index)].iloc[:,[3,5]].update(df2.iloc[:,[2,3]]) 

Result: Original df1 is returned as it is.

Tried solution 2

df1.loc[df1.set_index(['Name', 'id']).index.isin(df2.set_index(['df2_Name','df2_id']).index), ['c2', 'c4']] = df2[['df2_c2', 'df2_c4']]

Result: NaNs introduced

Name    id   c1 c2  c3  c4
----------------------------
asd     101  a  NaN c   NaN
cdf     231  e  ?   1   
zxs     342  f  o       
ygg     521  g  NaN p   NaN
mlk     432  h  m       z
abc     343  c  x   q   
xyz     254  1  d   2   
fgg     165  c  NaN d   NaN
mnd     766  2  d   v   

Tried solution 3 (for c2 only)

merged = df1.merge(df2, left_on=["id", "Name"], right_on=["df2_id", "df2_Name"])

merged["c2"] = merged.apply(lambda x: x["c2"] if pd.isnull(x["df2_c2"]) else x["df2_c2"], axis=1)

Result:

Name    id    c1 c2 c3  c4  df2_Name    df2_id  df2_c2  df2_c4
--------------------------------------------------------------
asd     101   a   h c   d   asd         101      h       d2
ygg     521   g   x p       ygg         521      x       cd
fgg     165   c   o d   mm  fgg         165      o       cm

This solution 3 replaces the values of selected columns, however it returns the merged dataframe and not the entire df1 with updates.

Can anyone help me with this problem?

Note:

This question is being asked after trying the solutions in the following, however there is no success:

  1. update-a-pandas-dataframe-with-data-from-another-dataframe
  2. replace-column-values-based-on-another-dataframe-python-pandas-better-way
1
can you include your desired output to corroborate the correctness of our answers?, also provide sample data, not imagesYuca
added desired output now.Hanif
Welcome to SO. Please review How to Ask and create a minimal reproducible example. If you had read the material recommended to you before you posted, you would notice that this explicitly states Do not include images of code. That includes your sample Dataframes.user3483203
Yuca, this is not a simple merge operation. Its a partial replace operation where the tricky part is not to merge on columns but to replace some of the columns values with new values.Hanif

1 Answers

2
votes

I would use merge to join the two dataframes. Then you got colums with your old values and a column with new values and nan values. Afterwards use apply to join these columns:

merged = df1.merge(df2, how='outer', left_on=["id", "name"], right_on=["df2_id", "df2_name"])
merged["c2"] = merged.apply(lambda x: x["c2"] if pd.isnull(x["df2_c2"]) else x["df2_c2"], axis=1)
# Same for c4
# Drop df2_c2 and df2_c4

I currently cant test it so let me know if this works for you.