1
votes

I have a dataframe (df1) like this.

import pandas as pd
import numpy as np

d1 = {'A': [np.nan, 'India', 'CHN', 'JP'], 
      'B': [np.nan, np.nan, np.nan, np.nan]}

df1 = pd.DataFrame(data=d1)
df1
    A      B
0   NaN    NaN
1   India  NaN
2   CHN    NaN
3   JP     NaN

And another dataframe like this.

d2 = {'X': ['Japan', 'China', 'India'], 
      'Y': ['JP', 'CN', 'IN'], 
      'Z': ['JPN', 'CHN', 'IND']}

df2 = pd.DataFrame(data=d2)
df2
    X       Y   Z
0   Japan   JP  JPN
1   China   CN  CHN
2   India   IN  IND

I'm trying to update values in column B of df1, while searching for values in column A of df1 in all of the columns of df2, with values from column X of df2 when there is a match.

The expected result is:

    A       B
0   NaN     NaN
1   India   India
2   CHN     China
3   JP      Japan

I tried using inner join (pd.merge()) but as I have one column in left and 3 columns on right, I couldn't get far.

pd.merge(df1, df2, left_on=["A"], right_on=["X"], how="inner")

I tried using isin() and .loc() but since I need to update df1['B'] with value from df2, I couldn't figure how I could get respective data from df2.

df1.loc[
  (df1["A"].isin(df2["X"])) | 
  (df1["A"].isin(df2["Y"])) | 
  (df1["A"].isin(df2["Z"]))
]

I have an idea to store each column values as keys in a dictionary and values as their respective value from df2['X']. Using that dictionary as lookup for each row in df1['A'], I could update value of df1['B'].

lookup_data = {
  "Japan" : "Japan",
  "JP" : "Japan",
  "JPN" : "Japan"
}
df1['B'] = [lookup_data.get(x, np.nan) for x in df1['A']]

However, I'm interested if this can be solved in a more efficient way. Please help. Thanks.

2
What I would do is reshape df2 into "tidy, long" format so that there are only three columns, X, a column called column which refers to the column name of the un-reshaped df2 and has entries like Y and Z, and value which contains entries like JP, CHN etc. Then I would join df1 to reshaped_df2, group by X and then reduce back down to the form you want by filtering etc. (This is an R tidyverse approach, but I'm pretty confident it should be possible in pandas too.) - Curt F.

2 Answers

1
votes

This need check the value for all cell in df2 from df1 column A , so we do

s=f1.A.dropna().map(lambda x : df2.loc[df2.isin([x]).any(1).loc[lambda x : x].index,'X'].values[0]) 
df1.B.fillna(s,inplace=True)
df1
       A      B
0    NaN    NaN
1  India  India
2    CHN  China
3     JP  Japan
1
votes

you can use map on the column A from df1 with a series with index being all the values of df2 and values the corresponding value in df2 column X. to do so you can use set_index the column X, stack and then inverse the values and the index in a new series.

#create the series for the map
s = df2.set_index(df2['X']).stack()
s = pd.Series(s.index.get_level_values(0), index=s.values)

# map A and fillna
df1['B'] = df1['B'].fillna(df1['A'].map(s))

print (df1)
       A      B
0    NaN    NaN
1  India  India
2    CHN  China
3     JP  Japan