I have a dataframe (df1) like this.
import pandas as pd
import numpy as np
d1 = {'A': [np.nan, 'India', 'CHN', 'JP'],
'B': [np.nan, np.nan, np.nan, np.nan]}
df1 = pd.DataFrame(data=d1)
df1
A B
0 NaN NaN
1 India NaN
2 CHN NaN
3 JP NaN
And another dataframe like this.
d2 = {'X': ['Japan', 'China', 'India'],
'Y': ['JP', 'CN', 'IN'],
'Z': ['JPN', 'CHN', 'IND']}
df2 = pd.DataFrame(data=d2)
df2
X Y Z
0 Japan JP JPN
1 China CN CHN
2 India IN IND
I'm trying to update values in column B of df1, while searching for values in column A of df1 in all of the columns of df2, with values from column X of df2 when there is a match.
The expected result is:
A B
0 NaN NaN
1 India India
2 CHN China
3 JP Japan
I tried using inner join (pd.merge()) but as I have one column in left and 3 columns on right, I couldn't get far.
pd.merge(df1, df2, left_on=["A"], right_on=["X"], how="inner")
I tried using isin() and .loc() but since I need to update df1['B'] with value from df2, I couldn't figure how I could get respective data from df2.
df1.loc[
(df1["A"].isin(df2["X"])) |
(df1["A"].isin(df2["Y"])) |
(df1["A"].isin(df2["Z"]))
]
I have an idea to store each column values as keys in a dictionary and values as their respective value from df2['X']. Using that dictionary as lookup for each row in df1['A'], I could update value of df1['B'].
lookup_data = {
"Japan" : "Japan",
"JP" : "Japan",
"JPN" : "Japan"
}
df1['B'] = [lookup_data.get(x, np.nan) for x in df1['A']]
However, I'm interested if this can be solved in a more efficient way. Please help. Thanks.
df2into "tidy, long" format so that there are only three columns,X, a column calledcolumnwhich refers to the column name of the un-reshapeddf2and has entries likeYandZ, andvaluewhich contains entries likeJP,CHNetc. Then I would joindf1toreshaped_df2, group byXand then reduce back down to the form you want by filtering etc. (This is an R tidyverse approach, but I'm pretty confident it should be possible in pandas too.) - Curt F.