Compare two data-frames with different column names and update first data-frame with the column from second data-frame

Question

I am working on two data-frames which have different column names and dimensions.

First data-frame "df1" contains single column "name" that has names need to be located in second data-frame. If matched, value from df2 first column df2[0] needs to be returned and added in the result_df

Second data-frame "df2" has multiple columns and no header. This contains all the possible diminutive names and full names. Any of the column can have the "name" that needs to be matched

Goal: Locate the name in "df1" in "df2" and if it is matched, return the value from first column of the df2 and add in the respective row of df1

df1

name
ab
alex
bob
robert
bill

df2

0	1	2	3
abram	ab
robert	rob	bob	robbie
alexander	alex	al
william	bill

result_df

name	matched_name
ab	abram
alex	alexander
bob	robert
robert	robert
bill	william

The code i have written so far is giving error. I need to write it as an efficient code as it will be checking millions of entries in df1 with df2:

''' result_df = process_name(df1, df2)

def process_name(df1, df2):

for elem in df2.values:
    
    if elem in df1['name']:
        df1["matched_name"] = df2[0]

'''

Anurag Dabas Anurag Dabas · Accepted Answer · 2021-05-26T02:47:14

Try via concat(),merge(),drop() and rename() and reset_index() method:

df=(pd.concat((df1.merge(df2,left_on='name',right_on=x) for x in df2.columns))
    .drop(['1','2','3'],1)
    .rename(columns={'0':'matched_name'})
    .reset_index(drop=True))

Output of df:

    name    matched_name
0   robert  robert
1   ab      abram
2   alex    alexander
3   bill    william
4   bob     robert

Compare two data-frames with different column names and update first data-frame with the column from second data-frame

1 Answers