0
votes

I am working on two data-frames which have different column names and dimensions.

First data-frame "df1" contains single column "name" that has names need to be located in second data-frame. If matched, value from df2 first column df2[0] needs to be returned and added in the result_df

Second data-frame "df2" has multiple columns and no header. This contains all the possible diminutive names and full names. Any of the column can have the "name" that needs to be matched

Goal: Locate the name in "df1" in "df2" and if it is matched, return the value from first column of the df2 and add in the respective row of df1

df1

name
ab
alex
bob
robert
bill

df2

0 1 2 3
abram ab
robert rob bob robbie
alexander alex al
william bill

result_df

name matched_name
ab abram
alex alexander
bob robert
robert robert
bill william

The code i have written so far is giving error. I need to write it as an efficient code as it will be checking millions of entries in df1 with df2:

''' result_df = process_name(df1, df2)

def process_name(df1, df2):

for elem in df2.values:
    
    if elem in df1['name']:
        df1["matched_name"] = df2[0]

'''

1

1 Answers

1
votes

Try via concat(),merge(),drop() and rename() and reset_index() method:

df=(pd.concat((df1.merge(df2,left_on='name',right_on=x) for x in df2.columns))
    .drop(['1','2','3'],1)
    .rename(columns={'0':'matched_name'})
    .reset_index(drop=True))

Output of df:

    name    matched_name
0   robert  robert
1   ab      abram
2   alex    alexander
3   bill    william
4   bob     robert