I am working on two data-frames which have different column names and dimensions.
First data-frame "df1" contains single column "name" that has names need to be located in second data-frame. If matched, value from df2 first column df2[0] needs to be returned and added in the result_df
Second data-frame "df2" has multiple columns and no header. This contains all the possible diminutive names and full names. Any of the column can have the "name" that needs to be matched
Goal: Locate the name in "df1" in "df2" and if it is matched, return the value from first column of the df2 and add in the respective row of df1
df1
name |
---|
ab |
alex |
bob |
robert |
bill |
df2
0 | 1 | 2 | 3 |
---|---|---|---|
abram | ab | ||
robert | rob | bob | robbie |
alexander | alex | al | |
william | bill |
result_df
name | matched_name |
---|---|
ab | abram |
alex | alexander |
bob | robert |
robert | robert |
bill | william |
The code i have written so far is giving error. I need to write it as an efficient code as it will be checking millions of entries in df1 with df2:
''' result_df = process_name(df1, df2)
def process_name(df1, df2):
for elem in df2.values:
if elem in df1['name']:
df1["matched_name"] = df2[0]
'''