I have two data frames df1 and df2 with the following columns:
df1: ['id', 'a_id', 'c', 'd']
df2: ['a_id', 'e', 'f']
For each row in df1 whose a_id matches an a_id in df2, I need to add the corresponding e value from df2 to that df1 row. df1 has around 7 million rows and df2 has around 15k. I tried the code below, but it takes too long because it scans df2 once for every row of df1. Is there a faster and more memory-efficient way to do this?
def map_df(row):
    # Scan all of df2 for the first row with the same a_id
    for i, r in df2.iterrows():
        if row['a_id'] == r['a_id']:
            return r['e']
    # No match: falls through and returns None

df1['e'] = df1.apply(lambda row: map_df(row), axis=1)
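
For reference, the kind of thing I think I'm looking for is a vectorized lookup instead of the nested loop above. Below is a minimal sketch of that idea using Series.map. The tiny frames and their values are made up just to make it runnable, and I'm assuming that taking the first match per a_id in df2 (as the loop does) is the behavior I want. I haven't measured this on the full 7 million rows.

```python
import pandas as pd

# Tiny stand-in frames with the same column layout (values are made up).
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "a_id": [10, 20, 10, 30],
    "c": ["w", "x", "y", "z"],
    "d": [0.1, 0.2, 0.3, 0.4],
})
df2 = pd.DataFrame({
    "a_id": [10, 20],
    "e": [0.5, 0.7],
    "f": ["p", "q"],
})

# Build a lookup Series indexed by a_id. drop_duplicates keeps the first
# row per a_id, which mirrors the "return on first match" loop and also
# guarantees the unique index that Series.map needs.
lookup = df2.drop_duplicates(subset="a_id").set_index("a_id")["e"]

# One vectorized pass over df1; a_id values missing from df2 become NaN.
df1["e"] = df1["a_id"].map(lookup)

print(df1)
```

A left merge, df1.merge(df2[['a_id', 'e']], on='a_id', how='left'), looks like it would give the same column, but it builds a whole new frame, so I suspect map is the lighter option on memory when only one column is needed.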