I have two very large dataframes, say df1 & df2, each containing over 1M rows and multiple columns. Both df1 and df2 contain an ID column. I need to perform the following operation:
- check the first row of df1 and match it with the N-th row of df2 with the same ID.
- Take the value of the N-th row of the 3rd column of df2 and place it in a new column in df1.
- Repeat 1.&2. for all 1 million rows of df1.
Which way is the fastest to perform this operation? So far what I thought has been to transform both dataframes ID columns into index (with pd.set_index()) and then use the .apply() function to loop over all rows. But it is taking quite a long time to perform the operation (and I will need to perform a more complex procedure later on).