Compare data in df1 and df2 columns based on third df3 and get data for matched row data from df2 last column

Question

I have 3 dataframes.

df1

srno	col1	col2	col3
1	a1	a2	a3
2	b1	c2	c3
3	d1	b2
4	e1	e2	e3

df2

srno	col1	col2	col3	col4
1	a1			g1
2		b2		g2
3		c2	c3	g3

df3

priority	col_combination
1	col1
2	col2,col3

I am looking for the following output in df1:

srno	col1	col2	col3	col4
1	a1	a2	a3	g1
2	b1	c2	c3	g3
3	d1	b2		g2
4	e1	e2	e3

I have tried several approaches but have not been able to achieve this. I am new to Python; is there a way to do it? The code below does detect matches and returns 'found'/'not found', but I have not yet been able to assign df1[col4] = df2[col4] for the matching rows.

import numpy as np

for i in df3.index:
    cols = df3.loc[i, "col_combination"]
    if "," in cols:
        print("multi column values to handle later")
    else:
        # marks matches only; does not yet copy df2['col4'] into df1
        df1['col4'] = np.where(df1[cols].isin(df2[cols]), 'found', 'not found')

Do you want 'g3' or 'g2' in 4th column, 2nd row, in your expected output? — LoukasPap
Thanks Papadopoulos for looking into this. I want to update df1[col4] with the matching values from df2[col4], matching df1 columns against df2 in the priority order given in df3 and using the column combination in df3[col_combination]. — Mukul Ranjan
Hi Mukul, welcome to SO! Could you please provide an example of the code that you have tried so far? — Geza Kerecsenyi
Thanks @GezaKerecsenyi for looking into this. As I mentioned, I tried several approaches without success. The code below works up to the matching step, but I have not yet been able to assign values to df1[col4]. I have updated the post with the code. — Mukul Ranjan

Ferris Ferris · Accepted Answer · 2021-01-11T11:02:00

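import pandas as pd

# drop any col4 left over from earlier attempts (no error if it is absent)
df1 = df1.drop(columns='col4', errors='ignore')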

# For each rule in df3, left-merge the matching df2 columns
# (plus col4) onto df1, using the rule's columns as the join keys.

for i, row in df3.iterrows():
    priority = row['priority']
    col_list = row['col_combination'].split(',')
    df1 = pd.merge(df1, df2[col_list + ['col4']], on=col_list, how='left')
    # rename the col4 added by this merge to the rule's priority number
    df1.rename(columns={'col4':priority}, inplace=True)

print(df1)

#    srno col1 col2 col3    1    2
# 0     1   a1   a2   a3   g1  NaN
# 1     2   b1   c2   c3  NaN   g3
# 2     3   d1   b2  NaN  NaN   g2
# 3     4   e1   e2   e3  NaN  NaN

priority_list = df3['priority'].tolist()
obj = df1[priority_list[0]]
# combine_first fills the gaps in the higher-priority column
# with values from each lower-priority column in turn
for priority in priority_list[1:]:
    obj = obj.combine_first(df1[priority])

# assign the combined result back as col4
df1['col4'] = obj

print(df1)

#    srno col1 col2 col3    1    2 col4
# 0     1   a1   a2   a3   g1  NaN   g1
# 1     2   b1   c2   c3  NaN   g3   g3
# 2     3   d1   b2  NaN  NaN   g2   g2
# 3     4   e1   e2   e3  NaN  NaN  NaN


# finally, drop the temporary priority columns
df1.drop(priority_list, axis=1, inplace=True)
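
For anyone who wants to run the whole thing end to end, here is a minimal self-contained sketch of the same merge-then-combine_first idea, wrapped in a function. The function name fill_by_priority and its parameters (left, lookup, rules, target) are made up for illustration and are not part of the original answer. It also adds two steps the answer does not have, which are assumptions on my part: lookup rows whose key columns are all empty are skipped, and duplicate key rows in the lookup are dropped so the left merge cannot multiply rows. The blank cells from the question are assumed to be NaN.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank cells are assumed to be NaN.
df1 = pd.DataFrame({
    "srno": [1, 2, 3, 4],
    "col1": ["a1", "b1", "d1", "e1"],
    "col2": ["a2", "c2", "b2", "e2"],
    "col3": ["a3", "c3", np.nan, "e3"],
})
df2 = pd.DataFrame({
    "srno": [1, 2, 3],
    "col1": ["a1", np.nan, np.nan],
    "col2": [np.nan, "b2", "c2"],
    "col3": [np.nan, np.nan, "c3"],
    "col4": ["g1", "g2", "g3"],
})
df3 = pd.DataFrame({
    "priority": [1, 2],
    "col_combination": ["col1", "col2,col3"],
})


def fill_by_priority(left, lookup, rules, target="col4"):
    """Hypothetical helper: look up `target` rule by rule, best priority first."""
    result = pd.Series(np.nan, index=left.index, dtype=object)
    for combo in rules.sort_values("priority")["col_combination"]:
        keys = [name.strip() for name in combo.split(",")]
        candidates = (
            lookup[keys + [target]]
            .dropna(subset=keys, how="all")   # assumption: ignore fully empty keys
            .drop_duplicates(subset=keys)     # assumption: first match wins
        )
        # Left merge keeps one row per row of `left`, in the original order.
        matched = left[keys].merge(candidates, on=keys, how="left")[target]
        matched.index = left.index
        # Values found by an earlier (higher-priority) rule are kept.
        result = result.combine_first(matched)
    return result


df1["col4"] = fill_by_priority(df1, df2, df3)
print(df1)
```

On the sample data this should give g1, g3, g2 and NaN in col4, the same as the expected output in the question. Note that row 3 only matches because pandas treats NaN join keys as equal to each other in a merge; if that is not what you want for partially empty keys, change the dropna call to how="any".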
