I have two data frames:
df1 = pd.DataFrame({'A': ['A1', 'A2', 'A3','A4','A5']})
df2 = pd.DataFrame({'AA': ['A1', 'A3','A16','A1'],
'BB': ['A2', 'Z1','A12','X9'],
'CC': ['121', '345','444','432'],
'DD': ['D0', 'D1','783','980']})
df1:
A
0 A1
1 A2
2 A3
3 A4
4 A5
df2:
AA BB CC DD
0 A1 A2 121 D0
1 A3 Z1 345 D1
2 A16 A12 444 783
3 A1 X9 432 980
I want to merge these two data frames based on the condition that the resulting data frame consists of those rows of DF2 whose values in either column AA or column BB matches with values in column A in DF1.
For example:
AA BB CC DD
0 A1 A2 121 D0
1 A1 A2 121 D0
2 A1 X9 432 980
1 A3 Z1 345 D1
I tried it the following way, but I am not sure if it is the right approach. First I merged on column A(df1) and AA(df2):
half1 = pd.merge(df1, df2, how='left', left_on=['A'], right_on = ['AA'])
Then I merged on column A(df1) and BB(df2):
half2 = pd.merge(df1, df2, how='left', left_on=['A'], right_on = ['BB'])
and then took union:
union = half1.merge(half2, how='outer')
but the result is not what I want.