0
votes

I have two data frames:

df1 = pd.DataFrame({'A': ['A1', 'A2', 'A3','A4','A5']})

df2 = pd.DataFrame({'AA': ['A1', 'A3','A16','A1'],
                      'BB': ['A2', 'Z1','A12','X9'],
                      'CC': ['121', '345','444','432'],
                      'DD': ['D0', 'D1','783','980']})

df1:

    A
0   A1
1   A2
2   A3
3   A4
4   A5

df2:

    AA  BB  CC  DD
0   A1  A2  121 D0
1   A3  Z1  345 D1
2   A16 A12 444 783
3   A1  X9  432 980

I want to merge these two data frames based on the condition that the resulting data frame consists of those rows of DF2 whose values in either column AA or column BB matches with values in column A in DF1.

For example:

    AA  BB  CC  DD
0   A1  A2  121 D0
1   A1  A2  121 D0
2   A1  X9  432 980
1   A3  Z1  345 D1

I tried it the following way, but I am not sure if it is the right approach. First I merged on column A(df1) and AA(df2):

half1 = pd.merge(df1, df2,  how='left', left_on=['A'], right_on = ['AA'])

Then I merged on column A(df1) and BB(df2):

half2 = pd.merge(df1, df2,  how='left', left_on=['A'], right_on = ['BB'])

and then took union:

union = half1.merge(half2, how='outer')

but the result is not what I want.

1

1 Answers

0
votes

For your data working test membership for both columns by Series.isin with filter by boolean indexing and then concat:

df3 = pd.concat([df2[df2['AA'].isin(df1['A'])],
                 df2[df2['BB'].isin(df1['A'])]]).sort_index()
print (df3)
   AA  BB   CC   DD
0  A1  A2  121   D0
0  A1  A2  121   D0
1  A3  Z1  345   D1
3  A1  X9  432  980

Your solution should be changed with inner join with rename columns and also pd.concat:

half1 = pd.merge(df1.rename(columns={'A':'AA'}), df2, on='AA')
half2 = pd.merge(df1.rename(columns={'A':'BB'}), df2, on='BB')

df3 = pd.concat([half1,half2]).sort_index()
print (df3)
   AA  BB   CC   DD
0  A1  A2  121   D0
0  A1  A2  121   D0
1  A1  X9  432  980
2  A3  Z1  345   D1