0
votes

I need to merge two data frames. To make sure that the rows are unique I need to verify that "Name" and "Age" are BOTH matched before merging. I am using the combination as a primary key. Here is my code:

df = pd.merge(df, df1[['Name', 'Age', 'Date']], left_on=['Name', 'Age'], right_on=['Name', 'Age'], how='left')

When I use multiple keys ("Name" and "Age") is this an "And" or "OR" match. I want it only to merge if both "Name" and "Age" match, not if only "Name" matches or "Age" matches. I can't seem to find this in the documentation and I am getting some mixed results.

Update: I do need to do a LEFT join for data reasons. The RIGHT table is a subset of the data I need while I need to keep all data on the LEFT. The key concept is to only merge from the RIGHT the data that matches both "Name" and "Age".

1
Remove, how='left' and you will get an inner join only when Name and Age match.Scott Boston
You need to do an inner join. The left join keeps all the keys of the left DataFrame and only the matching keys of the right DataFrameShivam Roy
Thank you for your comments. I do need to do a LEFT join for data reasons. The right table is a subset of the data I need while I need to keep all data on the LEFT. The key is to only merge from the right the data that matches both "Name" and "Age".mjj2u2

1 Answers

0
votes

I did some testing on this. The answer is Pandas merging on multiple keys is an "AND" function, not "OR".

Here is the test I did:

DF1 =:

A   B   D
----------
a   m   1
b   n   2
c   o   3
d   q   4
e   r   5
f   s   6
g   t   7
h   u   8
i   v   9

DF2 =:

Q   R   S
-----------
a   m   abc
x   n   def
c   x   hij
d   q   klm
x   r   nop
f   x   qrs
g   t   tuv
x   u   wxy
i   x   zab

(Replacing letters with "x" to make them not match.)

Running this merge:

df_merged = pd.merge(df1, df2[['Q', 'R', 'S']], left_on=['A', 'B'], right_on=['Q', 'R'], how='left')

Returns this output:

A  B  D    Q    R    S
--------------------------
a  m  1    a    m  abc
b  n  2  NaN  NaN  NaN
c  o  3  NaN  NaN  NaN
d  q  4    d    q  klm
e  r  5  NaN  NaN  NaN
f  s  6  NaN  NaN  NaN
g  t  7    g    t  tuv
h  u  8  NaN  NaN  NaN
i  v  9  NaN  NaN  NaN

Information is only included from the RIGHT when BOTH keys match. Specifically when A==Q AND B==R.

Hope this can help others.