1
votes

I have two data frames:

df1

A1    B1
1     a
2     s
3     d

and

df2

A1    B1
1     a
2     x
3     d

I want to compare df1 and df2 on column B1. The column A1 can be used to join. I want to know:

  1. Which rows are different in df1 and df2 with respect to column B1?
  2. Is there a mismatch in the values of column A1? For example, is df2 missing some values that are present in df1, or vice versa, and if so, which ones?

I tried using merge and join but that is not what I am looking for.

1
1. df1['B1'] == df2['B1'] 2. Can you explain and post your desired output? It's unclear to me what you mean. - EdChum

1 Answer

7
votes

I've edited the raw data to illustrate the case of A1 keys in one dataframe but not the other.

When doing your merge, you want to specify an 'outer' merge so that you can see these items with an A1 key in one dataframe but not the other.

I've included the suffixes '_1' and '_2' to indicate the dataframe source (_1 = df1 and _2 = df2) of column B1.

import pandas as pd

df1 = pd.DataFrame({'A1': [1, 2, 3, 4], 'B1': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'A1': [1, 2, 3, 5], 'B1': ['a', 'd', 'c', 'e']})

df3 = df1.merge(df2, how='outer', on='A1', suffixes=['_1', '_2'])
df3['check'] = df3.B1_1 == df3.B1_2

>>> df3
   A1 B1_1 B1_2  check
0   1    a    a   True
1   2    b    d  False
2   3    c    c   True
3   4    d  NaN  False
4   5  NaN    e  False

To check for missing A1 keys in df1 and df2:

# A1 value missing in `df1`
>>> df3[df3.B1_1.isnull()]
   A1 B1_1 B1_2  check
4   5  NaN    e  False

# A1 value missing in `df2`
>>> df3[df3.B1_2.isnull()]
   A1 B1_1 B1_2  check
3   4    d  NaN  False
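
Note that the isnull() test only identifies missing keys reliably when B1 holds no NaN values in the original dataframes. As a minimal sketch that avoids this assumption, the A1 keys can also be compared directly with isin, without merging. The names only_in_df1 and only_in_df2 are illustrative and not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'A1': [1, 2, 3, 4], 'B1': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'A1': [1, 2, 3, 5], 'B1': ['a', 'd', 'c', 'e']})

# Rows of df1 whose A1 key does not appear anywhere in df2
only_in_df1 = df1[~df1['A1'].isin(df2['A1'])]

# Rows of df2 whose A1 key does not appear anywhere in df1
only_in_df2 = df2[~df2['A1'].isin(df1['A1'])]

print(only_in_df1)
print(only_in_df2)
```

With the sample data, only_in_df1 holds the row with A1 = 4 and only_in_df2 holds the row with A1 = 5.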

EDIT Thanks to @EdChum (the source of all Pandas knowledge...): passing indicator=True adds a _merge column that shows which dataframe each row came from.

df3 = df1.merge(df2, how='outer', on='A1', suffixes=['_1', '_2'], indicator=True)
df3['check'] = df3.B1_1 == df3.B1_2

>>> df3
   A1 B1_1 B1_2      _merge  check
0   1    a    a        both   True
1   2    b    d        both  False
2   3    c    c        both   True
3   4    d  NaN   left_only  False
4   5  NaN    e  right_only  False
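
As a rough sketch of how the _merge column could be used to answer both parts of the question, the merged frame can be filtered on its values ('both', 'left_only', 'right_only'). The variable names changed, missing_in_df1 and missing_in_df2 are hypothetical, chosen for illustration only.

```python
import pandas as pd

df1 = pd.DataFrame({'A1': [1, 2, 3, 4], 'B1': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'A1': [1, 2, 3, 5], 'B1': ['a', 'd', 'c', 'e']})

df3 = df1.merge(df2, how='outer', on='A1', suffixes=('_1', '_2'), indicator=True)

# Question 1: keys present in both frames whose B1 values differ
in_both = df3['_merge'] == 'both'
changed = df3[in_both & (df3['B1_1'] != df3['B1_2'])]

# Question 2: keys present in only one of the two frames
missing_in_df2 = df3.loc[df3['_merge'] == 'left_only', 'A1'].tolist()
missing_in_df1 = df3.loc[df3['_merge'] == 'right_only', 'A1'].tolist()

print(changed)
print(missing_in_df2, missing_in_df1)
```

Restricting the B1 comparison to rows marked 'both' keeps genuine value differences separate from rows that differ only because a key is absent on one side.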