In my case, I hit a strange error: even though the indices, column names and values were the same, the DataFrames did not compare equal. I tracked it down to the data types. pandas can end up using different dtypes for columns that hold the same values, which leads to this kind of mismatch.
For example:
import pandas as pd
param2 = pd.DataFrame({'a': [1]})
param1 = pd.DataFrame({'a': [1], 'b': [2], 'c': [2], 'step': ['alpha']})
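Right after construction, param1.dtypes and param2.dtypes both report int64 for 'a'. The difference shows up once you manipulate the frames: param1 also holds a string column ('step'), so operations that mix its columns, such as transposing it or extracting a row, yield data of type object. After that, 'a' can be of type object in whatever was derived from param1 while it is still of type int64 in param2. If you then build a result from a combination of param1 and param2, the internal parameters of that DataFrame will deviate from the default ones.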
So after the final DataFrame is generated, even though the values that are printed out are the same, final_df1.equals(final_df2) may return False, because internal details such as Axis 1, ObjectBlock and IntBlock may not be the same.
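A minimal sketch of the effect, assuming a reasonably recent pandas version. The double transpose is only one illustrative way to end up with an object-typed column (my choice, not necessarily what happened in the case above), and the names obj_a and int_a are hypothetical.

```python
import pandas as pd

param1 = pd.DataFrame({'a': [1], 'b': [2], 'c': [2], 'step': ['alpha']})
param2 = pd.DataFrame({'a': [1]})

# Transposing a mixed-type frame upcasts every value to object,
# and transposing back does not restore the original dtypes.
obj_a = param1.T.T[['a']]
int_a = param2

dtype_obj = str(obj_a['a'].dtype)
dtype_int = str(int_a['a'].dtype)

# Same labels and same values, yet equals() also compares dtypes.
same_values = bool((obj_a == int_a).all().all())
frames_equal = obj_a.equals(int_a)

print(dtype_obj, dtype_int, same_values, frames_equal)
```

Both frames print identically, which is why the mismatch is easy to miss until you look at the dtypes.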
An easy way to get around this and compare the values is to use final_df1 == final_df2. However, this does an element-by-element comparison and returns a DataFrame of booleans rather than a single True or False, so it won't work directly as the condition of an assert statement, for example in pytest.
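For a test, one option not mentioned above (a suggestion on my part) is pandas' own testing helper, pd.testing.assert_frame_equal, called with check_dtype=False so that dtype differences are ignored. The sketch below wraps it in a hypothetical frames_match helper; the example frames are made up for illustration.

```python
import pandas as pd


def frames_match(left, right):
    """Return True if both frames hold equal labels and values, ignoring dtypes."""
    try:
        pd.testing.assert_frame_equal(left, right, check_dtype=False)
    except AssertionError:
        return False
    return True


final_df1 = pd.DataFrame({'a': [1, 2]})                # int64 column
final_df2 = pd.DataFrame({'a': [1, 2]}, dtype=object)  # same values, object column
final_df3 = pd.DataFrame({'a': [1, 3]})                # one value differs

print(final_df1.equals(final_df2))         # strict comparison, dtype-sensitive
print(frames_match(final_df1, final_df2))  # values only
print(frames_match(final_df1, final_df3))
```

In a pytest test you would normally call pd.testing.assert_frame_equal directly, since its AssertionError message reports where the frames differ.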
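TL;DR
What works well is (final_df1 == final_df2).all().all(). This does an element-by-element comparison and reduces it to a single boolean, while neglecting the parameters not important for comparison. Note that the built-in all(final_df1 == final_df2) is not equivalent: it iterates over the column labels rather than the values, so it is true whenever the labels are truthy.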
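One caveat worth checking, sketched below with hypothetical frames: Python's built-in all() iterates over a DataFrame's column labels rather than its values, so reducing with the DataFrame's own .all() method is the safer way to collapse the comparison into a single boolean.

```python
import pandas as pd

left = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
right = pd.DataFrame({'a': [1, 2], 'b': [3, 99]})  # one value differs

comparison = left == right  # DataFrame of booleans

# Built-in all() loops over the column labels 'a' and 'b',
# which are non-empty strings and therefore truthy.
builtin_result = all(comparison)

# DataFrame.all() reduces each column, the second .all() reduces the columns.
reduced_result = bool(comparison.all().all())

print(builtin_result, reduced_result)
```

Keep in mind that == requires both frames to carry identical row and column labels, and that NaN never compares equal to NaN.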
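TL;DR2
If your values and indices are the same but final_df1.equals(final_df2) returns False, you can use final_df1._data and final_df2._data to check the rest of the elements of the DataFrames, such as the internal blocks and their dtypes. Note that _data is a private attribute, so it may change or disappear between pandas versions; final_df1.dtypes and final_df2.dtypes expose the dtype differences through the public API.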
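As a public-API way to locate the mismatch, here is a small sketch that lines up the dtypes of two frames. It assumes both frames share the same column labels; the helper name dtype_differences and the example frames are hypothetical.

```python
import pandas as pd


def dtype_differences(left, right):
    """List the columns whose dtypes differ between two frames."""
    dtypes = pd.DataFrame({
        'left': left.dtypes.astype(str),
        'right': right.dtypes.astype(str),
    })
    return dtypes[dtypes['left'] != dtypes['right']]


final_df1 = pd.DataFrame({'a': [1], 'b': [2.0]})
# Same values, but column 'a' is stored as object instead of int64.
final_df2 = pd.DataFrame({'a': [1], 'b': [2.0]}).astype({'a': object})

diff = dtype_differences(final_df1, final_df2)
print(diff)
```

An empty result means the dtypes agree, in which case the difference is more likely in the index, the column labels or the values themselves.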