I have two pyarrow dataset schemas that should be identical but for some reason are not (I assume that when one of the Parquet files was written, a certain column in one partition got cast to a different data type, but I have no idea which one it is).
Now I know how to check whether two schemas are equal. I can do that like so:
import pandas as pd
import numpy as np
import pyarrow as pa

df1 = pd.DataFrame({'col1': np.zeros(10), 'col2': np.random.rand(10)})
df2 = pd.DataFrame({'col1': np.ones(10), 'col2': np.zeros(10)})

schema_1 = pa.Schema.from_pandas(df1)
schema_2 = pa.Schema.from_pandas(df2)

# same column names and dtypes, so the schemas are equal
schema_1.equals(schema_2)

# change one column's dtype so the schemas differ
df3 = df2.copy()
df3['col2'] = df3['col2'].astype('int')
schema_3 = pa.Schema.from_pandas(df3)

print(schema_1.equals(schema_2), schema_1.equals(schema_3))  # True False
But how do I find out where they differ? (Visual inspection doesn't count; I tried briefly and didn't spot any difference across more than 500 columns.)