I'm using pyarrow. I have a Parquet Dataset composed of multiple Parquet files. If the columns differ between the files, I get a "ValueError: Schema in was different".
Is there a way to avoid this? That is, I'd like to have a Dataset composed of files that each contain different columns.
I guess pyarrow could handle this by filling in nulls (NA) for any column that is missing from a particular component file of the Dataset.
Thanks