I am reading parquet files in bulk from an S3 bucket in PySpark. Some of the parquet files have a different schema, and this is causing the job to fail. I want to pass a pre-defined schema so that the Spark job reads only the files matching that schema.
data = spark.read.parquet(*path_list)
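For reference, this is how I would attach a pre-defined schema to the read (a minimal sketch; the fields in predefined_schema are placeholders for my actual columns):

from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Hypothetical pre-defined schema; the real one matches my "good" files.
predefined_schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
])

# .schema() forces this schema onto the read, but as far as I can tell it
# does not skip files whose own schema differs -- incompatible files still
# make the job fail (or yield null columns), which is exactly my problem.
data = spark.read.schema(predefined_schema).parquet(*path_list)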
The spark read command above reads the files in bulk. How can I pass a pre-defined schema so that only the parquet files matching that schema are read? The restriction is that I need to achieve this as a bulk load, i.e. by passing the list of files (path_list) to the spark read parquet command.
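One workaround I am considering (a sketch, untested at scale; expected_schema is a placeholder for my pre-defined schema): pre-filter path_list by comparing each file's footer schema to the pre-defined one, then do a single bulk read over only the matching paths.

from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Hypothetical pre-defined schema for illustration.
expected_schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
])

matching_paths = []
for path in path_list:
    # Accessing .schema only inspects the parquet footer; no data rows
    # are materialized at this point.
    file_schema = spark.read.parquet(path).schema
    # Strict equality; in practice I may need to relax nullability or
    # field order when comparing.
    if file_schema == expected_schema:
        matching_paths.append(path)

# Single bulk load restricted to the files whose schema matched.
data = spark.read.parquet(*matching_paths)

This still touches each S3 object once for its footer, which could be slow with many files, so I would prefer something built into the bulk read itself if that exists.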