I have a table that has some missing partitions. When I query it in Hive, it works fine:
SELECT *
FROM my_table
but when I query it from PySpark (v. 2.3.0), it fails with the message Input path does not exist: hdfs://path/to/partition. The Spark code I am running is quite naive:
from pyspark.sql import SparkSession
spark = (SparkSession
         .builder
         .appName("prueba1")
         .master("yarn")
         .config("spark.sql.hive.verifyPartitionPath", "false")
         .enableHiveSupport()
         .getOrCreate())
spark.table('some_schema.my_table').show(10)
The config("spark.sql.hive.verifyPartitionPath", "false") setting was proposed in this question, but it does not seem to work for me.
Is there any way I can configure the SparkSession to get rid of these errors? I am afraid that more partitions will go missing in the future, so a hardcoded solution is not an option.