I am trying to write my DataFrame into a partitioned Hive table. The Hive table format is Parquet.

But I am unable to write the df to the Hive table.

Spark 2.3 and a partitioned Hive table

When I tried to load my finaldf into the partitioned Hive table, I got the below error:

finaldf.write.mode("overwrite").format("parquet").partitionBy("mis_dt","country_codfe").saveAsTable("FinalTable")

Error: u'Cannot overwrite table schema.Offertable that is also being read from;'
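
For context, Spark raises this error when the target table is also part of finaldf's lineage, i.e. the same table is being read and overwritten in one job. One common workaround, sketched here with a hypothetical staging path, is to materialize the DataFrame outside the table first so the re-read data no longer references it:

# Materialize finaldf to a staging location (path is hypothetical).
finaldf.write.mode("overwrite").parquet("/tmp/finaldf_staging")

# The re-read DataFrame has no lineage back to FinalTable, so the
# overwrite below no longer reads from the table it is writing to.
staged = spark.read.parquet("/tmp/finaldf_staging")
staged.write.mode("overwrite").format("parquet").partitionBy("mis_dt", "country_codfe").saveAsTable("FinalTable")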

When I googled the above error, the suggestion was to load the df into a temp table and then load that into the final Hive table. I tried that option, and again it failed with a different error.

finaldf.createOrReplaceTempView('tmpTable')
final = spark.read.table('tmpTable')
final.write.mode("overwrite").insertInto("Finaltable")

Error: Number of partitions created is 7004, which is more than 1000.

But I do not think that we have that many partitions.
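
A quick way to check this is to count the distinct combinations of the partition columns, since that is exactly how many dynamic partitions the write would create. A minimal PySpark sketch, assuming finaldf and the partition column names from the code above; the SET statement at the end is one common way to raise the Hive limit if the count really is that high:

# Count the distinct (mis_dt, country_codfe) pairs -- this is the number
# of dynamic partitions the write would create.
num_partitions = finaldf.select("mis_dt", "country_codfe").distinct().count()
print(num_partitions)

# If the count really exceeds 1000, raise the Hive limit accordingly:
spark.sql("SET hive.exec.max.dynamic.partitions=10000")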

1st option:

finaldf.write.mode("overwrite").format("parquet").partitionBy("mis_dt","country_codfe").saveAsTable("FinalTable")

2nd option:

finaldf.createOrReplaceTempView('tmpTable')
final = spark.read.table('tmpTable')
final.write.mode("overwrite").insertInto("Finaltable")

I am looking to write the data into a Hive Parquet file format table using Spark 2.3:

finaldf.write.mode("overwrite").format("parquet").partitionBy("mis_dt","cntry_cde").saveAsTable("finaltable")

1 Answer

spark.sql.sources.partitionOverwriteMode was introduced in Spark >= 2.3; setting it to "dynamic" makes an overwrite replace only the partitions present in the DataFrame, rather than truncating the whole table:

sparkConf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")    
sparkConf.set("hive.exec.dynamic.partition", "true")
sparkConf.set("hive.exec.dynamic.partition.mode", "nonstrict")

Use the below code:

final.write.mode("overwrite").insertInto("table")

Note: The table should already be created in Hive with its partition columns defined, since insertInto writes into an existing table.
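
Putting it together, a minimal end-to-end PySpark sketch (the session settings mirror the configs above; finaldf and FinalTable are the names from the question, and the source table used to build finaldf is hypothetical):

from pyspark.sql import SparkSession

# Hive-enabled session with dynamic partition overwrite turned on.
spark = (SparkSession.builder
    .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
    .config("hive.exec.dynamic.partition", "true")
    .config("hive.exec.dynamic.partition.mode", "nonstrict")
    .enableHiveSupport()
    .getOrCreate())

# Stand-in for the question's finaldf; built from a hypothetical source
# table purely so the sketch is self-contained.
finaldf = spark.table("source_db.source_table")

# insertInto resolves columns by POSITION, not by name, so finaldf's
# columns must match the table's order, with the partition columns last.
finaldf.write.mode("overwrite").insertInto("FinalTable")

With partitionOverwriteMode set to dynamic, the overwrite replaces only the partitions present in finaldf instead of truncating the whole table.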