
I'm trying to write the contents of a dataframe to an existing partitioned managed Hive table like so:

outputDF.write.mode("Overwrite").insertInto(targetTable)

The target table is ORC and I'm looking to preserve that. Using saveAsTable would drop and recreate the table as parquet (see here: What are the differences between saveAsTable and insertInto in different SaveMode(s)?).

The trouble is that for some of my tables I need the whole table's data overwritten (similar to a truncate), not just specific partitions.

It is unclear to me whether overwrite is supported in this context and, if it is, what I am doing wrong. The SparkSession sets the following configuration values:

.config("spark.sql.sources.partitionOverwriteMode", "static"/"dynamic")
.config("hive.exec.dynamic.partition", "true")
.config("hive.exec.dynamic.partition.mode", "nonstrict")

Am I missing something?

Also, I suspect this can be achieved through the SQL API but I am trying to avoid it.
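
For completeness, the SQL route I'm trying to avoid would look roughly like this (the temp view name is a placeholder):

// Expose the DataFrame to SQL; "staging_view" is just a placeholder name.
outputDF.createOrReplaceTempView("staging_view")

// INSERT OVERWRITE on the target table; with dynamic partitioning enabled the
// partition column(s) are taken from the last column(s) of the SELECT.
spark.sql(s"INSERT OVERWRITE TABLE $targetTable SELECT * FROM staging_view")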



1 Answer

// 1. Overwrite using the string form of the save mode.
outputDF.write.format("parquet").mode("overwrite").saveAsTable(targetTable)

// 2. Equivalent, using the SaveMode enum.
import org.apache.spark.sql.SaveMode

outputDF
 .write
 .format("parquet")
 .mode(SaveMode.Overwrite)
 .saveAsTable(targetTable)
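
If keeping the table in ORC matters (as in the question), a possible variant is to specify ORC as the format; note that saveAsTable with Overwrite still drops and recreates the managed table rather than truncating it, as the question points out:

// Possible variant keeping ORC as the storage format. saveAsTable with Overwrite
// still recreates the managed table; add .partitionBy(...) on the partition
// column(s) if the recreated table should stay partitioned.
outputDF
 .write
 .format("orc")
 .mode(SaveMode.Overwrite)
 .saveAsTable(targetTable)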