In my job, the final step is to store the processed data in a Hive table partitioned on the "date" column. Sometimes, due to a job failure, I need to re-run the job for one particular partition only. As observed, when I use the code below, Spark overwrites all the partitions when using overwrite mode.
ds.write.partitionBy("date").mode("overwrite").saveAsTable("test.someTable")
After going through multiple blogs and Stack Overflow posts, I followed the steps below to overwrite only particular partitions.
Step 1: Enable dynamic partition overwrite mode
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
Step 2: Write the dataframe to the Hive table using saveAsTable
Seq(("Company1", "A"),
("Company2","B"))
.toDF("company", "id")
.write
.mode(SaveMode.Overwrite)
.partitionBy("id")
.saveAsTable(targetTable)
spark.sql(s"SELECT * FROM ${targetTable}").show(false)
spark.sql(s"show partitions ${targetTable}").show(false)
Seq(("CompanyA3", "A"))
.toDF("company", "id")
.write
.mode(SaveMode.Overwrite)
.insertInto(targetTable)
spark.sql(s"SELECT * FROM ${targetTable}").show(false)
spark.sql(s"show partitions ${targetTable}").show(false)
Still, it overwrites all the partitions.

As per this blog, https://www.waitingforcode.com/apache-spark-sql/apache-spark-sql-hive-insertinto-command/read, insertInto should overwrite only the matching partitions.
If I create the table first and then use the insertInto method, it works fine (sketched below):
Set the required configuration,
Step 1: Create the table
Step 2: Add data using the insertInto method
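Roughly, the working version looks like this; the CREATE TABLE statement and storage format below are just placeholders matching the example above, not my exact DDL:

import org.apache.spark.sql.SaveMode
import spark.implicits._

spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

val targetTable = "test.someTable"  // placeholder table name

// Step 1: create the partitioned Hive table manually
spark.sql(
  s"""CREATE TABLE IF NOT EXISTS ${targetTable} (company STRING)
     |PARTITIONED BY (id STRING)
     |STORED AS PARQUET""".stripMargin)

// Step 2: write with insertInto; only the partitions present in the
// dataframe (id = "A" here) are overwritten, the others are kept
Seq(("CompanyA3", "A"))
  .toDF("company", "id")
  .write
  .mode(SaveMode.Overwrite)
  .insertInto(targetTable)

spark.sql(s"SHOW PARTITIONS ${targetTable}").show(false)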

I wanted to know: what is the difference between creating the Hive table via saveAsTable and creating the table manually? Why does it not work in the first scenario? Could anyone help me with this?

