2 votes

I want to write a Spark DataFrame into an existing parquet Hive table. I am able to do it using df.write.mode("append").insertInto("myexistinghivetable"), but if I check through the file system I can see that the files Spark wrote landed with a .c000 extension. What do those files mean? And how do I write a DataFrame into a parquet Hive table?
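For context, a minimal Scala sketch of what the question is attempting, assuming a SparkSession with Hive support; the session setup, source path, and app name are illustrative assumptions, not from the question:

import org.apache.spark.sql.SparkSession

// Hive support is required so insertInto can resolve the metastore table
val spark = SparkSession.builder()
  .appName("append-to-hive-table") // hypothetical app name
  .enableHiveSupport()
  .getOrCreate()

val df = spark.read.parquet("/tmp/new_data") // hypothetical source data

// insertInto matches columns by position, not by name, so df's column
// order must match the table definition
df.write.mode("append").insertInto("myexistinghivetable")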

2
I don't know the Hive table location; it will be decided at run time based on the partition values. Again, can't we directly write the DataFrame into the Hive parquet table without a workaround? - Rahul
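For the run-time-partition case this comment raises, a hedged sketch: with Hive's dynamic partitioning enabled, insertInto lets Hive place rows under the right partition directories without the writer knowing the locations up front. The two config keys are standard Hive settings; the table name is an assumption:

// Let Hive derive partition directories from the data at run time
spark.conf.set("hive.exec.dynamic.partition", "true")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

// With insertInto, the partition column(s) must be the last column(s) of df,
// in the same order as the table's PARTITIONED BY clause
df.write.mode("append").insertInto("myexistinghivetable")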

2 Answers

3 votes

We can do it using df.write.partitionBy("mypartitioncols").format("parquet").mode(SaveMode.Append).saveAsTable("hivetable"). In earlier versions of Spark, append save mode for saveAsTable was not available.
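Expanded into a self-contained sketch; the session setup, source path, partition column, and database.table name here are illustrative assumptions:

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("save-as-table-append") // hypothetical app name
  .enableHiveSupport()             // so saveAsTable targets the Hive metastore
  .getOrCreate()

val df = spark.read.json("/tmp/events") // hypothetical source data

df.write
  .partitionBy("dt")               // hypothetical partition column
  .format("parquet")
  .mode(SaveMode.Append)           // append if the table already exists
  .saveAsTable("mydb.hivetable")   // hypothetical database.table name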

2 votes

You can save the DataFrame as parquet at the location your Hive table refers to; after that you can alter the table in Hive.

You can do it like this:

df.write.mode("append").parquet("HDFS directory path")
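To make the "alter the table in Hive" step concrete, a hedged sketch of the full workaround; the warehouse path, database, table, and partition value are all assumptions for illustration:

// Assumption: the table's HDFS location and partition layout look like this
val partitionPath = "/user/hive/warehouse/mydb.db/hivetable/dt=2020-01-01"

// Write parquet files directly under the partition directory
df.write.mode("append").parquet(partitionPath)

// Register the new partition with the metastore so Hive queries can see it
spark.sql("ALTER TABLE mydb.hivetable ADD IF NOT EXISTS PARTITION (dt='2020-01-01')")
// or, to discover all new partitions in one pass:
// spark.sql("MSCK REPAIR TABLE mydb.hivetable")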