I'm having an issue writing a Hive table from Spark. The following code works just fine; I can write the table (which defaults to the Parquet format) and read it back in Hive:
df.write.mode('overwrite').saveAsTable("db.table")
hive> describe table;
OK
val string
Time taken: 0.021 seconds, Fetched: 1 row(s)
However, if I specify the format should be csv:
df.write.mode('overwrite').format('csv').saveAsTable("db.table")
then I can save the table, but Hive doesn't recognize the schema:
hive> describe table;
OK
col array<string> from deserializer
Time taken: 0.02 seconds, Fetched: 1 row(s)
It's also worth noting that I can create a Hive table manually and then insertInto it:
spark.sql("create table db.table(val string)")
df.select('val').write.mode("overwrite").insertInto("db.table")
Doing so, Hive seems to recognize the schema. But that's clunky and I can't figure a way to automate the schema string anyway.
saveAsTablewith the default format, you don't have to do that. I hadn't run into the binary encoding issues with Parquet. Thanks for the heads up! - santon