I have used one way to save dataframe as external table using parquet file format but is there some other way to save dataframes directly as external table in hive like we have saveAsTable for managed table
4 Answers
5
votes
3
votes
2
votes
For external table, don't use saveAsTable. Instead, save the data at location of the external table specified by path. Then add partition so that it is registered with hive metadata. This will allow you to hive query by partition later.
// hc is HiveContext, df is DataFrame.
df.write.mode(SaveMode.Overwrite).parquet(path)
val sql =
s"""
|alter table $targetTable
|add if not exists partition
|(year=$year,month=$month)
|location "$path"
""".stripMargin
hc.sql(sql)
-1
votes
You can also save dataframe with manual create table
dataframe.registerTempTable("temp_table");
hiveSqlContext.sql("create external table
table_name if not exist as select * from temp_table");
Below mentioned link has a good explanation for create table https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-table.html