5
votes

I have used one way of saving a DataFrame as an external table using the parquet file format, but is there some other way to save a DataFrame directly as an external table in Hive, like the saveAsTable we have for managed tables?
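For reference, the parquet-based approach I mean looks roughly like the sketch below (the path, schema, and table name are placeholders, and hc stands for a HiveContext):

import org.apache.spark.sql.SaveMode

// write the data to an external location, then point an external table at it
df.write.mode(SaveMode.Overwrite).parquet("/data/external/events")
hc.sql("create external table if not exists events (id int, name string) stored as parquet location '/data/external/events'")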


4 Answers

5
votes

You can do it this way:

df.write.format("orc").option("path", "yourpath").saveAsTable("anubhav")
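To sanity-check the result, one option (a sketch, assuming hc is the HiveContext in scope) is to inspect the table metadata; because a path was supplied, the Table Type row should read EXTERNAL_TABLE:

// confirm that Spark registered anubhav as external rather than managed
hc.sql("describe formatted anubhav").show(100, false)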

3
votes

In PySpark, an external table can be created as follows:

df.write.option('path', '<External Table Path>').saveAsTable('<Table Name>')

2
votes

For an external table, don't use saveAsTable. Instead, save the data at the location of the external table specified by path, then add the partition so that it is registered with the Hive metastore. This lets you query Hive by partition later.

// hc is HiveContext, df is DataFrame
df.write.mode(SaveMode.Overwrite).parquet(path)

val sql = s"""
  |alter table $targetTable
  |add if not exists partition
  |(year=$year,month=$month)
  |location "$path"
""".stripMargin
hc.sql(sql)
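Once the partition is registered, you can query it through Hive; a minimal sketch reusing the hc, targetTable, year, and month values from above:

// read back only the newly added partition
val partitioned = hc.sql(s"select * from $targetTable where year=$year and month=$month")
partitioned.show()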

-1
votes

You can also save the DataFrame with a manually created table:

dataframe.registerTempTable("temp_table");
hiveSqlContext.sql("create external table if not exists table_name as select * from temp_table");

The link below has a good explanation of CREATE TABLE: https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-table.html