
I have an issue inserting a Spark DataFrame into a Hive table. Can anyone please help me out? HDP version 3.1, Spark version 2.3. Thanks in advance.

// ORIGINAL CODE PART

import org.apache.spark.SparkContext
import com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl
import org.apache.spark.sql.DataFrame
import com.hortonworks.hwc.HiveWarehouseSession
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()
spark.sparkContext.setLogLevel("ERROR")
val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()
/*
Some transformation operations happened and the output of the transformation is stored in val result
*/
val result = {
  num_records
  .union(df.transform(profile(heatmap_cols2type)))
}

result.createOrReplaceTempView("out_temp") // Create temp view

scala> result.show()
+-----+--------------------+-----------+------------------+------------+-------------------+
| type|              column|      field|             value|       order|               date|
+-----+--------------------+-----------+------------------+------------+-------------------+
|TOTAL|                 all|num_records|               737|           0|2019-12-05 18:10:12|
|  NUM|available_points_...|    present|               737|           0|2019-12-05 18:10:12|

hive.setDatabase("EXAMPLE_DB")
hive.createTable("EXAMPLE_TABLE").ifNotExists()
  .column("`type`", "String")
  .column("`column`", "String")
  .column("`field`", "String")
  .column("`value`", "String")
  .column("`order`", "bigint")
  .column("`date`", "TIMESTAMP")
  .create()

hive.executeUpdate("INSERT INTO TABLE EXAMPLE_DB.EXAMPLE_TABLE SELECT * FROM out_temp");

----- ERROR of Original code ----------------
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:86 Table not found 'out_temp'

What I tried as an alternative (since Hive and Spark use independent catalogs, per the documentation on HWC write operations):

spark.sql("SELECT type, column, field, value, order, date FROM out_temp")
  .write.format("HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR")
  .option("table", "wellington_profile")
  .save()

------- ERROR of Alternative Step ----------------
java.lang.ClassNotFoundException: Failed to find data source: HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR. Please find packages at http://spark.apache.org/third-party-projects.html
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:639)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:241)
  ... 58 elided
Caused by: java.lang.ClassNotFoundException: HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR.DefaultSource
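(Editor's note: the ClassNotFoundException above appears to come from passing the connector name as the literal string "HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR" instead of referencing the constant. A sketch of the same write with the constant referenced directly, assuming the standard HDP 3.1 HWC API; the table name is taken from the attempt above:)

```scala
import com.hortonworks.hwc.HiveWarehouseSession
import com.hortonworks.hwc.HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR

// Reference the constant itself, not a string containing its name.
// The constant holds the fully qualified class name of the HWC data source,
// which is what DataFrameWriter.format() needs to locate.
spark.sql("SELECT `type`, `column`, `field`, `value`, `order`, `date` FROM out_temp")
  .write
  .format(HIVE_WAREHOUSE_CONNECTOR)
  .option("table", "wellington_profile")
  .save()
```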

My question is:

Instead of saving out_temp as a temp view in Spark, is there any way to create the table directly in Hive? Is there any way to insert into a Hive table from a Spark DataFrame?

Thank you everyone for your time!


2 Answers


result.write.save("example_table.parquet")
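(Editor's note: as written, this saves the DataFrame as Parquet files at the path `example_table.parquet`; it does not create a table in the Hive metastore. A slightly more explicit sketch, where the HDFS path is a hypothetical example:)

```scala
// Write the result as Parquet files to a filesystem path.
// Hive will not see these files as a table; a separate external-table
// definition pointing at the path would be needed for that.
result.write
  .mode("overwrite")
  .format("parquet")
  .save("/tmp/example_table.parquet") // hypothetical path
```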

import org.apache.spark.sql.SaveMode

result.write.mode(SaveMode.Overwrite).saveAsTable("EXAMPLE_TABLE")

You can read about it in more detail here.
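(Editor's note: on HDP 3.x, `saveAsTable` writes into Spark's own catalog, which Hive does not see; landing the data in a Hive-managed table normally goes through the Hive Warehouse Connector instead. A minimal sketch, assuming the standard HDP 3.1 HWC API and reusing the database and table names from the question:)

```scala
import com.hortonworks.hwc.HiveWarehouseSession
import com.hortonworks.hwc.HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR

// Build an HWC session on top of the existing SparkSession.
val hive = HiveWarehouseSession.session(spark).build()
hive.setDatabase("EXAMPLE_DB")

// Write the DataFrame directly into the Hive table through HWC;
// no intermediate temp view is required.
result.write
  .format(HIVE_WAREHOUSE_CONNECTOR)
  .mode("append")
  .option("table", "EXAMPLE_TABLE")
  .save()
```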