0
votes

It is not clear to me from the documentation whether, when creating a Hive table using HiveContext from Spark, Spark will use its own engine or a standard Hive MapReduce job to perform the task.

import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext()
val hc = new HiveContext(sc)

hc.sql("""
    CREATE TABLE db.new_table
    STORED AS PARQUET
    AS SELECT
        field1,
        field2,
        field3
    FROM db.src1 
        JOIN db.src2
        ON (x = y)
"""
)

2 Answers

1
vote
Spark 1.6

Spark SQL supports Apache Hive through HiveContext. Queries submitted this way are planned and executed by the Spark SQL execution engine, not by Hive's MapReduce, even though they read and write data stored in Hive.
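You can see this for yourself by calling explain() on the query: it prints Spark's physical plan rather than a MapReduce job graph. A minimal sketch, reusing the hypothetical db.src1 table from the question (the SparkConf app name is my own placeholder):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("HiveOnSparkCheck"))
val hc = new HiveContext(sc)

// explain() prints the physical plan chosen by Spark's Catalyst optimizer
// (e.g. HiveTableScan, Project); there is no MapReduce stage anywhere in it.
hc.sql("SELECT field1, field2 FROM db.src1").explain()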

Spark 2.x and above

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("SparkSessionExample")
  .config("spark.sql.warehouse.dir", warehouseLocation) // warehouseLocation points at your Hive warehouse directory
  .enableHiveSupport()
  .getOrCreate()
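The CTAS from the question can then run through this session; a sketch, again treating db.src1, db.src2, and the join columns x and y as hypothetical:

spark.sql("""
  CREATE TABLE db.new_table
  STORED AS PARQUET
  AS SELECT field1, field2, field3
  FROM db.src1
  JOIN db.src2 ON (x = y)
""")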

0
votes

When you run this now, Spark uses its own execution engine and APIs, not MapReduce. HiveContext no longer needs to be referenced explicitly: it is deprecated in Spark 2.x in favor of SparkSession with enableHiveSupport(), whether you run in spark-shell or via spark-submit / a compiled program.
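If you want to confirm it, the same explain() check works here too; a sketch, assuming the hypothetical db.new_table created above:

// The physical plan shows Spark operators such as a Parquet file scan;
// nothing in it is compiled down to a MapReduce job.
spark.sql("SELECT field1 FROM db.new_table").explain()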