My question is rather simple, but somehow I cannot find a clear answer in the documentation.
I have Spark 2 running on a CDH 5.10 cluster, alongside Hive and its metastore.
I create a session in my Spark program as follows:
    import org.apache.spark.sql.SparkSession;
    SparkSession spark = SparkSession.builder().appName("MyApp").enableHiveSupport().getOrCreate();
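As a sanity check, I verify that Hive support is actually enabled on the session (a sketch; I am assuming the internal setting spark.sql.catalogImplementation is readable through the runtime config):

    // Sketch: check which catalog implementation the session uses.
    // With enableHiveSupport() this should print "hive"; without it, "in-memory".
    // Assumption: the internal key "spark.sql.catalogImplementation" is exposed here.
    System.out.println(spark.conf().get("spark.sql.catalogImplementation"));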
Suppose I have the following HiveQL query:
spark.sql("SELECT someColumn FROM someTable")
I would like to know whether:
- under the hood this query is translated into Hive MapReduce primitives, or
- the support for HiveQL is only syntactic, and the query is actually executed by Spark SQL under the hood.
I am doing a performance evaluation, and I don't know whether the execution times of queries run through spark.sql([hiveQL query]) should be attributed to Spark or to Hive.
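For context, this is roughly how I take the measurements (a minimal sketch; count() is used here only to force execution, since spark.sql() itself is lazy and merely builds a plan):

    // Sketch: time a query end to end by triggering an action.
    long start = System.nanoTime();
    long rows = spark.sql("SELECT someColumn FROM someTable").count();
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    System.out.println(rows + " rows in " + elapsedMs + " ms");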