
Difference between Spark SQL and Hive on Spark. I am going through the Spark SQL documentation and trying to understand the difference between Spark SQL and Hive on Spark.

  1. Consider the case where I initiate a Spark session without any explicit Hive support (i.e., without copying hive-site.xml) and then persist a table in my Spark program. Where will the data and metadata be stored? Will Spark create a new Hive metastore (e.g., Derby)?
  2. Consider the case where I initiate a Spark session with Hive support (i.e., copying hive-site.xml so Spark is aware of the existing Hive installation). If I then persist a table, will the metadata be stored in my existing Hive metastore and the data in the warehouse directory on HDFS?
  3. If I run Hive with the execution engine property changed to Spark, is that the same as case 2 above?

Thanks.

If you initialize Spark without Hive support then it won't use a metastore at all. Hive is not crucial for Spark, which has its own standalone catalog. Regarding 2 and 3, these are not really comparable. – zero323

1 Answer

  1. When you initiate a Spark session, the data can be stored in S3 or HDFS. It will not inherently create a Hive metastore unless you explicitly configure one.

  2. Yes, if you use the `saveAsTable` method referencing a Hive table, the data will be persisted within HDFS. Bear in mind that if you tear down the HDFS instance, as can happen on EMR, the table will be dropped along with its data.

Not sure about question #3.
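
For completeness, the property referenced in question 3 is a Hive-side setting, configured in Hive itself rather than in Spark:

```sql
-- Set in hive-site.xml, or per session in the Hive CLI/Beeline:
set hive.execution.engine=spark;
```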