
I know this question is a little strange. I love Hadoop and HDFS, but I have recently been working on SparkSQL with the Hive Metastore.

I want to use SparkSQL as a vertical SQL engine to run OLAP queries across different data sources like RDBMSs, MongoDB, and Elasticsearch, without an ETL process. So I register the different schemas as external tables in the Metastore with the corresponding Hive storage handlers.
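For example, a MongoDB collection could be registered roughly like this (a sketch only: the table name, columns, and connection URI are made up, and depending on the Spark/Hive versions the `STORED BY` DDL may have to be issued through the Hive CLI rather than `spark.sql`; once it is in the metastore, SparkSQL can query it either way):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("register-external-tables")
  .enableHiveSupport()
  .getOrCreate()

// Register a MongoDB collection as an external Hive table via the
// mongo-hadoop storage handler (its jar must be on the classpath).
spark.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS mongo_orders (id STRING, amount DOUBLE)
  STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
  WITH SERDEPROPERTIES ('mongo.columns.mapping' = '{"id":"_id","amount":"amount"}')
  TBLPROPERTIES ('mongo.uri' = 'mongodb://localhost:27017/shop.orders')
""")
```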

Moreover, HDFS is not used as a data source in my work, and MapReduce has already been replaced by the Spark engine. That sounds to me like Hadoop/HDFS is useless except as a base for installing Hive. I don't want to buy into the whole stack.

I wonder: if I only start the Hive metastore service, without Hadoop/HDFS, to support SparkSQL, what kind of issues will come up? Would I be putting myself into the jungle?


1 Answer


What you need is "Hive Local Mode" (search for "Hive, Map-Reduce and Local-Mode" on that page).

Also this may help.

This configuration is only suggested if you are experimenting locally, but in that case you only need the metastore.

Also, from here:

Spark SQL uses a Hive metastore even when we don't configure it to. When not configured, it uses a default Derby DB as the metastore.
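A minimal sketch of what that looks like from the Spark side, assuming a standalone metastore service has been started separately (e.g. with `hive --service metastore`); the host, port, and warehouse path below are illustrative:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("sparksql-over-hive-metastore")
  // Point Spark at the standalone metastore service; without this setting
  // Spark falls back to an embedded Derby metastore in ./metastore_db.
  .config("hive.metastore.uris", "thrift://localhost:9083")
  // Keep the warehouse on the local filesystem so HDFS is never touched.
  .config("spark.sql.warehouse.dir", "file:///tmp/spark-warehouse")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("SHOW TABLES").show()
```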

So this seems to be quite legitimate:

  1. Arrange your metastore in Hive.
  2. Start Hive in local mode.
  3. Make Spark use the Hive metastore.
  4. Use Spark as an SQL engine for all the data sources supported by Hive, as sketched below.
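As a hypothetical end-to-end illustration (reusing the `spark` session from the sketch above; the table and column names are made up), once tables backed by different storage handlers are registered in the metastore, a cross-source query is just SQL:

```scala
// Join a MongoDB-backed table with an Elasticsearch-backed one directly in
// SparkSQL -- no ETL step and no HDFS involved.
val report = spark.sql("""
  SELECT c.country, SUM(o.amount) AS total
  FROM mongo_orders o
  JOIN es_customers c ON o.id = c.order_id
  GROUP BY c.country
""")
report.show()
```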