4
votes

I know that the Hive Metastore stores metadata for the tables we create in Hive, but why does Spark require a metastore? What is the default relationship between the metastore and Spark?

Is the metastore used by Spark SQL? If so, is it used to store DataFrame metadata?

Why does Spark check for metastore connectivity by default even though I am not using any SQL libraries?


1 Answer

1
votes

Here is the explanation from the Spark 2.2.0 documentation:

When not configured by the hive-site.xml, the context automatically creates metastore_db in the current directory and creates a directory configured by spark.sql.warehouse.dir, which defaults to the directory spark-warehouse in the current directory that the Spark application is started. Note that the hive.metastore.warehouse.dir property in hive-site.xml is deprecated since Spark 2.0.0. Instead, use spark.sql.warehouse.dir to specify the default location of database in warehouse.
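To make the last sentence of that quote concrete, here is a minimal sketch of setting spark.sql.warehouse.dir from PySpark instead of relying on the default spark-warehouse directory. The helper names (warehouse_conf, build_session), the app name, and the use of the system temp directory are my own choices for illustration and are not from the documentation; it also assumes pyspark is installed if you actually start the session.

```python
import os
import tempfile


def warehouse_conf(base_dir):
    """Return Spark SQL settings that place the warehouse under base_dir.

    Hypothetical helper for this example; only the property name
    spark.sql.warehouse.dir comes from the Spark documentation.
    """
    return {"spark.sql.warehouse.dir": os.path.join(base_dir, "spark-warehouse")}


def build_session(conf):
    """Create (or reuse) a SparkSession with the given settings applied."""
    # Imported here so the rest of the example works without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("warehouse-dir-example")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Assumption: the system temp directory stands in for a real warehouse location.
conf = warehouse_conf(tempfile.gettempdir())
print(conf)
```

Calling build_session(conf) then gives a session whose spark.conf.get("spark.sql.warehouse.dir") should return the configured path. The setting has to be supplied before the first session is created; getOrCreate() on an already running session will not move the warehouse.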