3
votes

I have been trying to get an accurate view of how Spark's catalog API stores its metadata.

I have found some resources, but no answer:

I see some tutorials that take the existence of a Hive Metastore for granted.

  • Is a Hive Metastore potentially included with the Spark distribution?
  • A Spark cluster can be short-lived, but a Hive metastore would obviously need to be long-lived.

Apart from the catalog feature, the partitioning and sorting features used when writing out a DataFrame also seem to depend on Hive... So "everyone" seems to take Hive for granted when talking about key Spark features for persisting a DataFrame.
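For concreteness, this is the write path I mean; a minimal Scala sketch (the app and table names are made up). `bucketBy` and `sortBy` only work together with `saveAsTable`, which records table metadata in a catalog — by default the Hive metastore when Hive support is enabled:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("bucketed-write-example") // hypothetical app name
  .enableHiveSupport()
  .getOrCreate()

val df = spark.range(0, 1000).withColumnRenamed("id", "user_id")

df.write
  .bucketBy(8, "user_id")        // bucketing requires a catalog table
  .sortBy("user_id")             // per-bucket sort order
  .saveAsTable("users_bucketed") // table metadata goes to the metastore
```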


1 Answer

0
votes

Spark becomes aware of the Hive metastore when it is provided with a hive-site.xml, which is typically placed under $SPARK_HOME/conf. Whenever the enableHiveSupport() method is used while creating the SparkSession, Spark finds where and how to connect to the Hive metastore. Spark itself therefore does not explicitly store any Hive settings.
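As a minimal sketch, assuming a hive-site.xml is already present under $SPARK_HOME/conf (the app name here is made up):

```scala
import org.apache.spark.sql.SparkSession

// Spark picks up the metastore connection details from hive-site.xml
// on the classpath; nothing is hard-coded in the application itself.
val spark = SparkSession.builder()
  .appName("hive-metastore-example") // hypothetical app name
  .enableHiveSupport()               // wire the catalog to the Hive metastore
  .getOrCreate()

// Catalog calls now go through the metastore.
spark.catalog.listDatabases().show()
```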