I have 3-4 clusters in my Databricks instance on the Azure cloud platform. I want to maintain a common metastore for all the clusters. Let me know if anyone has implemented this.
3
votes
Isn't that what Cloudera calls SDX (shared data experience) and provides in their cloud offerings?
– mazaneicha
Hi @pankajs, If the answer is helpful for you, you can accept it as answer( click on the check mark beside the answer to toggle it from greyed out to filled in.). This can be beneficial to other community members. Thank you.
– CHEEKATLAPRADEEP-MSFT
1 Answer
2
votes
I recommend configuring an external Hive metastore. By default, Databricks spins up its own metastore behind the scenes, but you can create your own database (Azure SQL Database works, as do MySQL and PostgreSQL) and point each cluster to it at startup.
Here are detailed steps: https://docs.microsoft.com/en-us/azure/databricks/data/metastores/external-hive-metastore
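For a shared metastore, every cluster must be started with the same JDBC settings. A minimal sketch of the cluster Spark config, assuming an Azure SQL database named `metastore` (the server name, user, password, and Hive version are placeholders you would replace with your own; the metastore version must match the schema deployed in your database):

```
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:sqlserver://<server>.database.windows.net:1433;database=metastore
spark.hadoop.javax.jdo.option.ConnectionDriverName com.microsoft.sqlserver.jdbc.SQLServerDriver
spark.hadoop.javax.jdo.option.ConnectionUserName <user>
spark.hadoop.javax.jdo.option.ConnectionPassword <password>
spark.sql.hive.metastore.version 2.3.7
spark.sql.hive.metastore.jars builtin
```

Entering the same block on each of your 3-4 clusters (or in a shared cluster policy) is what makes them all see one metastore.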
Things to be aware of:
- Data tab in Databricks - you can choose a cluster and see which metastore it is connected to.
- To avoid storing a SQL username and password, look at Managed Identities: https://docs.microsoft.com/en-us/azure/stream-analytics/sql-database-output-managed-identity
- Automate the external Hive metastore connection by using initialization scripts for your clusters.
- Permissions management on your data sources. In the case of ADLS Gen2, consider using Azure AD credential passthrough.
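The init-script bullet above can be sketched roughly as follows. This is a hypothetical sketch, not the exact documented script: the config keys are the standard Hive metastore JDBC settings, the server/user/password values are placeholders, and the output directory defaults to a local folder so the sketch runs anywhere (on a real Databricks cluster it would be /databricks/driver/conf):

```shell
#!/bin/sh
# Hypothetical init-script sketch: writes a Spark config file that points
# the cluster at an external Hive metastore.
# On a real cluster CONF_DIR would be /databricks/driver/conf; a local
# default is used here so the sketch is runnable outside Databricks.
CONF_DIR="${CONF_DIR:-./driver-conf}"
mkdir -p "$CONF_DIR"

# Placeholder values: replace <server>, <user>, <password> with your own.
cat > "$CONF_DIR/00-custom-spark.conf" <<'EOF'
[driver] {
    "spark.hadoop.javax.jdo.option.ConnectionURL" = "jdbc:sqlserver://<server>.database.windows.net:1433;database=metastore"
    "spark.hadoop.javax.jdo.option.ConnectionDriverName" = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    "spark.hadoop.javax.jdo.option.ConnectionUserName" = "<user>"
    "spark.hadoop.javax.jdo.option.ConnectionPassword" = "<password>"
}
EOF

echo "Wrote $CONF_DIR/00-custom-spark.conf"
```

Attaching the same script to every cluster keeps their metastore settings in sync without editing each cluster's Spark config by hand.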