Objective
I want to create Databricks global unmanaged tables from ADLS data and use them from multiple clusters (automated and interactive). To do that, I first run CREATE TABLE my_table ... and then MSCK REPAIR TABLE my_table. I'm using the Databricks internal Hive metastore.
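For reference, here is a minimal sketch of the sequence I run on a cluster. The column list, partition column, and ADLS location are hypothetical placeholders rather than my real values, and the helper names are only for illustration; the only real API assumed is `spark.sql` on the notebook's Spark session.

```python
def build_statements(table, location, columns, partition_col):
    """Return the two Spark SQL statements in the order they are run.

    All arguments are illustrative placeholders, not the real table definition.
    """
    create = (
        f"CREATE TABLE IF NOT EXISTS {table} ({columns}) "
        f"USING PARQUET "
        f"PARTITIONED BY ({partition_col}) "
        f"LOCATION '{location}'"  # unmanaged table: data stays in ADLS
    )
    # Registers the partition directories found under LOCATION in the metastore.
    repair = f"MSCK REPAIR TABLE {table}"
    return [create, repair]


def run_statements(spark, statements):
    """Execute each statement through the given session (anything with .sql)."""
    for stmt in statements:
        spark.sql(stmt)


# Example with placeholder values (hypothetical schema and path):
statements = build_statements(
    table="my_table",
    location="abfss://<container>@<account>.dfs.core.windows.net/<path>",
    columns="id BIGINT, dt STRING",
    partition_col="dt",
)
# In a Databricks notebook: run_statements(spark, statements)
```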
The issue
Sometimes the result of MSCK REPAIR was not synced across clusters at all, for hours: cluster #1 saw the partitions immediately, while cluster #2 didn't see any data for some time.
Other times it does sync, and I can't work out why it fails in the remaining cases.
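One way to pin down the discrepancy is to list the partitions each cluster reports and diff them. The sketch below assumes the partition names are collected on each cluster with SHOW PARTITIONS; the helper name and the sample values are made up for illustration.

```python
def missing_partitions(seen_on_first, seen_on_second):
    """Return partitions reported by the first cluster but not the second."""
    return sorted(set(seen_on_first) - set(seen_on_second))


# On each cluster, the list could be collected like this (assumes a notebook
# `spark` session; SHOW PARTITIONS returns one string column per partition):
#   parts = [row[0] for row in spark.sql("SHOW PARTITIONS my_table").collect()]

# Made-up sample values standing in for what two clusters might report:
cluster_1 = ["dt=2020-01-01", "dt=2020-01-02"]
cluster_2 = []
print(missing_partitions(cluster_1, cluster_2))
```

If the second cluster's list is empty or shorter, that at least shows the difference is visible in the partition metadata the cluster sees, not only in query results.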
Question
Does Databricks use a separate internal Hive metastore per cluster? If so, are there any guarantees about synchronization between clusters?