Objective
I am a little confused by the terminology: I've built a Data Lake (not a DW) based on Kimball's data modeling approach, and now I'm not sure whether I can use the term Data Mart for my MPP database layer.
I started from the assumption that you still need Dimensional Modeling and a Star Schema for reporting in mid-size and larger organizations, the same reasoning as in this article.
Questions
- Is it right to call Synapse a Data Mart in the following architecture (see picture below)?
- Can I say that I don't have a DW (even though I have a Star Schema), but instead have a Data Lake + Data Mart(s)?
- Should I split Synapse into multiple schemas based on business/report sub-domains (multiple Data Marts)?
Architecture details
To be more specific, in my case:
2-3) ADLS + Databricks form the Data Lake. All ETL and the Star Schema build happen at the Data Lake layer. All logic sits here. Still, it holds structured and unstructured data in the raw layer, uses cheap ADLS storage, lacks governance, supports ML, and will have streaming in the future. On the other hand, we have schema-on-write in all DL zones except raw, and the tables are modeled upfront (with a lot of requirement changes along the way). Am I right to call it a Data Lake?
4) Synapse serves as a tiny projection/model of the ETL/Lake results in order to speed up report response times. There is almost no logic here, only a few aggregations. Only the final model is loaded into Synapse. The data is not split by business sub-domain; we just load everything into a single DATAMART schema. Is that a good approach?
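
To make the last question concrete, here is a minimal Python sketch of the alternative I am considering: routing each final-model table to a per-sub-domain schema instead of the single DATAMART schema. The table names, sub-domain names, the `DM_` prefix, and the helper functions are all hypothetical and only illustrate the idea; they are not my real model.

```python
# Hypothetical mapping of final-model tables to business sub-domains.
# Shared (conformed) dimensions get their own common schema.
SUBDOMAIN_BY_TABLE = {
    "fact_sales": "sales",
    "fact_inventory": "supply_chain",
    "dim_customer": "shared",
    "dim_date": "shared",
}

DEFAULT_SCHEMA = "DATAMART"  # the current single-schema layout


def target_schema(table: str, split_by_subdomain: bool) -> str:
    """Return the schema a table would be loaded into."""
    if not split_by_subdomain:
        return DEFAULT_SCHEMA
    # Tables without a mapping fall back to the single schema.
    subdomain = SUBDOMAIN_BY_TABLE.get(table)
    return f"DM_{subdomain.upper()}" if subdomain else DEFAULT_SCHEMA


def qualified_name(table: str, split_by_subdomain: bool) -> str:
    """Return the schema-qualified table name for the chosen layout."""
    return f"{target_schema(table, split_by_subdomain)}.{table}"


if __name__ == "__main__":
    for name in SUBDOMAIN_BY_TABLE:
        print(qualified_name(name, False), "->", qualified_name(name, True))
```

With the split enabled, fact tables land in their sub-domain schema while shared dimensions stay in one common schema, which is the part I am least sure about.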