1
votes

Context :

Let's suppose we have multiple datamarts (Ex : HR, Accounting, Marketing ...) and all of them use the Star Schema as dimensional modeling (Kimball approach ) .

Question :

Since Snowflake cloud data warehouse architecture eliminate the need to spin off separate physical data marts / databases in order to maintain performance. So, what's the best approach to build the multiple datamarts on Snowflake ?

Create database for each datamart ? create one database (EDW )with multiple schema and each schema refer to a datamart ?

Thank you !

3

3 Answers

2
votes

Ron is correct - the answer depends on a few things:

  1. If there are conformed dimensions, then one database and schema might be the way to go
  2. If they are completely non-integrated data marts I would go with separate schemas or even separate databases. They are all logical containers in Snowflake (rather than physical) with full role based access control available to segregate users.

So really - how do you do it today? Does that work for you or are there things you need or want to do that you cannot do today with your current physical setup. How is security set up with your BI tools? Do they reference a database name or just a schema name? If you can, minimize changes to your data pipeline and reporting so you have fewer things that might need refactoring (at least for your first POC or migration).

One thing to note is that with Snowflake you have the ability to easily do cross database joins (i.e., database.schema.table) - all you need is SELECT access, so even if you separate the marts by database oyu can still do cross mart reporting if needed.

Hope that helps.

1
votes

There is no specific need to separate star schemas at all.

If you're using shared / conformed dimensions across your marts, separation would actually be an anti-pattern.

If your issue is simplifying the segregation of users, schema per mart works well.

All of the approaches you've suggested (DB/mart, DW/schema,...) will work, I'm just not clear on the need.

1
votes

The goal of having separate data marts is more related to governance, to keep data organized and where it is expected to be found (i.e. sales transactions in the "sales data mart"), and less related to performance issues.

The advantage of having a single database acting as a data warehouse is that all your data for analytics will be stored in one place, making it more accessible and easier to find. In this case, you can use schemas to implement (logically) separate data marts. You can also use schemas within a database to keep development data separate from production data, for each data mart.

Snowflake is different from traditional relational databases; given its technical architecture, it has no issues with joining large tables between different databases/schemas so you can certainly build different data marts in separate databases and join their facts or dimensions with some other Snowflake database/data mart.

In you specific case, if you have a large number of data marts (e.g. 10 or more) and you're not using Snowflake for much more than data warehouseing, I think the best path would be to implement each data mart in its own database and use schemas to manage prod/dev data within each schema. This will help keep data organized, as opposed to quickly reaching a point where you'll have hundreds of tables (every data mart, and its dev/prod versions) in one database, which won't be a great development or maintenance experience.

But, from a performance perspective, there's really no noticeable difference.