4
votes

I am standing up a new enterprise data warehouse for my company, using Kimball methodology (i.e., a collection of data marts). I'd like to know the best practices (or usual practices) for organizing my data marts.

1) Should each data mart be a separate database on the EDW server? Or, should each data mart be a schema of a single database?

2) For conformed dimensions (i.e., dimensions that apply to 2+ data marts / subject areas / business processes), should they live in a separate schema or database? Or, because we won't know in advance what dimensions will be conformed (since we are building a data mart at a time), should we simply identify the conformed dimensions in our enterprise bus matrix (Excel file) and make no effort to segregate them in the EDW?

3)

a) Should fact tables and dimension tables be identified at all in the EDW? For example, since I will be maintaining a diagram of each star schema that will be shared with self-service BI users, is there any value in identifying fact tables in the DB via some method, say prefixing the table name with 'Fact'?

b) If fact and dimension tables should be identified in the EDW, what should be the identification mechanism? Should it be via table name prefixing? Should it be via organizing the tables into separate 'Fact' and 'Dimension' schemas?

1

1 Answers

3
votes

1) Should each data mart be a separate database on the EDW server? Or, should each data mart be a schema of a single database?

This (also) depends on what database software you are using and whether it imposes any kind of limitation on, for example, using data across multiple schemas.

In any case, you'll inevitably need to connect to, and query/join data from, different data marts, to address some business cases or even ETL processes. You may also need to segregate/secure access to specific data marts, load each data mart independently or using different schedules/methods, etc.

For these reasons, it is usually good enough to keep the data warehouse in one database organized into schemas: one schema per data mart plus specific schemas for shared objects (like conformed dimensions). This way you can still use data that is scattered across multiple data marts, easily control access to specific schemas / data marts, and it'll be easier for users to locate specific metrics/facts.

2) For conformed dimensions (i.e., dimensions that apply to 2+ data marts / subject areas / business processes), should they live in a separate schema or database? Or, because we won't know in advance what dimensions will be conformed (since we are building a data mart at a time), should we simply identify the conformed dimensions in our enterprise bus matrix (Excel file) and make no effort to segregate them in the EDW?

If you organize data marts into schemas, it makes sense to have a specific schema to hold these conformed dimensions and other shared data. This way, different users that may have access only to specific data marts can still use the conformed/shared dimensions.

3)

a) Should fact tables and dimension tables be identified at all in the EDW? For example, since I will be maintaining a diagram of each star schema that will be shared with self-service BI users, is there any value in identifying fact tables in the DB via some method, say prefixing the table name with 'Fact'?

Yes, using prefixes makes it easier to locate metrics (facts) and dimensions when users are browsing the data warehouse, something like F_tableName or D_tableName would already go a long way.

b) If fact and dimension tables should be identified in the EDW, what should be the identification mechanism? Should it be via table name prefixing? Should it be via organizing the tables into separate 'Fact' and 'Dimension' schemas?

Same as above :)