Which data type for date-type attributes in a dimensional table, including start & end dates?

Question

I'm designing a data warehouse using dimensional modeling. I've read most of the Data Warehouse Toolkit by Kimbal & Ross. My question is regarding the columns in a dimensional table that hold dates. For example, here is a table for Users of the application:

CREATE TABLE user_dim (
   user_key BIGINT,  -- surrogate key
   user_id BIGINT,   -- natural key
   user_name VARCHAR(100),
   ...
   user_added_date DATE, -- type 0, date user added to the system
   ...
   -- Type-2 SCD administrative columns
   row_start_date DATE, -- first effective date for this row
   row_end_date DATE,   -- last effective date for this row, 9999-12-31 if current
   row_current_flag VARCHAR(10), -- current or expired
)

The last three attributes are for implementing type 2 slowly-changing dimensions. See Kimbal page 150-151.

Question 1: Is there are best practice for the data type of the row_start_date and row_end_date columns? The type could be DATE (as shown), STRING/VARCHAR/CHAR ("YYYY-MM-DD"), or even BIGINT (foreign key to Date Dimension). I don't think there would be much filtering on the row start/end dates, so a key to the Date Dimension is not required.

Question 2: Is there a best practice for the data type of dimension attributes such "user_added_date"? I can see someone wanting reports on users added per fiscal quarter, so using a foreign key to Date Dimension would be helpful. Any downsides to this, besides having to join from User Dimension to Date Dimension for display of the attribute?

If it matters, I'm using Amazon Redshift.

Jon Scott Jon Scott · Accepted Answer · 2019-05-24T16:20:05

Question 1 : For the SCD from and to dates I suggest you use timestamp. My preference is WITHOUT time zone and ensure all of your timestamps are UTC

Question 2 : I always set up a date dimension table with a logical key of the actual date. that way you can join any date (e.g. the start date of the user) to the date dimension to find the eg "fiscal month" or whatever off the date dimension. But also you can see the date without joining to the date dimension as its plain to see (stored as a date)

With redshift (or any columnar MPP DBMS) it is good practice to denormalise a little. e.g. use star schema rather than snowflake schema. This is because of the efficiencies that columnar brings, and deals with the inneficient joins (because there are no indexes)

Which data type for date-type attributes in a dimensional table, including start & end dates?

2 Answers