I'm designing a data warehouse using dimensional modeling. I've read most of the Data Warehouse Toolkit by Kimbal & Ross. My question is regarding the columns in a dimensional table that hold dates. For example, here is a table for Users of the application:
CREATE TABLE user_dim (
user_key BIGINT, -- surrogate key
user_id BIGINT, -- natural key
user_name VARCHAR(100),
...
user_added_date DATE, -- type 0, date user added to the system
...
-- Type-2 SCD administrative columns
row_start_date DATE, -- first effective date for this row
row_end_date DATE, -- last effective date for this row, 9999-12-31 if current
row_current_flag VARCHAR(10), -- current or expired
)
The last three attributes are for implementing type 2 slowly-changing dimensions. See Kimbal page 150-151.
Question 1: Is there are best practice for the data type of the row_start_date and row_end_date columns? The type could be DATE (as shown), STRING/VARCHAR/CHAR ("YYYY-MM-DD"), or even BIGINT (foreign key to Date Dimension). I don't think there would be much filtering on the row start/end dates, so a key to the Date Dimension is not required.
Question 2: Is there a best practice for the data type of dimension attributes such "user_added_date"? I can see someone wanting reports on users added per fiscal quarter, so using a foreign key to Date Dimension would be helpful. Any downsides to this, besides having to join from User Dimension to Date Dimension for display of the attribute?
If it matters, I'm using Amazon Redshift.