I have a table in SQL Server where I "stage" the data warehouse extract from our ERP system.
From this staging table (table name: DBO.DWUSD_LIVE), I build my dimensions and load my fact data.
An example dimension table is called "SHIPTO". This dimension has the following columns:
"shipto_id"
"shipto"
"salpha"
"ssalpha"
"shipto address"
"shipto name"
"shipto city"
Right now I have an SSIS package that does a SELECT DISTINCT across the above columns (everything except "shipto_id") to retrieve the "unique" data, and then the package assigns the "shipto_id" surrogate key to each row.
An example of my current T-SQL query is:
SELECT DISTINCT
"shipto", "salpha", "ssalpha", "shipto address", "shipto name", "shipto city"
FROM DBO.DWUSD_LIVE
This works, but it is not "speedy". Some dimensions have 10 columns, and doing a SELECT DISTINCT across all of them is not ideal.
In this dimension, my "Business Key" columns are "SHIPTO", "SALPHA", and "SSALPHA".
So if I do:
SELECT DISTINCT
"shipto", "salpha", "ssalpha"
FROM DBO.DWUSD_LIVE
It returns the same number of rows as:
SELECT DISTINCT
"shipto", "salpha", "ssalpha", "shipto address", "shipto name", "shipto city"
FROM DBO.DWUSD_LIVE
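For reference, one way to confirm that the two queries return the same number of rows is to compare their counts side by side. This is only a sketch against the table and columns named above; the aliases key_rows, all_rows, k and a are made up for illustration.

```sql
-- Compare the distinct row count on the business key alone
-- with the distinct row count across all dimension columns.
SELECT
    (SELECT COUNT(*)
     FROM (SELECT DISTINCT "shipto", "salpha", "ssalpha"
           FROM DBO.DWUSD_LIVE) AS k) AS key_rows,
    (SELECT COUNT(*)
     FROM (SELECT DISTINCT "shipto", "salpha", "ssalpha",
                  "shipto address", "shipto name", "shipto city"
           FROM DBO.DWUSD_LIVE) AS a) AS all_rows;
```

If key_rows and all_rows match, each business key maps to exactly one combination of the remaining columns in the staging table.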
Is there a better way to write this T-SQL query? I need all the columns returned, but the DISTINCT should apply only to the business key columns.
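To make the desired result concrete, here is a minimal sketch of one common pattern for "all columns, but unique on the key": number the rows within each business key using ROW_NUMBER() and keep only the first. The CTE name ranked and the column alias rn are made up for illustration. The ORDER BY (SELECT NULL) picks an arbitrary row per key, which is only safe on the assumption that the other columns are identical for a given key. Whether this is actually faster than SELECT DISTINCT depends on the indexes and the query plan, so it would need to be measured.

```sql
-- Number the rows within each business key, then keep one row per key.
-- The preceding statement (if any) must end with a semicolon before WITH.
WITH ranked AS (
    SELECT
        "shipto", "salpha", "ssalpha",
        "shipto address", "shipto name", "shipto city",
        ROW_NUMBER() OVER (
            PARTITION BY "shipto", "salpha", "ssalpha"  -- business key
            ORDER BY (SELECT NULL)                      -- arbitrary pick within a key
        ) AS rn
    FROM DBO.DWUSD_LIVE
)
SELECT
    "shipto", "salpha", "ssalpha",
    "shipto address", "shipto name", "shipto city"
FROM ranked
WHERE rn = 1;
```

If the non-key columns can differ within a key, the ORDER BY would need a real column (for example a load date, if the staging table has one) so that the surviving row is chosen deterministically.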
Your help is appreciated.
Below is an image of how my project is set up in SSIS. The dimension is an SCD Type 1.
