
Please look at the following staging table. It has multiple rows for the same policy. Data is loaded into this table from a flat file I receive from external sources.

Column values can change from one row to the next (see ColA). Only some columns may be populated in the first row, with more populated in later rows; ColB and ColC, for example, are NULL initially and are populated in the second and third rows.

CREATE TABLE #Stg
(
    PolicyNum    VARCHAR(10),
    ColA         VARCHAR(10),
    ColB         VARCHAR(10),
    ColC         VARCHAR(10),
    TimeStampKey VARCHAR(100)
);

INSERT #Stg (PolicyNum, ColA, ColB, ColC, TimeStampKey)
VALUES ('MDT1000', 'SomeVal_A1', NULL, NULL, '2013041113033140MDT1000ZA'),
       ('MDT1000', 'SomeVal_A2', 'SomeVal_B', NULL, '2013041113051756MDT1000ZA'),
       ('MDT1000', 'SomeVal_A3', 'SomeVal_B', 'SomeVal_C', '2013041113115418MDT1000ZA');

From this staging table I need to load data into a destination table while maintaining history. The destination table is basically a Type 2 slowly changing dimension. In other words, I have to load the first row from staging because the policy doesn't exist yet, then apply the second row as a change, and then apply the third row as another change.

Following is an example of the destination schema:

CREATE TABLE #Dst
(
    PolicyKey    INT IDENTITY(1, 1) PRIMARY KEY,
    PolicyNum    VARCHAR(10),
    ColA         VARCHAR(10),
    ColB         VARCHAR(10),
    ColC         VARCHAR(10),
    IsActive     BIT,
    RowStartDate DATETIME,
    RowEndDate   DATETIME
);
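
To make the requirement concrete, here is what I'd expect #Dst to contain after all three staging rows are processed. I'm assuming the leading digits of TimeStampKey encode the change time as yyyyMMddhhmmssff; the dates below are derived from that:

 PolicyKey  PolicyNum  ColA        ColB       ColC       IsActive  RowStartDate             RowEndDate
 ---------  ---------  ----------  ---------  ---------  --------  -----------------------  -----------------------
 1          MDT1000    SomeVal_A1  NULL       NULL       0         2013-04-11 13:03:31.400  2013-04-11 13:05:17.560
 2          MDT1000    SomeVal_A2  SomeVal_B  NULL       0         2013-04-11 13:05:17.560  2013-04-11 13:11:54.180
 3          MDT1000    SomeVal_A3  SomeVal_B  SomeVal_C  1         2013-04-11 13:11:54.180  NULL

Only the latest version stays active; each earlier version's RowEndDate equals the next version's RowStartDate.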

Normally I'd write a MERGE statement or an SSIS package to handle incremental loads and SCD dimensions, but since the original record and its change records arrive in the same file, the standard approach doesn't work: MERGE expects each target row to match at most one source row, and here a single policy has several source rows.

I'd appreciate it if you could shed some light on how to approach this. I'm trying to avoid row-by-row operations.

Thanks, Sam.

My initial approach would be to segment the source data into a unique set of PolicyNum values and then a repeated set. Both data sets would then need to be checked against the database for existence during the load. So, dump everything to a staging table, and then do something like my "delete first" approach inside a loop to get the unique keys first and then the rest. Let me know if you want me to blow that out into a proper answer. – billinkc
Thanks Bill. This is a good idea, although instead of deleting first from the staging table, I ended up segmenting by adding an additional column, using it to mark the segments to be processed, and looping over them. Do you want to post your comment as an answer? I'll mark it as the answer. Thanks again! – Samuel Vanga
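
For reference, a minimal sketch of the segmentation approach described in the comments above. The column name BatchNum and the use of GETDATE() for the row dates are illustrative assumptions, not part of the original discussion:

-- Mark each policy's rows with a batch number: 1 = first version, 2 = second, ...
ALTER TABLE #Stg ADD BatchNum INT;
GO  -- separate batch so the next statement compiles after the column exists

;WITH Numbered AS
(
    SELECT BatchNum,
           NewBatchNum = ROW_NUMBER() OVER (PARTITION BY PolicyNum ORDER BY TimeStampKey)
    FROM #Stg
)
UPDATE Numbered SET BatchNum = NewBatchNum;

-- Process one batch at a time; within a batch each policy appears at most once,
-- so each pass is an ordinary set-based SCD2 load.
DECLARE @Batch INT = 1;
DECLARE @MaxBatch INT = (SELECT MAX(BatchNum) FROM #Stg);

WHILE @Batch <= @MaxBatch
BEGIN
    -- Expire the currently active row for policies that have a new version in this batch
    UPDATE d
    SET    d.IsActive = 0,
           d.RowEndDate = GETDATE()   -- illustrative; could instead be parsed from TimeStampKey
    FROM   #Dst AS d
    JOIN   #Stg AS s
      ON   s.PolicyNum = d.PolicyNum
     AND   s.BatchNum  = @Batch
    WHERE  d.IsActive = 1;

    -- Insert the new version as the active row
    INSERT #Dst (PolicyNum, ColA, ColB, ColC, IsActive, RowStartDate, RowEndDate)
    SELECT s.PolicyNum, s.ColA, s.ColB, s.ColC, 1, GETDATE(), NULL
    FROM   #Stg AS s
    WHERE  s.BatchNum = @Batch;

    SET @Batch = @Batch + 1;
END

Each pass through the loop is set-based across all policies, so the loop runs only as many times as the deepest change chain per policy (three here), not once per staged row.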

1 Answer


Try this:

-- keep only the most recent staged row per policy
SELECT Stg.*
FROM #Stg AS Stg
INNER JOIN
(
    SELECT PolicyNum, MAX(TimeStampKey) AS MAX_TimeStampKey
    FROM #Stg
    GROUP BY PolicyNum
) T
    ON  T.PolicyNum = Stg.PolicyNum
    AND T.MAX_TimeStampKey = Stg.TimeStampKey;

The result:

 PolicyNum  ColA       ColB       ColC       TimeStampKey
 ---------- ---------- ---------- ---------- -------------------------
 MDT1000    SomeVal_A3 SomeVal_B  SomeVal_C  2013041113115418MDT1000ZA
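
That returns only the latest version per policy. If the destination also needs the historical versions, a fully set-based variant is possible on SQL Server 2012+ using LEAD. The sketch below is my illustration, assuming an initial load into an empty #Dst and that the first 14 characters of TimeStampKey are yyyyMMddhhmmss:

;WITH Parsed AS
(
    SELECT PolicyNum, ColA, ColB, ColC, TimeStampKey,
           -- parse '20130411130331...' into a DATETIME (assumption about the key format)
           ChangeDate = CONVERT(DATETIME,
                            SUBSTRING(TimeStampKey, 1, 8) + ' '
                          + SUBSTRING(TimeStampKey, 9, 2) + ':'
                          + SUBSTRING(TimeStampKey, 11, 2) + ':'
                          + SUBSTRING(TimeStampKey, 13, 2))
    FROM #Stg
)
INSERT #Dst (PolicyNum, ColA, ColB, ColC, IsActive, RowStartDate, RowEndDate)
SELECT PolicyNum, ColA, ColB, ColC,
       -- the last version per policy stays active with an open-ended RowEndDate
       CASE WHEN LEAD(ChangeDate) OVER (PARTITION BY PolicyNum ORDER BY TimeStampKey) IS NULL
            THEN 1 ELSE 0 END,
       ChangeDate,
       LEAD(ChangeDate) OVER (PARTITION BY PolicyNum ORDER BY TimeStampKey)
FROM Parsed;

Policies that already exist in #Dst would still need their current row expired first, which is where the batch approach from the comments comes back in.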

Please let us know if this helped you.