I am creating an SSIS package that loads data from a source system into a Data Warehouse using lookups. Here is how the process is currently set up:
- Load all relevant tables to staging tables (6 lookup tables and the data for the fact table)
- For each lookup table use a Lookup component to determine if the record already exists in the data warehouse dimension table. If it does match get the ID from the dimension, if it does not match derive a match.
- Load into the fact table the IDs to the dimensions and two or three other pieces of data that flowed through from the source.
Not very complex. The lookup tables have anywhere from 10 - 50 records. I have limited my source data to 10,000 records in order for the package to complete in a reasonable amount of time. Takes 20 minutes to load 10,000 records.
The time line in the messages shows:
- 11:13:28 - Package Started
- 11:13:36 - All the lookups were prepared and ready
- 11:32:33 - The final commit for the data insertion in 'Target - Fact Insert new record has ended'
- 11:32:34 - Package Ended
Where can I look to see why it took almost 19 minutes to insert 10,000 records? Any suggestions on what I can do differently to make this run faster?
Edit: Responses to questions below
Data Types - Not sure what you are asking? The data types of what I am looking up? integers are the ID fields. The Code field are either integers or Char(4).
Data Flow of main process is: Source (run query to get info from stage table), Lookup a LocationID, match and unmatched to Union, Lookup an Injury CodeID, match and unmatched to Union, repeat for the next 4 lookup tables, sort the source, merge join my target (currently an empty table), determine Add, Change or Delete record (these are all Adds).
The target table is:
CREATE TABLE dbo.tfResidentEvent
(
ResidentEventID INT IDENTITY(1, 1)
PRIMARY KEY ,
SourceSystemID INT ,
EventID INTEGER ,
EventCodeID INT ,
FacilityID INT ,
ResidentID INT ,
DateID INT ,
TimeID INT ,
LocationCodeID INT ,
CauseCodeID INT ,
ConditionCodeID INT ,
InjuryCodeID INT ,
chksum_type1 int,
chksum_type2 int
)
The timings are coming from the Messages shown on the server for the package execution.
SSIS on SQL Server 2012
I will try to re-do the package with derived columns and see how long that takes to execute.