3
votes

I'm sorry if this a dumb question, as I can't seem to find the answer on google maybe because other people think it's obvious, but I'm a bit a confused beginner.

So, I'm practicing to make a small data warehouse. I have the following packages in SSIS: 1. Copy from operational database to archive database 2. Copy from archive to staging area 3. Transformations in staging area and put into data warehouse

These packages are controlled by jobs in SQL Server.

I figured it would be also possible to have only one package and put all 1-2-3 as separate data flow tasks into that package. I started to wonder about this because all task only have one data flow task in it (in the data flow task itself there are many source-destination stuff going on though).

What are best practices for this? When do you make different packages and when multiple data flow tasks.

Thanks in advance.

1

1 Answers

3
votes

It's rather a matter of personal taste. Personally, I would prefer to keep them separated for several reasons, including but not limited to:

  • If a package #2 breaks at some point because of schema drift or something, package #3 will still work normally (for some tables, at least).
  • In a team, it's better to keep things small because several people might need to check out different packages from CVS to work on them simultaneously. It will be simply impossible if everything is located within a single gigantic package.
  • When a warehouse becomes large enough (hundreds of tables) it is often practical to split each stage between several packages - say, one for each data source / subject area. It makes easier not to get lost in them.

Besides, large SSIS packages are quite unwieldy - I remember working with one that weighs ~30 Mb, it was a real pain in the rear.