Personally, I detest checkpoints. They don't work within the context of a loop and maybe it's just my bad luck, but I have far too often experienced a corruption of the checkpoint file
I always advise clients to design their ETL solutions with restartability in mind. At some point, the process will fail and you know it's going to be in the middle of something crucial so remove the "thinking" aspect of getting it restarted. Don't make it depend on someone running a script to clean up half committed loads, etc. But that's a different question...
Every object in your control flow will support a property called Disable
. When you run a package, you can deactivate tasks via the /SET command. Given the following example, each sequence container has a Script Task that does nothing but fires an information event
When I run that from the command line,
C:\Users\bfellows\source\repos\SO_Trash\SO_Trash>dtexec /file SO_61477591.dtsx /rep i
Microsoft (R) SQL Server Execute Package Utility
Version 14.0.3238.1 for 32-bit
Copyright (C) 2017 Microsoft. All rights reserved.
Started: 10:45:43 AM
Info: 2020-04-28 10:45:43.71
Code: 0x00000000
Source: SCR0 SCR Echo Back
Description: System::ParentContainerGUID : {3910E325-F3D0-4EC6-96FD-5E125FEC0ED2}
End Info
Info: 2020-04-28 10:45:43.74
Code: 0x00000000
Source: SCR1 SCR Echo Back
Description: System::ParentContainerGUID : {08bf88d5-71c3-4105-824e-bb8b1e942b24}
End Info
Info: 2020-04-28 10:45:43.77
Code: 0x00000000
Source: SCR2 SCR Echo Back
Description: System::ParentContainerGUID : {59823604-1caf-4a90-9303-45d55126d8b3}
End Info
Info: 2020-04-28 10:45:43.80
Code: 0x00000000
Source: SCR3 SCR Echo Back
Description: System::ParentContainerGUID : {977567f1-6bcf-4c84-909e-eb18ba17165d}
End Info
DTExec: The package execution returned DTSER_SUCCESS (0).
Started: 10:45:43 AM
Finished: 10:45:43 AM
Elapsed: 0.609 seconds
By specifying the path to the executable, I can set the property disable property like /set \Package\SEQC1.Properties[Disable];"True"
where SEQC1 is the case sensitive name of the object I want to disable
C:\Users\bfellows\source\repos\SO_Trash\SO_Trash>dtexec /file .\SO_61477591.dtsx /rep i /set \Package\SEQC1.Properties[Disable];"True"
Microsoft (R) SQL Server Execute Package Utility
Version 14.0.3238.1 for 32-bit
Copyright (C) 2017 Microsoft. All rights reserved.
Started: 10:45:50 AM
Info: 2020-04-28 10:45:50.62
Code: 0x00000000
Source: SCR0 SCR Echo Back
Description: System::ParentContainerGUID : {3910E325-F3D0-4EC6-96FD-5E125FEC0ED2}
End Info
Info: 2020-04-28 10:45:50.66
Code: 0x00000000
Source: SCR2 SCR Echo Back
Description: System::ParentContainerGUID : {59823604-1caf-4a90-9303-45d55126d8b3}
End Info
Info: 2020-04-28 10:45:50.69
Code: 0x00000000
Source: SCR3 SCR Echo Back
Description: System::ParentContainerGUID : {977567f1-6bcf-4c84-909e-eb18ba17165d}
End Info
DTExec: The package execution returned DTSER_SUCCESS (0).
Started: 10:45:50 AM
Finished: 10:45:50 AM
Elapsed: 0.609 seconds
Notice that SCR1 is no longer in the output due to the SET command disabling it.
Knowing how to disable tasks, you could leave the package as-is and modify your start to be text generated from SQL Queries (give me all the tasks that have completed and build the correct /SET statements)
A better approach that would require modification to the existing package would be have an Execute SQL Task(s) that runs at the start of the package and it sets Boolean SSIS variables with an indicator of whether the Task should be disabled. And then modify each object to have an expression on the Disable property that is the new SSIS variable.
Personally, my approach is to make many SSIS packages. Each package is a unit of work (load sales table). When a package fires, it records that execution has begun. When it completes successfully, it records that it has completed (as well has how many rows inserted/updated/deleted)
I then have a orchestrator package that queries the list of known packages and identifies any package that has not run successfully. It then uses a foreach loop with an Execute package task to then start running the child processes.
An advantage to my approach is that, assuming no dependencies between task 1 and 3, I could have two copies of this running - one "stack" handles the package 0 and 2, while a different stack handles 1 and 3. One of the advantages of SSIS is that it makes it trivial to parallelize your workload.