1 vote

I have an SSIS package that does ETL on a schema with, say, 5 columns. Each of these columns needs to be cast to an appropriate data type.

I initially did all the casts inside a single Data Conversion transformation, which made it difficult to tell from the error output which cast had failed.

I later switched to a chain of separate Data Conversion transformations, one per column.

Which approach will give me better performance?

I get all my data from flat files (pipe-separated).

I agree with the answer below: the fastest might be casting them in the source SQL query. (Nick.McDermaid)
I haven't tested this, but I assume there is overhead for every Derived Column transform because each needs an input and output buffer, so at the very least do all the conversions in one Derived Column transform rather than using multiple transforms with a single expression each. In all of my ETL I load into a staging table full of varchar columns, fix the data there, then transfer it elsewhere. I can add as much logic as I like to identify bad columns, and it's much faster than transforming in SSIS. (Nick.McDermaid)
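The staging-table pattern described in that comment can be sketched in T-SQL. This is an illustrative sketch only: the table and column names (`StagingOrders`, `OrderDate`, `Amount`) are hypothetical, and `TRY_CAST` requires SQL Server 2012 or later.

```sql
-- Hypothetical staging table: every column lands as varchar,
-- so the flat-file load itself never fails on a bad value.
CREATE TABLE dbo.StagingOrders (
    OrderId   VARCHAR(50),
    OrderDate VARCHAR(50),
    Amount    VARCHAR(50)
);

-- After bulk-loading the pipe-separated file, TRY_CAST returns NULL
-- instead of raising an error, so bad rows can be isolated per column:
SELECT OrderId, OrderDate, Amount
FROM dbo.StagingOrders
WHERE (OrderDate IS NOT NULL AND TRY_CAST(OrderDate AS DATE) IS NULL)
   OR (Amount    IS NOT NULL AND TRY_CAST(Amount    AS DECIMAL(18,2)) IS NULL);
```

Because the check runs per column in plain SQL, it answers exactly the question of *which* cast failed, which the single Data Conversion transformation's error output obscures.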

1 Answer

3 votes

It really depends on how much data you are processing.
What data types are you converting from and to?
At what stage in the ETL process do you need to convert the data?
The fastest method might be to cast the columns at the source (if that's possible). If the conversion is from one database to another on the same server, doing it at the database level can also be an option.
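Casting at the source only applies when the source is queryable; since the asker reads flat files, this is a sketch of what it would look like if the data instead came from a database table. All names (`dbo.SourceOrders` and its columns) are hypothetical.

```sql
-- Push the conversions into the source query so the SSIS data flow
-- receives already-typed columns and needs no Data Conversion step:
SELECT
    CAST(OrderId   AS INT)           AS OrderId,
    CAST(OrderDate AS DATE)          AS OrderDate,
    CAST(Amount    AS DECIMAL(18,2)) AS Amount
FROM dbo.SourceOrders;
```

The database engine typically performs set-based conversions faster than row-by-row transformations in the SSIS pipeline, which is why this option is worth testing first when it is available.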
You will need to do some testing to identify which method suits your process best.