
In my SSIS package, I have an Execute SQL Task that is supposed to return up to one hundred million (100,000,000) rows.

I would like to export these results to multiple CSV files, each with a maximum of 500,000 rows. So if the SQL task returns 100,000,000 results, I would like to produce 200 CSV files with 500,000 records in each.

What are the best SSIS tasks that can automatically partition the results into many exported CSV files?

I am currently developing a script task, but I find that it's not very performant. I'm a bit new to SSIS, so I'm not familiar with all the different tasks available, and I'm wondering whether another one can do this much more efficiently.
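For context, the chunking logic my script task attempts looks roughly like this (a minimal sketch, not my actual code; the row iterable stands in for the real query result):

```python
import csv

def export_in_chunks(rows, max_rows_per_file=500_000, prefix="export"):
    """Write an iterable of rows to numbered CSV files,
    each holding at most max_rows_per_file rows."""
    writer, handle, count, file_index = None, None, 0, 0
    files = []
    for row in rows:
        # Start a new file when none is open or the current one is full
        if writer is None or count >= max_rows_per_file:
            if handle:
                handle.close()
            file_index += 1
            name = f"{prefix}_{file_index:04d}.csv"
            handle = open(name, "w", newline="")
            writer = csv.writer(handle)
            files.append(name)
            count = 0
        writer.writerow(row)
        count += 1
    if handle:
        handle.close()
    return files
```

With 100,000,000 rows and a 500,000-row limit this produces 200 files, but row-by-row writing from a script task is where the performance suffers.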

Any recommendations?


1 Answer


Static approach

First add a dataflow task.

In the dataflow task add the following:

  1. A source (an ADO.NET source in the screenshot below) that contains the query to retrieve the data
  2. A conditional split: every condition you add will result in a blue output arrow, and you need to connect each arrow to a destination
  3. An Excel destination or a flat file destination, depending on whether you want Excel or CSV files. For CSV files you'll need to set up a flat file connection.

Dataflow with conditional split

In the conditional split you can add multiple conditions to split out your data, plus a default output for any remaining rows. Note that the split needs a column to test against, such as a row number generated in the source query.
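For example, assuming the source query adds a row-number column named `RowNum` (a hypothetical name; use whatever your query produces), the split conditions could look like this in the SSIS expression language:

```
Case 1: RowNum <= 500000
Case 2: RowNum > 500000 && RowNum <= 1000000
Case 3: RowNum > 1000000 && RowNum <= 1500000
```

Each case becomes one output arrow wired to its own flat file destination, which is why this static approach only works when the number of files is known up front.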

Conditional split

Flat file connection manager

Dynamic approach

  1. Use an Execute SQL Task to retrieve the variables that drive the loop (BatchSize, Start, End).
  2. Add a For Loop or Foreach Loop container.
  3. Add a data flow task inside the loop and pass in the parameters from the loop. (You can pass parameters/expressions into the data flow using its Expressions property.)
  4. Fetch the data with a source in the data flow task based on the parameters from the loop.
  5. Write to a destination (Excel/CSV) with a dynamic file name based on the loop parameters.
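The loop bookkeeping in steps 1 and 2 can be sketched as follows (a hedged illustration of the idea, not SSIS code; `batch_ranges` and the file-name pattern are my own names):

```python
def batch_ranges(total_rows, batch_size):
    """Return (start_row, end_row, file_name) for each loop iteration,
    mirroring the Start/End/BatchSize variables the For Loop would use."""
    batches = []
    file_index = 1
    for start in range(0, total_rows, batch_size):
        end = min(start + batch_size, total_rows)
        batches.append((start, end, f"export_{file_index:04d}.csv"))
        file_index += 1
    return batches

# 100,000,000 rows in 500,000-row batches -> 200 iterations / files
print(len(batch_ranges(100_000_000, 500_000)))
```

Inside the loop, the source query would then page through the data with each (start, end) pair, e.g. using `ORDER BY ... OFFSET @Start ROWS FETCH NEXT @BatchSize ROWS ONLY` in SQL Server, while the flat file connection string is built from the iteration's file name.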
