0
votes

I have a SQL table which holds below details. Invoice Id and their respective details. Each Invoice can have single or multiple lines.

enter image description here

My requirement is I need to split multiple CSV files. I can do that easily using SSIS.

But the catch is supposed an invoice has 4 lines, then it should be captured in single file. Not 2 lines in one file and another two lines in another file. This has to be handled while exporting in SSIS

For example, the given data has 20 records so ideally if I am splitting to 10 records each, then 2 files would be extracted. But if you see in the 8th-row invoice 105 has 4 lines. Hence it should move to 2nd file. And the 1st file will have till Invoice 104 which comes to 8 rows(which is fine). This seems a very tricky task for me. I would appreciate any inputs on how to achieve this.

2
So why are you splitting the files? Why can't you just write everything to one file? Answering that will help us suggest a solution. - Nick.McDermaid
Hi Nick, Because that is the requirement. I just gave an example here. There are millions of records in the table. Hence the split. - Satyajit
What are the actual parameters? 2 files? 20 files? 1 file per header? - Nick.McDermaid
If you run a select for different ranges of invoice id's you'll always get full invoices. BUt how many invoices per file? or how many files? or how many rows? We don't know why you need to split or how many you want to split it into. - Nick.McDermaid
For example you could use select MAX(InvoiceID) / 2 from YourTable to work out a data range, then export the records to two files, but is two files what you want? four files? you need to clarify. - Nick.McDermaid

2 Answers

0
votes

This is a good question,this may be the requirement from client side or from our dear managers. I have a solution for this -

Write Sql query your self I'm giving the logic.

1-Foreachloop will stop when there is no records in that table

Declare @cnt int Set @cnt=(Select count(*) from your_table) IF @cnt>0

2-First of all separate the records by using Row_Number() function partition by Invoice Id and count function group by Invoice Id where count is greater or equal to 4 and insert into a temp table using Execute SQL Task.

3-Find the records on the basis of Row_Number() assign this to a variable and then delete from the table these records only Execute SQL Task as a full result set.Next iteration this records will not be there.

4-Insert data into Excel sheet from variable with time stamp.

5-In Next iteration it will pick next Invoice Id which having 4 or more than 4 records on the basis of Row_Number.

6-Out side of this loop We will take all Invoice Id having than 4 records and insert into Excel sheet with timestamp.

I have created similar project in which client wanted data should be bifurcated on the basis of city, this is similar to that,Currently SQL Server and Visual Studio data tool is not installed in my machine otherwise so can't give you exact component ,please try the logic and let me know once you have any issue.

0
votes

You can use a combination of CTEs:

This will get you first half without splitting Invoices between files. Use >= to get other half.

;with AddRN as (
select rn = row_number() over (order by InvoiceID,[Line ID])
    ,*
from table
)
, HalfOfRN as (
select half = max(rn) / 2 from AddRN
)
,MiddleInv as (
select InvoiceID from AddRN where rn=(select half from HalfOfRN))

select *
from table 
where InvoiceID< (Select InvoiceID from MiddleInv)