3
votes

I need your help.

I have an .xlsx file which looks like this:

enter image description here

My goal is to create a SSIS package which pushes this data into a DB table.

Now, col1 to col5 is ok, but each section has a name on top of it and that is supposed to be column 6 in the table. So the final destination table looks like:

col1 | col2 | col3 | col4 | col5 | col6 (Firstname Lastname from the top)

So far I have tried:

  1. Creating a recordset out of the excel sheet
  2. read recordset row by row using Ado foreach enumerator
  3. within the for each enumerator I have a set of variables that will represent the columns
  4. these variables I am flushing out to a data flow task which converts variables to columns using derived column and pushes it into the ODBC destination

Obviously this did not work out for me, I always get the message "0 rows inserted in the ODBC destination" when I run the package.

To be honest I am not really sure how to solve this problem.

Any help is highly appreciated!!!

Thanks in Advance!!!

Edit:

PS: I can not use any one time or Power BI / Query tricks here. It has to be pure SSIS.

3
Before exporting this excel data to your DB, you can write a small vba macro that can re-arange the data into a new sheet as per your desired format, and than import that sheet directly to that database.usmanhaq
Hi, thanks for the comment, this is part of a large set of files and something like this wont be feasible for meSutharMonil
You need to write a C#/VB script using Script Component to implement this logic.Hadi
Is the installation of the ACE OLE/DB driver an option microsoft.com/en-us/download/details.aspx?id=54920billinkc

3 Answers

2
votes

The solution I can think of, is through script task, in these steps

  1. Read the Excel file from row 4
  2. Script task to add name to the data rows
  3. Remove column headings via conditional split
  • for the first step open Excel source properties, unselect column names in the first row, and set the OpenRowset to read from row four (Sheet1$A4:E) :

enter image description here

  • for the second step create a script component transformation. Select the 5 columns as input and create a new output variable (Name, in this case)

enter image description here

The script itself:

public class ScriptMain : UserComponent
{


    string keepname;
    
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (Row.F5_IsNull)
            { keepname = Row.F1;
            
        }
        else
        {
            if (Row.F5 != "col5")
            {
                Row.Name = keepname;
            }

        }
    }
    
}

Explanation: checks if it is a row containing name (col5 is null) and it so, it saves the name to variable keepname. if is not that, and it is not a heading (col1='col1') then update the variable Name.

  • Last step is just a clean up via conditional split

enter image description here

What this does, is split the rows that are 'name rows' or headings. The you just need to carry one with default output.

Test:

enter image description here

1
votes

I think that the best way to import a file like this one is to make a Script task to convert the Excel file into a CSV and then upload the CSV into the database as a simple flat file source.

  1. Create two variables named VarDataPath with the path of Excel file and VarCSVPath with the path of the CSV generated file.
  2. Create a Script task with the following VB.net snippet. Unfortunally I have not SSIS on my computer at the moment so I can't try my code.

Imports System Imports System.Data Imports System.Math Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain

Public Sub Main()

    Dim xl As Object
    Dim wb As Object
    Dim sh As Object
    Dim NumSheets As Integer

    Dim counter As Long

    Dim fs As Object
    Dim conv_file As Object
    Dim line As String

    Dim strDataPath As String = Dts.Variables("VarDataPath").Value.ToString() 'This variable contains the Excel path
    Dim strFileName As String = Dts.Variables("VarCSVPath").Value.ToString() 'This variable contains the CSV path

    Dim myArray As Integer() = New Integer() {4, 10, 16} ' Row number of FirstName/LastName

    xl = CreateObject("Excel.Application")
    wb = xl.WorkBooks.Open(strDataPath)
    sh = wb.Sheets(1)

    fs = CreateObject("Scripting.FileSystemObject")
    conv_file = fs.CreateTextFile(strFileName, True)

    ' CSV file head
    conv_file.writeline("Name,Col1,Col2,Col3,Col4,Col5")

    For Each val In myArray
    For a = 2 To 5 ' I have to loop all 4 rows of tables starting from 2 rows down the FirstName/LastName cell
    ' Each line I wrote valeu of Name,Col1,Col2,Col3,Col4,Col5
            line = ""
            line=line & sh.Cells(val, 1).ToString & ";" 'Name
            line=line & sh.Cells(val+a, 1).ToString & ";" 'Col1
            line=line & sh.Cells(val+a, 2).ToString & ";" 'Col2
            line=line & sh.Cells(val+a, 3).ToString & ";" 'Col3
            line=line & sh.Cells(val+a, 4).ToString & ";" 'Col4
            line=line & sh.Cells(val+a, 5).ToString & ";" 'Col5
            conv_file.writeline(line)
        Next
    Next
    
    
    conv_file.close()

    sh = Nothing
    wb.Close(False)
    wb = Nothing
    xl.Quit()
    xl = Nothing

    Dts.TaskResult = Dts.Results.Success
End Sub

End Class

  1. Execute the package to let SSIS create a CSV file.
  2. Add a data flow to the package to upload the CSV generated into the database.

Also, you can have a look at these articles:

0
votes

I would use Power Query for this. It's built into Excel, also in Power BI and Power Automate, and has more flexible data transformation than SSIS data flows. You can code in "M" (the Power Query language), but I very rarely do - the UI does almost anything you could want.

For example, once you come up with a calculated column to create "column6", it has a "Fill Down" function that can push each group's value down onto the detailed rows.

This would be my approach for calculating "column6":

  1. Data ribbon / From Table/range. Specify sheet and range
  2. In the Power Query Editor, select [Column1], then Add Column / Extract / First Characters / 1
  3. Select [First Characters], then Transform / Data Type / Whole Number
  4. Select [First Characters], then Replace Values / Replace Errors / null
  5. Add Column / Conditional Column, specify New column name = "Column6", If Column1" equals "column1" then "null" Else If "Column1" equals "null" then (Select a column) [Column1] Else "null"
  6. Select [Column6], then Transform / Fill / Down

Once you have a working query, you can copy the PQ code into the PQ Source for SSIS:

https://docs.microsoft.com/en-us/sql/integration-services/data-flow/power-query-source