4
votes

I am reading a 17-column CSV file into a database. Once in a while the file has a row with fewer than 17 columns. I am trying to ignore such rows, but even when all columns are set to be ignored, the row is not skipped and the package fails.

How to ignore those rows?

3
What error causes the package to fail? Is it failing while reading the file or writing to the database? – Mark Wojciechowicz
It's failing to read the file. Thanks. – arcee123
What's the error? – Mark Wojciechowicz
It cannot find the delimiter on "phone type", which indicates a record that did not have all of the requisite information. The column "Phone Type" is column 15 of 17 in the list of fields, and in several records it's not there. A file has anywhere from 400,000 to several million records. – arcee123
Gotcha, I would use a StreamReader with a Try..Catch block to throw out the bad rows. This also avoids loading the entire file into memory: msdn.microsoft.com/en-us/library/… – Mark Wojciechowicz

3 Answers

4
votes

Solution Overview

You can do this by adding a Flat File Connection Manager with only one column of data type DT_WSTR and a length of 4000 (assuming its name is Column0), so the whole row is read as one big column.

  • In the Data Flow Task, add a Script Component after the Flat File Source
  • Mark Column0 as an Input Column and add 17 Output columns
  • In the Input0_ProcessInputRow method, split Column0 by the delimiter, then check whether the resulting array has exactly 17 elements: if so, assign the values to the output columns; else ignore the row.

Detailed Solution

  1. Add a Flat File Connection Manager and select the text file
  2. Go to the Advanced tab and delete all columns except one
  3. Change the data type of the remaining column to DT_WSTR with a length of 4000


  4. Add a Data Flow Task
  5. Inside the Data Flow Task, add a Flat File Source, a Script Component, and an OLE DB Destination


  6. In the Script Component, select Column0 as an Input Column


  7. Add 17 Output columns (the desired output columns)
  8. Change the output buffer's SynchronousInputID property to None


  9. Set the script language to Visual Basic


  10. In the script editor, write the following script:

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    
        If Not Row.Column0_IsNull AndAlso
                Not String.IsNullOrEmpty(Row.Column0.Trim) Then
    
    
            ' Split the whole row on the file's delimiter (";" here - adjust to match your file)
            Dim strColumns As String() = Row.Column0.Split(CChar(";"))
    
            ' Ignore rows that do not have exactly 17 columns
            If strColumns.Length <> 17 Then Exit Sub
    
    
            Output0Buffer.AddRow()
            Output0Buffer.Column = strColumns(0)
            Output0Buffer.Column1 = strColumns(1)
            Output0Buffer.Column2 = strColumns(2)
            Output0Buffer.Column3 = strColumns(3)
            Output0Buffer.Column4 = strColumns(4)
            Output0Buffer.Column5 = strColumns(5)
            Output0Buffer.Column6 = strColumns(6)
            Output0Buffer.Column7 = strColumns(7)
            Output0Buffer.Column8 = strColumns(8)
            Output0Buffer.Column9 = strColumns(9)
            Output0Buffer.Column10 = strColumns(10)
            Output0Buffer.Column11 = strColumns(11)
            Output0Buffer.Column12 = strColumns(12)
            Output0Buffer.Column13 = strColumns(13)
            Output0Buffer.Column14 = strColumns(14)
            Output0Buffer.Column15 = strColumns(15)
            Output0Buffer.Column16 = strColumns(16)
    
        End If
    
    End Sub
    
  11. Map the output columns to the destination columns

2
votes

C# solution for loading a CSV and skipping rows that don't have 17 columns:

Use a Script Component: on the input/output screen, add all of your outputs with their data types.

string fName = @"C:\test.csv"; // Full file path: it should be referenced via a variable

string[] lines = System.IO.File.ReadAllLines(fName);

//add a counter
int ctr = 1;

foreach(string line in lines)
{
    string[] cols = line.Split(',');

    if(ctr != 1) //Assumes a header row; remove this check if the 1st row has data
    {
        if(cols.Length == 17)
        {
            //Write out to the output
            Output0Buffer.AddRow();
            Output0Buffer.Col1 = cols[0]; //Cast each value to the output column's data type
            Output0Buffer.Col2 = int.Parse(cols[1]); // example cast to int
            Output0Buffer.Col3 = DateTime.Parse(cols[2]); // example cast to DateTime
            ... //rest of Columns
        }
        //optional else to handle skipped lines
        //else
        //    write out the line somewhere
    }
    ctr++; //increment counter
}
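
If you want to keep the skipped rows rather than discard them silently, the optional `else` branch above can write them to a reject file. Here is a minimal sketch of that idea; the reject-file path is hypothetical, and it uses a StreamReader instead of `ReadAllLines` so the whole file is not loaded into memory (as suggested in the comment thread). `Output0Buffer` and its column names are the SSIS Script Component's generated members, so adjust them to match your outputs.

    string fName = @"C:\test.csv";           // source file: in practice, from an SSIS variable
    string badFile = @"C:\test_skipped.csv"; // hypothetical reject file for bad rows

    using (var reader = new System.IO.StreamReader(fName))
    using (var rejects = new System.IO.StreamWriter(badFile))
    {
        string line;
        bool isHeader = true;
        while ((line = reader.ReadLine()) != null)
        {
            if (isHeader) { isHeader = false; continue; } // skip the header row

            string[] cols = line.Split(',');
            if (cols.Length == 17)
            {
                Output0Buffer.AddRow();
                Output0Buffer.Col1 = cols[0];
                // ... map the remaining 16 columns as in the code above
            }
            else
            {
                rejects.WriteLine(line); // keep bad rows for later inspection
            }
        }
    }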
1
vote

This is for @SidC's comment on my other answer.

This lets you work with multiple files:

        //set up variables
        string line;

        string[] files = System.IO.Directory.GetFiles(@"c:/path", "filenames*.txt");
        foreach(string file in files)
        {
            using(var str = new System.IO.StreamReader(file))
            {
                while((line = str.ReadLine()) != null)
                {
                    // Work with the line here, similar to the other answer
                }
            }
        }