0
votes

Using SSIS, I am importing a .txt file, which for the most part if straight forward.

The file being imported has a set amount of columns up to a point, but there is a free text/comments field, which can repeat to unknown length, similar to below.

   "000001","J Smith","Red","Free text here"
   "000002","A Ball","Blue","Free text here","but can","continue"
   "000003","W White","Green","Free text here","but can","continue","indefinitely"
   "000004","J Roley","Red","Free text here"

What I would ideally like to do (within SSIS) is to keep the first three columns as singular columns, but to merge any free-text ones into a single column. i.e. Merge/concatenate anything which appears after the 'colour' column.

So when I load this into an SSMS table, it appears like:

000001 | J Smith | Red   | Free text here                                     |
000002 | A Ball  | Blue  | Free text here but can continue                    |
000003 | W White | Green | Free text here but can continue indefinitely       |
000004 | J Roley | Red   | Free text here                                     |
2

2 Answers

0
votes

I do not see any easy solution. You can try something like below:

1. Load the complete raw data to a temp table (without any delimiter):

Steps:

  1. Create temp table in Execute SQL Task
  2. Create a data flow task, with flat file source (with Ragged Right format) and
  3. OLEDB destination (usint #temp table create in previous task)
  4. Set the delayValidation=True for connection manager and DFT
  5. Set retainSameConnection=True for connection manager

Refer this to create temp table and using it.

2. Create T-SQL to separate the 3 columns (something like below)

with col1 as (
  Select 
  [Val],
  substring([Val], 1 ,charindex(',', [Val]) - 1) col1,
  len(substring([Val], 1 ,charindex(',', [Val]))) + 1 col1Len
  from #temp
), col2 as (
  select 
  [Val],
  col1,
  substring([Val], col1Len, charindex(',', [Val], col1Len) - col1Len) as col2,
  charindex(',', [Val], col1Len) + 1 col2Len
   from col1
) select col1, col2, substring([Val], col2Len, 200) as col3
from col2

T-SQL Output:

col1    col2    col3
"000001"    "J Smith"   "Red","Free text here"
"000002"    "A Ball"    "Blue","Free text here","but can","continue"
"000003"    "W White"   "Green","Free text here","but can","continue","indefinitely"

3. Use the above query in OLEDB source in different data flow task

Replace double quotes (") as per your requirement.

0
votes

This was a fun exercise:

Add a data flow

Add a Script Component (select Source)

Add 4 columns to Outputs ID, Name Color , FreeText all type string

edit script:

Paste the following namespaces up top:

using System.Text.RegularExpressions;
using System.Linq;

Paste the following code into CreateNewOutputRows:

    string strPath = @"a:\test.txt";  \\put your file path in here
    var lines = System.IO.File.ReadAllLines(strPath);

    foreach (string line in lines)
    {
        //Code I stole to read CSV
        string delimeter = ",";

        Regex rgx = new Regex(String.Format("(\"[^\"]*\"|[^{0}])+", delimeter));
        var cols = rgx.Matches(line)
                      .Cast<Match>()
                      .Select(m => m.Value.Trim().Trim('"'))
                      .Where(v => !string.IsNullOrWhiteSpace(v));
        //create a column counter
        int ctr = 0;

        Output0Buffer.AddRow();

        //Preset FreeText to empty string
        string FreeTextBuilder = String.Empty;

        foreach( string col in cols)
        {

            switch (ctr)
            {
                case 0: 
                    Output0Buffer.ID = col; 
                    break;
                case 1:
                    Output0Buffer.Name = col;
                    break;
                case 2:
                    Output0Buffer.Color = col;
                    break;
                default:
                    FreeTextBuilder += col + " ";
                    break;
            }
            ctr++;
        }

        Output0Buffer.FreeText = FreeTextBuilder.Trim();

    }