1
votes

In my SSIS Data Flow I create a derived column (DT_WSTR) based on a concatenation of two other columns. I want to save the max length in this column in a variable (with SQL it would be a MAX(LEN(COLUMN))). How is this done?

1
The max length of all the values in the 2 combined columns?Jacob H
Yes the max length of the derived column = the 2 combined columnsJohn
How do you plan on using this? That will affect the answer.KeithL

1 Answers

2
votes

Add another Derived Column after your Derived Column that calculates the length of the computed column. Let's call it ColumnLength

LEN(COLUMN)

Now add a Multicast transformation. One path from here will go on to the "rest" of your data flow. A new path will lead to an Aggregate transformation. There, specify you want the maximum.

Now - what do you want to do with that information?

  1. Write to a table -> OLE DB Destination
  2. Report to the log -> Script Task that fires and information event
  3. Use elsewhere in the package -> Recordset destination and then use a foreach loop to pop the one row, one value out it
  4. Something else?

Sample data flow assuming you chose option 3 - recordset destination

A data flow as described above

I need to create 2 variables in my SSIS package. objRecordset of type Object and MaxColumnLength of type Int32

When the data flow finishes, all my data would have arrived in my table (represented by the Script Component) and my aggregated maximum length will flow into the recordset destination which uses my variable objRecordset

To get the value out of the ado.net recordset and into our single variable, we need to "shred the recordset" Google that term, you'll find many, many examples.

My control flow will look something like this

A control flow as Data Flow -> Foreach Ado.NET enumerator -> placeholder sequence container

The ForEach (ado.net) Enumerator Loop Container is consumes every row in our dataset and we will specify that the our variable MaxColumnLength will be the 0th element of the table.

Finally, I put a sequence container in there so I can get a breakpoint to fire. We see the length of my max column variable to be 15 which matches my source query

SELECT 'a' As [COLUMN]
UNION ALL SELECT 'ZZZZZZZZZZZZZZZ'

I believe this addresses the problem you have asked.

As a data warehouse practitioner, I would encourage you to rethink your approach on the lookups. Yes, the 400 character column is going to wreak havoc on your memory so "math it out". Use the cryptological functions available to you and compute a fixed width, unique, key for that column and then you only work with that data.

SELECT 
     CONVERT(binary(20), HASHBYTES('SHA1', MyBusinessKeys)) AS BusHashKey
FROM
    dbo.MyDimension;

Now you have 20 bytes, always and SHA1 is unlikely to generate duplicate values.