1
votes

I am using SQL Server Data Tools 2013 to create an SSIS package. This package has an Execute SQL Task with a Full Resultset option to push the query results into an SSIS Variable, of type Object.

I'm using the following in a script task to take a recordset stored in an object variable and write it to a CSV:

    Public Sub Main()

    Dim fileName As String = Dts.Variables("vFileName").Value.ToString
    Dim destinationPath As String = Dts.Variables("vDestinationPath").Value.ToString
    Dim destinationPathAndFileName As String = destinationPath + fileName
    Dim fileContents As String = ""

    Dim oleDB As OleDbDataAdapter = New OleDbDataAdapter()
    Dim table As DataTable = New DataTable()
    Dim rs As System.Object = Dts.Variables("vResultSet").Value

    ' Populate DataTable with vResultSet data
    oleDB.Fill(table, rs)

    ' Loop through columns and concatenate with commas
    For Each col As DataColumn In table.Columns
        fileContents &= col.ColumnName & ","
    Next

    ' Remove final comma from columns string and append line break
    fileContents = fileContents.Substring(0, fileContents.Length - 1)
    fileContents &= Environment.NewLine

    ' Loop through rows and concatenate with commas
    Dim i As Integer
    For Each row As DataRow In table.Rows
        For i = 1 To table.Columns.Count
            fileContents &= row(i - 1).ToString() & ","
        Next

        ' Remove final comma from row string and append line break
        fileContents = fileContents.Substring(0, fileContents.Length - 1)
        fileContents &= Environment.NewLine

    Next

    ' Write all text to destination file. If file exists, this step will overwrite it.
    System.IO.File.WriteAllText(destinationPathAndFileName, fileContents)

    Dts.TaskResult = ScriptResults.Success
End Sub

This works, but it's veeeery slow, like 25+ minutes to write a single 14k-row dataset to CSV. I can't use a data flow because this process exists in a loop, and the metadata for each table to be exported is different. I'm pretty sure a script task is the only option, but is there a faster way than looping through each row of the dataset? Please let me know if I can provide more info.

2
Does this have to be vb? Asked another way, are you on 2005 or 2008+?billinkc
It doesn't have to be VB; I'm just more familiar with VB than C# so that's what I default to. I'm using SQL Server Data Tools for Visual Studio 2013.Kel
Final question, the recordset you are passing in, that's just getting populated from an Execute SQL Task that passes full resultset to an SSIS variable of type Object, yeah?billinkc
Yep, that's right! I was under the mistaken impression that once I got the data into the full resultset, it would be simple to just dump it to a file. But the only way I've found to do so is this loop process, which is too slow for the volume of data this process is intended to handle.Kel

2 Answers

4
votes

Feel free to translate to VB.NET as you see fit. Seeing as how I already have this code ~ written for a different project, I mashed your request in with how mine works

Passing in 3 SSIS variables: vFileName, vDestinationPath and vResultSet, the code in Main will convert the ado recordset into a DataTable which is then added to a DataSet and passed to the Persist method. Persist has a default parameter for delimiter of |.

This implementation does not attempt to deal with any of the corner cases, at all. It does not escape text columns with a qualifier, doesn't escape embedded qualifiers, do anything with newlines in the feeds and something in the OleDbDataAdapter's fill method fails with binary data, etc

    public void Main()
    {
        string fileName = Dts.Variables["User::vFileName"].Value.ToString();
        DataSet ds = null;
        DataTable dt = null;
        string outputFolder = Dts.Variables["User::vDestinationPath"].Value.ToString();
        string fileMask = string.Empty;
        string sheetName = string.Empty;
        string outSubFolder = string.Empty;
        string message = string.Empty;
        bool fireAgain = true;
        try
        {

            ds = new DataSet();
            dt = new DataTable();

            System.Data.OleDb.OleDbDataAdapter adapter = new System.Data.OleDb.OleDbDataAdapter();
            adapter.Fill(dt, Dts.Variables["User::vResultSet"].Value);

            string baseFileName = System.IO.Path.GetFileNameWithoutExtension(fileName);
            baseFileName = System.IO.Path.GetFileName(fileName);

            ds.Tables.Add(dt);
            //foreach (DataTable dt in ds.Tables)
            {
                Persist(ds, fileName, outputFolder);
            }
        }
        catch (Exception ex)
        {
            Dts.Events.FireInformation(0, "Data Dumper", string.Format("{0}|{1}", "fileName", fileName), string.Empty, 0, ref fireAgain);
            Dts.Events.FireInformation(0, "Data Dumper", string.Format("{0}|{1}", "outputFolder", outputFolder), string.Empty, 0, ref fireAgain);
            Dts.Events.FireInformation(0, "Data Dumper", string.Format("{0}|{1}", "ExceptionDetails", ex.ToString()), string.Empty, 0, ref fireAgain);
            Dts.Events.FireInformation(0, "Data Dumper", string.Format("{0}|{1}", "InnerExceptionDetails", ex.InnerException), string.Empty, 0, ref fireAgain);
        }

        Dts.TaskResult = (int)ScriptResults.Success;
    }

    public static void Persist(System.Data.DataSet ds, string originalFileName, string outputFolder, string delimiter = "|")
    {
        // Enumerate through all the tables in the dataset
        // Save it out as sub versions of the 
        if (ds == null)
        {
            return;
        }

        string baseFileName = System.IO.Path.GetFileNameWithoutExtension(originalFileName);
        string baseFolder = System.IO.Path.GetDirectoryName(originalFileName);
        System.Collections.Generic.List<string> header = null;            

        foreach (System.Data.DataTable table in ds.Tables)
        {
            string outFilePath = System.IO.Path.Combine(outputFolder, string.Format("{0}.{1}.csv", baseFileName, table.TableName));
            System.Text.Encoding e = System.Text.Encoding.Default;

            if (table.ExtendedProperties.ContainsKey("Unicode") && (bool)table.ExtendedProperties["Unicode"])
            {
                e = System.Text.Encoding.Unicode;
            }

            using (System.IO.StreamWriter file = new System.IO.StreamWriter(System.IO.File.Open(outFilePath, System.IO.FileMode.Create), e))
            {
                table.ExtendedProperties.Add("Path", outFilePath);

                // add header row
                header = new System.Collections.Generic.List<string>(table.Columns.Count);
                foreach (System.Data.DataColumn item in table.Columns)
                {
                    header.Add(item.ColumnName);
                }

                file.WriteLine(string.Join(delimiter, header));

                foreach (System.Data.DataRow row in table.Rows)
                {
                    // TODO: For string based fields, capture the max length
                    IEnumerable<string> fields = (row.ItemArray).Select(field => field.ToString());

                    file.WriteLine(string.Join(delimiter, fields));
                }
            }
        }
    }

Need to run but a Biml implementation looks like

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="tempdb" ConnectionString="Data Source=localhost\dev2014;Initial Catalog=AdventureWorksDW2014;Provider=SQLNCLI11.0;Integrated Security=SSPI;"/>
    </Connections>
    <Packages>
        <Package Name="so_37059747" ConstraintMode="Linear">
            <Variables>
                <Variable DataType="String" Name="QuerySource"><![CDATA[SELECT
    S.name
,   T.name
FROM
    sys.schemas AS S
    INNER JOIN
        sys.tables AS T 
        ON T.schema_id = S.schema_id;]]></Variable>
                <Variable DataType="String" Name="SchemaName">dbo</Variable>
                <Variable DataType="String" Name="TableName">foo</Variable>
                <Variable DataType="String" Name="QueryTableDump" EvaluateAsExpression="true">"SELECT X.* FROM [" + @[User::SchemaName] + "].[" + @[User::TableName] + "] AS X;"</Variable>
                <Variable DataType="Object" Name="rsTables"></Variable>
                <Variable DataType="Object" Name="vResultSet"></Variable>
                <Variable DataType="String" Name="vFileName" EvaluateAsExpression="true">@[User::SchemaName] + "_" + @[User::TableName] + ".txt"</Variable>
                <Variable DataType="String" Name="vDestinationPath">c:\ssisdata\so\Output</Variable>
            </Variables>
            <Tasks>
                <ExecuteSQL 
                    ConnectionName="tempdb" 
                    Name="SQL Generate Loop data"
                    ResultSet="Full">
                    <VariableInput VariableName="User.QuerySource" />
                    <Results>
                        <Result VariableName="User.rsTables" Name="0" />
                    </Results>
                </ExecuteSQL>
                <ForEachAdoLoop SourceVariableName="User.rsTables" Name="FELC Shred rs" ConstraintMode="Linear">
                    <VariableMappings>
                        <VariableMapping VariableName="User.SchemaName" Name="0" />
                        <VariableMapping VariableName="User.TableName" Name="1" />
                    </VariableMappings>
                    <Tasks>
                        <ExecuteSQL 
                            ConnectionName="tempdb" 
                            Name="SQL Generate Export data"
                            ResultSet="Full">
                            <VariableInput VariableName="User.QueryTableDump" />
                            <Results>
                                <Result VariableName="User.vResultSet" Name="0" />
                            </Results>
                        </ExecuteSQL>
                        <Script ProjectCoreName="ST_RS2CSV" Name="SCR Convert to text">
                            <ScriptTaskProjectReference ScriptTaskProjectName="ST_RS2CSV" />
                        </Script>
                    </Tasks>
                </ForEachAdoLoop>
            </Tasks>
        </Package>
    </Packages>
    <ScriptProjects>
        <ScriptTaskProject ProjectCoreName="ST_RS2CSV" Name="ST_RS2CSV" VstaMajorVersion="0">
            <ReadOnlyVariables>
                <Variable Namespace="User" VariableName="vFileName" DataType="String" />
                <Variable Namespace="User" VariableName="vDestinationPath" DataType="String" />
                <Variable Namespace="User" VariableName="vResultSet" DataType="Object" />
            </ReadOnlyVariables>
            <Files>
                <File Path="ScriptMain.cs" BuildAction="Compile">
                    <![CDATA[namespace DataDumper
{
    using System;
    using System.Collections.Generic;
    using System.Data;
    using System.Linq;
    using System.Text;
    using System.Windows.Forms;
    using System.Xml.Linq;
    using Microsoft.SqlServer.Dts.Runtime;

    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        public void Main()
        {
            string fileName = Dts.Variables["User::vFileName"].Value.ToString();
            DataSet ds = null;
            DataTable dt = null;
            string outputFolder = Dts.Variables["User::vDestinationPath"].Value.ToString();
            string fileMask = string.Empty;
            string sheetName = string.Empty;
            string outSubFolder = string.Empty;
            string message = string.Empty;
            bool fireAgain = true;
            try
            {

                ds = new DataSet();
                dt = new DataTable();

                System.Data.OleDb.OleDbDataAdapter adapter = new System.Data.OleDb.OleDbDataAdapter();
                adapter.Fill(dt, Dts.Variables["User::vResultSet"].Value);

                string baseFileName = System.IO.Path.GetFileNameWithoutExtension(fileName);
                baseFileName = System.IO.Path.GetFileName(fileName);

                ds.Tables.Add(dt);
                //foreach (DataTable dt in ds.Tables)
                {
                    Persist(ds, fileName, outputFolder);
                }
            }
            catch (Exception ex)
            {
                Dts.Events.FireInformation(0, "Data Dumper", string.Format("{0}|{1}", "fileName", fileName), string.Empty, 0, ref fireAgain);
                Dts.Events.FireInformation(0, "Data Dumper", string.Format("{0}|{1}", "outputFolder", outputFolder), string.Empty, 0, ref fireAgain);
                Dts.Events.FireInformation(0, "Data Dumper", string.Format("{0}|{1}", "ExceptionDetails", ex.ToString()), string.Empty, 0, ref fireAgain);
                Dts.Events.FireInformation(0, "Data Dumper", string.Format("{0}|{1}", "InnerExceptionDetails", ex.InnerException), string.Empty, 0, ref fireAgain);
            }

            Dts.TaskResult = (int)ScriptResults.Success;
        }

        public static void Persist(System.Data.DataSet ds, string originalFileName, string outputFolder, string delimiter = "|")
        {
            // Enumerate through all the tables in the dataset
            // Save it out as sub versions of the 
            if (ds == null)
            {
                return;
            }

            string baseFileName = System.IO.Path.GetFileNameWithoutExtension(originalFileName);
            string baseFolder = System.IO.Path.GetDirectoryName(originalFileName);
            System.Collections.Generic.List<string> header = null;            

            foreach (System.Data.DataTable table in ds.Tables)
            {
                string outFilePath = System.IO.Path.Combine(outputFolder, string.Format("{0}.{1}.csv", baseFileName, table.TableName));
                System.Text.Encoding e = System.Text.Encoding.Default;

                if (table.ExtendedProperties.ContainsKey("Unicode") && (bool)table.ExtendedProperties["Unicode"])
                {
                    e = System.Text.Encoding.Unicode;
                }

                using (System.IO.StreamWriter file = new System.IO.StreamWriter(System.IO.File.Open(outFilePath, System.IO.FileMode.Create), e))
                {
                    table.ExtendedProperties.Add("Path", outFilePath);

                    // add header row
                    header = new System.Collections.Generic.List<string>(table.Columns.Count);
                    foreach (System.Data.DataColumn item in table.Columns)
                    {
                        header.Add(item.ColumnName);
                    }

                    file.WriteLine(string.Join(delimiter, header));

                    foreach (System.Data.DataRow row in table.Rows)
                    {
                        // TODO: For string based fields, capture the max length
                        IEnumerable<string> fields = (row.ItemArray).Select(field => field.ToString());

                        file.WriteLine(string.Join(delimiter, fields));
                    }
                }
            }
        }
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
    }
}                
]]>
                </File>
                <File Path="Properties\AssemblyInfo.cs" BuildAction="Compile">
                    using System.Reflection;
                    using System.Runtime.CompilerServices;

                    [assembly: AssemblyTitle("AssemblyTitle")]
                    [assembly: AssemblyDescription("")]
                    [assembly: AssemblyConfiguration("")]
                    [assembly: AssemblyCompany("Bill Fellows")]
                    [assembly: AssemblyProduct("ProductName")]
                    [assembly: AssemblyCopyright("Copyright @  2016")]
                    [assembly: AssemblyTrademark("")]
                    [assembly: AssemblyCulture("")]
                    [assembly: AssemblyVersion("1.0.*")]
                </File>
            </Files>
            <AssemblyReferences>
                <AssemblyReference AssemblyPath="System" />
                <AssemblyReference AssemblyPath="System.Core" />
                <AssemblyReference AssemblyPath="System.Data" />
                <AssemblyReference AssemblyPath="System.Data.DataSetExtensions" />
                <AssemblyReference AssemblyPath="System.Windows.Forms" />
                <AssemblyReference AssemblyPath="System.Xml" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.ManagedDTS.dll" />
                <AssemblyReference AssemblyPath="Microsoft.SqlServer.ScriptTask.dll" />
                <AssemblyReference AssemblyPath="System.Linq" />
                <AssemblyReference AssemblyPath="System.Xml.Linq" />
                <AssemblyReference AssemblyPath="Microsoft.VisualBasic" />
            </AssemblyReferences>
        </ScriptTaskProject>
    </ScriptProjects>

</Biml>

That dumped all of AdventureworksDW2014 in 15 seconds

Based on the comment that this line is failing IEnumerable<string> fields = (row.ItemArray).Select(field => field.ToString());

Ensure that you have the following using statements in your project. I think those extensions are in the Linq namespaces but it could have been the Collections

    using System;
    using System.Collections.Generic;
    using System.Data;
    using System.Linq;
    using System.Text;
    using System.Windows.Forms;
    using System.Xml.Linq;
    using Microsoft.SqlServer.Dts.Runtime;

Why was the original slow?

My assumption is the slowness boils down to all that concatenation. Strings are immutable in .Net and you are creating a new version of that string each time you add a column to it. When I build my line, I'm using the String.Join method to zip up each element an array into a single string. This also simplifies the logic required to append the field delimiters.

I also immediately write the current line to a file instead of bloating my memory just to dump it all with a call to WriteAllText

2
votes

This is the VB.NET version of @billinkc's excellent answer in case it's useful to anyone:

Imports System

Imports System.Data 
Imports System.Math 
Imports System.Collections 
Imports System.Collections.Generic 
Imports Microsoft.SqlServer.Dts.Runtime 
Imports System.Linq 
Imports System.Text 
Imports System.Windows.Forms

Public Sub Main()
    Dim fileName As String = Dts.Variables("User::vFileName").Value.ToString()
    Dim ds As DataSet = Nothing
    Dim dt As DataTable = Nothing
    Dim outputFolder As String = Dts.Variables("User::vDestinationPath").Value.ToString()
    Dim fileMask As String = String.Empty
    Dim sheetName As String = String.Empty
    Dim outSubFolder As String = String.Empty
    Dim message As String = String.Empty
    Dim fireAgain As Boolean = True
    Try

        ds = New DataSet()
        dt = New DataTable()

        Dim adapter As New System.Data.OleDb.OleDbDataAdapter()
        adapter.Fill(dt, Dts.Variables("User::vResultSet").Value)

        Dim baseFileName As String = System.IO.Path.GetFileNameWithoutExtension(fileName)
        baseFileName = System.IO.Path.GetFileName(fileName)

        ds.Tables.Add(dt)
        'foreach (DataTable dt in ds.Tables)
        If True Then
            Persist(ds, fileName, outputFolder)
        End If
    Catch ex As Exception
        Dts.Events.FireInformation(0, "Data Dumper", String.Format("{0}|{1}", "fileName", fileName), String.Empty, 0, fireAgain)
        Dts.Events.FireInformation(0, "Data Dumper", String.Format("{0}|{1}", "outputFolder", outputFolder), String.Empty, 0, fireAgain)
        Dts.Events.FireInformation(0, "Data Dumper", String.Format("{0}|{1}", "ExceptionDetails", ex.ToString()), String.Empty, 0, fireAgain)
        Dts.Events.FireInformation(0, "Data Dumper", String.Format("{0}|{1}", "InnerExceptionDetails", ex.InnerException), String.Empty, 0, fireAgain)
    End Try

    Dts.TaskResult = CInt(ScriptResults.Success)
End Sub

Public Shared Sub Persist(ds As System.Data.DataSet, originalFileName As String, outputFolder As String, Optional delimiter As String = ",")

    ' Enumerate through all the tables in the dataset
    ' Save it out as sub versions of the 
    If ds Is Nothing Then
        Return
    End If

    Dim baseFileName As String = System.IO.Path.GetFileNameWithoutExtension(originalFileName)
    Dim baseFolder As String = System.IO.Path.GetDirectoryName(originalFileName)
    Dim header As System.Collections.Generic.List(Of String) = Nothing

    For Each table As System.Data.DataTable In ds.Tables
        Dim outFilePath As String = System.IO.Path.Combine(outputFolder, String.Format("{0}.csv", baseFileName, table.TableName))
        Dim e As System.Text.Encoding = System.Text.Encoding.[Default]

        If table.ExtendedProperties.ContainsKey("Unicode") AndAlso CBool(table.ExtendedProperties("Unicode")) Then
            e = System.Text.Encoding.Unicode
        End If

        Using file As New System.IO.StreamWriter(System.IO.File.Open(outFilePath, System.IO.FileMode.Create), e)
            table.ExtendedProperties.Add("Path", outFilePath)

            ' add header row
            header = New System.Collections.Generic.List(Of String)(table.Columns.Count)
            For Each item As System.Data.DataColumn In table.Columns
                header.Add(item.ColumnName)
            Next

            file.WriteLine(String.Join(delimiter, header))

            For Each row As System.Data.DataRow In table.Rows
                ' TODO: For string based fields, capture the max length
                Dim fields As IEnumerable(Of String) = (row.ItemArray).[Select](Function(field) field.ToString())

                file.WriteLine(String.Join(delimiter, fields))
            Next
        End Using
    Next
End Sub