30
votes

I have a Data Flow Task that is hanging on excecution.
The flow is simple, makes two queries to different tables (Both with a couple of joins), then sorts and merges the otuputs through a common id, adds a static column to all the records, saves the row count in a user variable for later use and finally inserts into a table on another DB. We are using OLE DB Sources and Destination. Source is MSSQL 2000 and Destination is MSSQL 2012

Symptoms:

  • When excecuting, the Data Flow gets the usual yellow "running" icon. However when you double click to see the Data Flow, non of the elements have any yellow, red or green mark.
  • This goes on for long periods of time, at first it lasted around 20 minutes, after that it started getting longer or simply not returning at all.
  • Output shows:
    Information: 0x40043006 at Load Sandbox Table, SSIS.Pipeline: Prepare for Execute phase is beginning.
    Information: 0x40043007 at Load Sandbox Table, SSIS.Pipeline: Pre-Execute phase is beginning.

    And nothing more until the excecution is stopped.
  • Yes, this has worked before. And yes, we have used a single query (in a stored procedure) to do this ETL but we wanted to migrate all the steps to SSIS.
  • Failed solutions:

  • There are no lookups.
  • Default buffer size for the task flow was increased to 40485760 then to 80971520.
  • Default buffer max rows for task was set to 1000000.
  • Delay Validation was set to True for the task.
  • All elements inside the task were set Validate External Data to False.
  • Both queries had:
    SET FMTONLY OFF;
    SET NOCOUNT ON;

    added at the beggining.
  • Both queries had MAXDOP set to 1.
  • Setting project's Run 64 bit Runtime to False.
  • Changed destination load from Table or View to Table or View - Fast load with no locks or constraints.
  • Set rows per batch to 1000 for fast load.
  • Some work arounds propose to separate the task flow into two or more task flows. But this is not possible since what we need to do is a merge of the information found on both source queries.
  • Extra bits: I really hope someone can help me. I am fairly new to SSIS, this is the first time I use it. I usually work with Pentaho for my ETL but the client needs the solution to be implemented on SSIS. I've been battling with this issue for a couple of days now and I'm starting to run out of ideas to solve it.


    When ran through the command line it gets stuck too and I get the following output:

    Progress: 2013-03-19 14:36:26.21
       Source: Load Sandbox Table
       Validating: 0% complete
    End Progress
    Progress: 2013-03-19 14:36:26.21
       Source: Load Sandbox Table
       Validating: 12% complete
    End Progress
    Progress: 2013-03-19 14:36:26.22
       Source: Load Sandbox Table
       Validating: 25% complete
    End Progress
    Progress: 2013-03-19 14:36:26.22
       Source: Load Sandbox Table
       Validating: 37% complete
    End Progress
    Progress: 2013-03-19 14:36:26.23
       Source: Load Sandbox Table
       Validating: 50% complete
    End Progress
    Progress: 2013-03-19 14:36:26.25
       Source: Load Sandbox Table
       Validating: 62% complete
    End Progress
    Progress: 2013-03-19 14:36:26.25
       Source: Load Sandbox Table
       Validating: 75% complete
    End Progress
    Progress: 2013-03-19 14:36:26.25
       Source: Load Sandbox Table
       Validating: 87% complete
    End Progress
    Progress: 2013-03-19 14:36:26.25
       Source: Load Sandbox Table
       Validating: 100% complete
    End Progress
    Warning: 2013-03-19 14:36:26.26
       Code: 0x80047076
       Source: Load Sandbox Table SSIS.Pipeline
       Description: The output column "ITEM_OID (1)" (47) on output "Merge Join Outp
    ut" (28) and component "Merge Join" (11) is not subsequently used in the Data Fl
    ow task. Removing this unused output column can increase Data Flow task performa
    nce.
    End Warning
    Progress: 2013-03-19 14:36:26.27
       Source: Load Sandbox Table
       Prepare for Execute: 0% complete
    End Progress
    Progress: 2013-03-19 14:36:26.27
       Source: Load Sandbox Table
       Prepare for Execute: 12% complete
    End Progress
    Progress: 2013-03-19 14:36:26.27
       Source: Load Sandbox Table
       Prepare for Execute: 25% complete
    End Progress
    Progress: 2013-03-19 14:36:26.27
       Source: Load Sandbox Table
       Prepare for Execute: 37% complete
    End Progress
    Progress: 2013-03-19 14:36:26.27
       Source: Load Sandbox Table
       Prepare for Execute: 50% complete
    End Progress
    Progress: 2013-03-19 14:36:26.27
       Source: Load Sandbox Table
       Prepare for Execute: 62% complete
    End Progress
    Progress: 2013-03-19 14:36:26.27
       Source: Load Sandbox Table
       Prepare for Execute: 75% complete
    End Progress
    Progress: 2013-03-19 14:36:26.27
       Source: Load Sandbox Table
       Prepare for Execute: 87% complete
    End Progress
    Progress: 2013-03-19 14:36:26.27
       Source: Load Sandbox Table
       Prepare for Execute: 100% complete
    End Progress
    Progress: 2013-03-19 14:36:26.31
       Source: Load Sandbox Table
       Pre-Execute: 0% complete
    End Progress
    Progress: 2013-03-19 14:36:26.31
       Source: Load Sandbox Table
       Pre-Execute: 12% complete
    End Progress
    Progress: 2013-03-19 14:36:26.31
       Source: Load Sandbox Table
       Pre-Execute: 25% complete
    End Progress
    Progress: 2013-03-19 14:36:26.34
       Source: Load Sandbox Table
       Pre-Execute: 37% complete
    End Progress
    Progress: 2013-03-19 14:36:45.69
       Source: Load Sandbox Table
       Pre-Execute: 50% complete
    End Progress
    

    After that it freezes again.

    SOLUTION (Posting this here because I can't answer my own question for another 5 hours, I'll do it when I'm allowed to.)
    I finally got it.
    It turns out there is an issue with the validation, but not only SSIS elements go through that validation, as stated in the fourth failed solution of the question.
    The CONNECTIONS also get validated and have their own Delay Validation property, which needs to be set to true.
    After that the excecution time went from 40+ mins or no running to less than a minute for the full process (This is just one step of a much bigger process)
    I hope people with this same problem can find this solution easily because there's a lot of people running into this problem and nearly no solutions posted online.

    In a nutshell: Check that all your elements involved in the task, including the DB connections have Delay Validation Property set to True.

    9
    If you don't run it from within the context of Visual Studio, what happens? From a command line, dtexec.exe /file C:\somepath\Package.dtsxbillinkc
    Thanks, I hadn't thought about that. It got stuck again, though the output seems pretty weird. The output is too long for the comments, I'll edit the question and add it there.Ryoku
    Can you post the output in all of its texty glory?billinkc
    Of course! I was editing the postRyoku
    Regarding your solution, I have never run into a situation where I found the need to set delay validation to true for database connections. Glad your situation has been cleared up though.billinkc

    9 Answers

    14
    votes

    I finally got it. It turns out there is an issue with the validation, but not only SSIS elements go through that validation, as stated in the fourth failed solution of the question. The CONNECTIONS also get validated and have their own Delay Validation property, which needs to be set to true. After that the excecution time went from 40+ mins or no running to less than a minute for the full process (This is just one step of a much bigger process) I hope people with this same problem can find this solution easily because there's a lot of people running into this problem and nearly no solutions posted online.

    In a nutshell: Check that all your elements involved in the task, including the DB connections have Delay Verification Property set to True.

    4
    votes

    I got the same symptoms but set delay validation to True on each components did not solved my problem.

    I solved it by changing the OLE DB method from table or view, to sql command.

    regards.

    4
    votes

    I know this is old, but i just found a link about this that may help. I personally am using a view to just export data to an external data base, and the data validation is taking an excessive amount of time validating the view.

    https://connect.microsoft.com/SQLServer/feedback/details/258901/ssis-views-as-data-source-very-poor-performance-or-ssis-hangs

    the important part of this is Microsoft's answer

    Posted by Microsoft on 4/28/2008 at 2:45 PM

    This is a know issue and the result of the current design.

    There are 2 ways to pull data from a view in OLE DB source:

    1. Use "Table or view" access method

    2. Use "SQL command" access method, and enter a query "select * from ***"

    A different execution plan is generated in the two approaches.

    The one used in the former is not as efficient as the latter.

    If you hit the performance issue when going with the first approach, you can switch to the second approach as a work around.

    We have also blogged this issue - >http://blogs.msdn.com/sqlperf/archive/2007/04/29/set-up-ole-db-source-to-read-from-view-efficiently.aspx.

    Since this is a 'By Design' item and we believe that there is a work around we will not be providing any change at this time. As a result we are closing the case associated with your submission. If you disagree, please feel free to re-submit.

    We appreciate your time, effort, and support of SSIS.

    3
    votes

    Fixed my problem by changing the Data access mode to SQL Command and pasting my view into the SQL command text in the OLE DB Source.

    2
    votes

    We already had our Delayed Validations set to True and couldnt/didnt want to change it to a SQL statement.
    I came across ValidateExternalMetadata in the dataflows which I changed to False and that seem to work like a champ.

    I checked OP's steps and he mentions they did that in Step 5

    1
    votes

    Another thing to try, apparently, is to check the "Use 32 bit runtime" checkbox--this is if you see the problem when running the package as a job in your DB server (which is 64-bit, and in my case at least, SQL Server 2008R2). Go to the job, right-click > Properties… > Steps > right-click your SSIS package step > Properties… > General > Execution options (tab) > Use 32 bit runtime.

    I was seeing this problem, but only once I deployed the package to the server (I had a logging provider enabled so I could see it get stuck after the "Pre-Execute" phase). It always ran fine in BIDS (and fine on another server, oddly…still not sure why that is).

    A thread here tipped me off to this solution that seems to work--though my problem shows up intermittently, so YMMV. There are other possible solutions in that thread as well.

    1
    votes

    Hope this helps someone. I was trying to use this OLE DB Source to execute a SP with a param. I didn't need it to return anything, so I left that part out. But it wouldn't let me, it yelled 'no column information was returned by the sql '. So configured a dummy sql statement in my SP, which I set to never true. But it never got that column as output and the job just hung on pre-execution phase. So I changed that test to always be true, it returned the column, and presto. I do nothing with the column, but I guess it's needed there.

    1
    votes

    This problem is still active with SQL Server 2012/2014.

    None of the solutions mentioned above helped. In fact, nothing changed delaying validation, changing the configuration of the OLD DB Destination or OLE DB Connection.

    Reading the thread from this link: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/35a484c7-4850-4f86-b14a-5dfb50491ab2/long-duration-preexecute-phase?forum=sqlintegrationservices

    it is suggested that the problem is with the execution plan.

    This was true for my case and adding a condition 1=1 to my OLE DB Source configuration forced SQL server to generate a new execution plan that fixed the issue for me.

    0
    votes

    I ran into the same issue a few minutes ago and the suggestions above didn't work for me (delay validation = true seems to be the go to answer). We've recently discovered some issues with parameter sniffing and once I remedied that in my stored procedures, my package ran in < 1 min. Consider checking your stored procedures to see if this might be the cause.