15 votes

I'm currently writing an SSIS package that retrieves data from a stored procedure via an OLE DB Source. The stored procedure contains a rather nasty query that I've been able to improve with the use of temp tables. If I switch these temp tables to table variables, the logical reads jump from about 1.3 million to about 56 million. I'm uncomfortable enough with the 1.3 million, but there is no way that I can be satisfied with the 56 million logical reads. Thus, I can't really convert the temp tables to table variables.
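(For anyone wondering why the gap is that dramatic: temp tables get column statistics and can trigger recompiles, while table variables generally don't, so on 2008/2014 the optimizer typically assumes very low row counts for a table variable and picks bad join plans. A minimal illustration; the table and object names here are made up:)

    -- Temp table: gets column statistics, so the optimizer can estimate
    -- row counts realistically for joins against it.
    CREATE TABLE #Work (Id int PRIMARY KEY);
    INSERT INTO #Work (Id)
    SELECT n FROM dbo.Numbers;  -- dbo.Numbers is a placeholder

    -- Table variable: no statistics; the optimizer typically assumes ~1 row,
    -- which is where runaway logical reads tend to come from on big joins.
    DECLARE @Work TABLE (Id int PRIMARY KEY);
    INSERT INTO @Work (Id)
    SELECT n FROM dbo.Numbers;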

However, SSIS (or rather, SQL Server) can't resolve the metadata for this query, so the package won't run. I've found a few different solutions online, but none of them seem to work for both SQL Server 2008 and SQL Server 2014. We are currently in the process of upgrading all of our servers to 2014; this particular package runs against 2008 in DEV, 2014 in QA, and 2008 in production. By the fall, the PROD tier will be on 2014, and the DEV tier will be promoted sometime after that. Unfortunately, I can't wait for these upgrades before releasing this SSIS package - the data needs to start moving by next week. Thus, I need to figure out a way to get the metadata resolved in both environments. Here's what I've tried so far:

  1. Add a dummy SELECT that returns the proper metadata inside an IF 1=0 block (sketched after this list). This works in 2008, but not in 2014.

  2. Use SET FMTONLY OFF at the beginning of the stored procedure. This works in 2008, but not in 2014. Furthermore, it causes the stored procedure to run once for each column returned (over 30 in this case), which would be a deal-breaker even if it worked everywhere.

  3. Use EXEC ... WITH RESULT SETS (( ... )); (also sketched after this list). This works in 2014, but not in 2008.

  4. Deploy a stored procedure which returns the proper metadata, build and deploy the SSIS package, then modify the stored procedure back to the proper version. This hasn't worked in either environment, and it would complicate any other ETL applications developed within our ETL framework.
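For reference, here is roughly what options 1 and 3 look like. Everything below (dbo.MyProc, the column names) is a placeholder for the real procedure. The underlying issue is that SSIS against 2012+ resolves metadata with sp_describe_first_result_set instead of the old FMTONLY mechanism, which is why the 2008-era tricks stop working:

    -- Option 1: dummy SELECT in an IF 1 = 0 block. The 2008 FMTONLY pass
    -- walks both branches and picks the metadata up from the dummy SELECT;
    -- sp_describe_first_result_set on 2012+ ignores it and still fails on
    -- the #temp table.
    CREATE PROCEDURE dbo.MyProc
    AS
    BEGIN
        IF 1 = 0
        BEGIN
            SELECT CAST(NULL AS int)         AS KeyColumn,
                   CAST(NULL AS varchar(50)) AS SomeColumn;
        END;

        CREATE TABLE #Work (KeyColumn int, SomeColumn varchar(50));
        -- ... populate #Work with the real query ...
        SELECT KeyColumn, SomeColumn FROM #Work;
    END;
    GO

    -- Option 3: declare the shape at the call site (2012+ only; this is
    -- a syntax error on 2008):
    EXEC dbo.MyProc
    WITH RESULT SETS
    ((
        KeyColumn  int,
        SomeColumn varchar(50)
    ));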

If I can't figure anything out, I could deploy different stored procedures and packages to the different tiers, but I would very much prefer not to. For one, this would complicate future releases, and I would also need to make sure I don't forget to update the stored procedure and package once we upgrade the servers.

I could also make real tables in the database to take the place of these temp tables. I don't really like this solution, but it's something I could tolerate. If I end up doing this, I would probably switch to using WITH RESULT SETS in the future.

However, I personally don't care much for either of these solutions, so I was wondering if there is any workaround that I missed that might work a bit better.

Excellent first post where you've clearly done your research. Additional research for you on the temp table versus table variable. – billinkc
Have you tried a combination of 1 & 3? A single stored proc that has BOTH WITH RESULT SETS ... AND IF 1=0 ...? – Tab Alleman
You've hit the nail on the head about the breaking change between 2008 and 2012 in how SSIS handles temporary tables. Refresh my memory though: why would the SSIS package change? It'd still be calling dbo.MyProc whether the invoker was 2008 or 2014. It's just the underlying procedure that would need to be corrected for the environment, yeah? – billinkc
How about using a wrapper stored proc? On both SSIS servers, the same SSIS code calls the same proc name. On 2014, that proc uses WITH RESULT SETS to call a child proc that does the real work. On 2008, the wrapper proc uses IF 1=0 and calls the same child proc (or the same proc code, but on 2008). A sketch of this follows these comments. – Tab Alleman
Have you tried setting DelayValidation to true for the package, so it doesn't look for the metadata until it runs that step? – Joseph Gagliardo
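A sketch of the wrapper idea Tab Alleman describes above, with hypothetical names (dbo.MyProcWrapper calls the real dbo.MyProc); you would deploy whichever version matches each server, while the SSIS package always calls dbo.MyProcWrapper:

    -- Deployed on 2014 servers: declare the shape at the call site.
    CREATE PROCEDURE dbo.MyProcWrapper
    AS
    BEGIN
        EXEC dbo.MyProc
        WITH RESULT SETS
        ((
            KeyColumn  int,
            SomeColumn varchar(50)
        ));
    END;
    GO

    -- Deployed on 2008 servers: dummy SELECT for the FMTONLY pass.
    CREATE PROCEDURE dbo.MyProcWrapper
    AS
    BEGIN
        IF 1 = 0
        BEGIN
            SELECT CAST(NULL AS int)         AS KeyColumn,
                   CAST(NULL AS varchar(50)) AS SomeColumn;
        END;

        EXEC dbo.MyProc;
    END;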

1 Answer

2 votes

Despite your reluctance, I think you've made the right choice and a dedicated staging area is the right way to go. Most of the production ETLs I've worked with have a dedicated staging database, never mind just staging tables. You then have the benefit of being able to control the storage more explicitly, which makes performance more reliable and the whole thing generally more maintainable. For example, you can create a dedicated contiguous block of fast disk space for these tables, with their own filegroup, etc. I'd certainly rather see 2 separate SPs relying on a few physical tables than a really gnarly single one.
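To make that concrete, a minimal sketch; the database name, file path, and sizes below are all placeholders to adjust for your environment:

    -- Hypothetical dedicated filegroup on fast disk for staging tables.
    ALTER DATABASE MyDb ADD FILEGROUP Staging;
    ALTER DATABASE MyDb ADD FILE
    (
        NAME = N'MyDb_Staging',
        FILENAME = N'F:\FastDisk\MyDb_Staging.ndf',  -- placeholder path
        SIZE = 10GB,
        FILEGROWTH = 1GB
    ) TO FILEGROUP Staging;
    GO

    -- Permanent staging table replacing the #temp table; because it's a
    -- real table, SSIS can resolve the metadata normally on 2008 and 2014.
    CREATE TABLE dbo.Stage_Work
    (
        KeyColumn  int,
        SomeColumn varchar(50)
    ) ON Staging;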

That said, without knowing any specifics, this is just my experience, so a caveat for future readers: as with all things database, be sure to measure the actual performance of your scenario (before and after) rather than making assumptions based on the query plan - it might be misleading you.
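For the before/after measurement, the simplest tool is STATISTICS IO/TIME, which is how logical-read counts like the question's 1.3 million vs 56 million are typically captured (dbo.MyProc again stands in for the procedure under test):

    -- Emits per-table logical/physical read counts and CPU/elapsed time
    -- to the Messages tab for each statement executed.
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    EXEC dbo.MyProc;  -- placeholder for the real procedure

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;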