2
votes

I'm working on a SSIS Package with a Merge Join in it.

I load from two differend DB's, convert Data from one source and let it flow into a Merge Join Task.

After the Merge Join Task I split the Data with a Conditional Split into New, Deleted and Existing Records.

After the Split i flow down to a insert Task and two OLE DB Commands wich Update and delete obsolete and update changed Records.

When the Merge Join has the option inner join, it blocks as long as it needs to load all Data from both DBs.

When the Merge Join has the option full outer join OR left outer join it does not block till all data is loaded from both DBs.

I need the full outer join option on the merge join task to split the data up in new and delted records afterwards. If I would use a inner join I could just process existing and updated data.

When the Merge Join isn't blocking I get an error within the Transaction because when the load still is processing and it tries to insert or update records the Transaction fails.

My Questions at all are, if someone can explain me why it is like this, that the merge join wont block on outer joins and if someone already faced such a problem and how he solved it ?

Thanks.

1

1 Answers

6
votes

The easiest way to avoid the transaction-blocking error is with a staging table for your output data rather than a conditional split. You get the best of both worlds here:

  • Maximally synchronous data flow through your merge join into your staging table
  • Avoidance of those pesky one-by-one OLE DB Commands, instead replacing them with a nice batch UPDATE JOIN / DELETE JOIN statements connecting your staging table back to your source.

Also, you are incorrect, both the inner join and the outer joins are only partially "blocking" i.e. asynchronous, but when you have an outer join (and especially when you have a full outer join) more rows are passed to the output buffer more quickly since the criteria is less restrictive.

Imagine you have two piles of 101 cards each printed with various Breaking Bad plot holes on them sorted in order of increasing incredulity and an outbox for "matched" cards. Since they're flimsy little cards, when you sorted them, you stapled them all together and trying to remove the staples will cause them to scatter uselessly about the floor like so many broken meth kingpin dreams.

Let's also say that every time you "match" 10 cards (or when you have no cards left to process), a little green gnome comes in and takes those cards out for further processing somewhere else. Now let's add a twist: there is only one match in the pile, and it's the very last card (why did he leave Gray Matter in the first place?).

At the moment before you process those last two cards, how many little green gnomes will have come to see you?

  • Inner Join - 0. No matches yet!
  • Left Outer join - 10. All the other cards in pile A went on (100), so you've processed 10 sets of 10 cards.
  • Full Outer join - 20. All the other cards from pile A and pile B (200) went on. That's 20 sets of 10 cards.

And more importantly to your problem, in an inner join, the first little green gnome dutifully waits until you're done with the pile before taking his one lowly match somewhere else. With a full outer join, the first little green gnome came after the 5th comparison ... oh no! It turns out part of the further processing is he needs to change up some of the cards in your pile. (There's always more plot holes to be found if you look hard enough ...) But the staples! You tell him no, there's a scuffle, charges are pressed, now you're doing 3-to-5 at San Quentin with a guy who reminds you a little too much of Skinny Pete.

But if instead you told the little green gnomes, "Put the new cards over there and we'll get all those changes in in one fell swoop later," everything goes much smoother.

So use a staging table. No more blocking, faster update, problems solved. Well, except those Breaking Bad plot holes. They'll last forever. Oh well!