2
votes

I have been using the LookUp Transformation in a SSIS package. The Input table has about 75 million rows, while the lookup table has about 70 million rows.

I need to capture both the matching and non-matching outputs, so that they can be processed as necessary.

I am running into a roadblock with the error "The attempt to add a row to data flow task buffer failed"

Inner join is a good alternative, however i don't think it can give two outputs i.e. matching and non-matching (which is why I did not use it in the first place. If that is not the case, please enlighten me!)

Partial Cache and No Cache are alternatives, however they are slower than the hour hand on a clock!

Can you please suggest any alternatives to the LookUp transformation that are able to give both matched and unmatched outputs?

Thanks in advance!

[Source_Table]
SELECT prop_code        --[varchar](6) NULL
    ,conf_nbr           --[varchar](20) NULL
    ,arrival_date       --[date] NULL
    ,system_source      --[varchar](5) NULL
    ,net_revenue        --[float] NULL
    ,net_room_nights    --[int] NULL
    ,srp_code           --[varchar](10) NULL
    ,corp_client_id     --[varchar](10) NULL
    ,rac_code           --[varchar](10) NULL
    ,ta_client_id       --[varchar](10) NULL
FROM Account_360_Stage_Prd_Reservations_CRS
ORDER BY prop_code
    ,conf_nbr

[Reference_Table]
SELECT DISTINCT property_code                                                   --[varchar](6) NOT NULL
    ,CAST(host_confirmation_number AS VARCHAR) AS 'host_confirmation_number'    --[bigint] NULL
FROM Account_360_Stage_Guest
ORDER BY property_code
    ,host_confirmation_number

[LookUp]
prop_code = property_code
conf_nbr  = host_confirmation_number
2
If you don't need all the columns from the Lookup table, then write a query to pull back exactly what is needed. 10M might sound like a lot but if you're only pulling back an ID and a match value, it will take up very little memory space and provide fine performance. The error you've provided should have been accompanied by more errors and/or warnings/informational events.billinkc
I am not using Select *. I always select columns of interest. The VM currently has 4GB allocated and we are planning on bumping it up to 8 GB, which will solve the problem for now. However, i am looking to provide a far sighted solution...AeyJey
Excellent, many users simply choose the table in the drop down and never look back. How about the depth of your result set? Is it possible to filter those 10M rows to just current year or some other segment of information. Do you know that the lookup is the only thing that is causing issues?billinkc
Thank you. Filtering by year may be an option, but then i don't see that as the ultimate solution, because whether i do the lookup in segments or all at once, the buffer failed error will come up eventually. Yes, i've been 'detail' logging the package for the past four days and after eliminating all possibilities, i've come to conclude that the DFT that contains this LookUp is the point of failure.AeyJey

2 Answers

4
votes

You could use a LEFT OUTER JOIN as this will give you both 'matched' and 'unmatched' . Then redirect all the NULLs from joined table as 'unmatched'

2
votes

I would probably use an inner join, but with a case statement to identify the two conditions. Then use a conditional split task to split the data based on the output of the case statement