
I need to make a lookup between two tables T1(A,B,C) and T2(A,B,C,D,E) on column C to get all matching column B values:

T1 :

(sample rows shown as an image in the original post)

T2 :

(sample rows shown as an image in the original post)

When I choose Full cache mode I get only the first matching row (I'm only interested in the column B values): 12122. But I also need 12123 and 12124, because C matches those rows as well.

I've tried the Partial and No cache modes with a custom query using an inner join (which returns all the needed rows when executed in SSMS), but it still doesn't return all rows and it kills performance.

I've also tried the solution proposed here:

How to get unmatched data between two sources in SSIS Data Flow?

It gives the same results as the Lookup; besides that, I need to redirect the unmatched rows to a new table.

2 Answers


I don't think the cache mode will affect your result; it only affects performance. The documented behavior is:

•If there are multiple matches in the reference table, the Lookup transformation returns only the first match returned by the lookup query. If multiple matches are found, the Lookup transformation generates an error or warning only when the transformation has been configured to load all the reference dataset into the cache. In this case, the Lookup transformation generates a warning when the transformation detects multiple matches as the transformation fills the cache.
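The documented first-match behavior can be illustrated with a small sketch (the sample values below mirror the question and are otherwise hypothetical). A full-cache Lookup effectively builds a hash keyed on C, so only one B per key survives, whereas a join emits every matching row:

```python
# Hypothetical reference rows from T2: (B, C) pairs sharing the same key C
t2 = [(12122, "x"), (12123, "x"), (12124, "x")]

# Full-cache Lookup behavior: the cache keeps one row per key,
# so only the first matching B is ever returned
cache = {}
for b, c in t2:
    cache.setdefault(c, b)  # later duplicates of key c are ignored
lookup_matches = [cache["x"]]

# Join behavior: every row whose C matches is emitted
join_matches = [b for b, c in t2 if c == "x"]

print(lookup_matches)  # [12122]
print(join_matches)    # [12122, 12123, 12124]
```

This is why switching cache modes doesn't help: the one-row-per-key semantics are inherent to the Lookup transformation, not to how the cache is filled.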

To get the matched B values from T2, you can simply use SQL in the OLE DB Source (SQL command), for example:

SELECT DISTINCT t2.B
FROM T1 AS t1
INNER JOIN T2 AS t2
    ON t2.C = t1.C

If LONG's answer does not address your needs, you'd need to write a Script Transformation operating in asynchronous mode (one input row can yield zero to many output rows).

If the source data (T1) doesn't contain duplicate C values, the component's pre-execute phase could cache columns B and C from T2 in local memory. Then, for each source row that flows through, you'd loop over the cached results and append the B values to the data flow.
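A Script Component would be written in C# or VB.NET; purely as a sketch of the logic described above (names like `reference`, `process_row`, and the sample values are hypothetical), the PreExecute cache plus asynchronous fan-out looks like this:

```python
from collections import defaultdict

# PreExecute phase: cache every (C -> [B, ...]) pair from T2 in memory
reference = defaultdict(list)
for b, c in [(12122, "x"), (12123, "x"), (12124, "x"), (99, "y")]:
    reference[c].append(b)

# ProcessInputRow phase: one input row fans out to one output row
# per cached B value that shares its C key (asynchronous output)
def process_row(row, output_buffer):
    for b in reference.get(row["C"], []):
        output_buffer.append({**row, "B_from_T2": b})

out = []
process_row({"A": 1, "C": "x"}, out)
print(len(out))  # 3 output rows, one per matching B
```

The same shape maps onto the real component: fill the cache in `PreExecute` (e.g. via an ADO.NET query against T2) and call the output buffer's `AddRow` once per match in `Input0_ProcessInputRow`.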

This gets trickier if T1 can contain near-duplicate data, because you'd need to query the target table for each row that flows through; you'd also have to track the B/C values that have already rolled through, since you might need to reference those Bs as well.

You could also evaluate a Merge Join, since it allows multiple rows to be emitted per match, but I'd guess you'll have more control over performance with a Script Transformation.
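For reference, a Merge Join requires both inputs sorted on the join key, and duplicate keys on the reference side produce one output row per match, which is exactly the behavior needed here. A minimal sort-merge sketch (sample data hypothetical):

```python
def merge_join(left, right, key):
    """Sort-merge inner join. Both inputs must be sorted on `key`.
    Duplicate keys on the right emit one output row per match."""
    out = []
    j = 0
    for l in left:
        # advance past right-side keys smaller than the current left key
        while j < len(right) and right[j][key] < l[key]:
            j += 1
        # emit every right-side row sharing the key
        k = j
        while k < len(right) and right[k][key] == l[key]:
            out.append({**l, **right[k]})
            k += 1
    return out

t1 = [{"A": 1, "C": "x"}]
t2 = [{"C": "x", "B": 12122}, {"C": "x", "B": 12123}, {"C": "x", "B": 12124}]
rows = merge_join(t1, t2, "C")
print(len(rows))  # 3 rows, one per matching B
```

In SSIS you'd get the sort either from ORDER BY in the source query (marking the source output as sorted) or from Sort transformations, which is part of why a Script Transformation can be cheaper.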

Either way, when you pull in the T2 table, write a custom query and select only the columns you need (B and C).