I am having a SSIS Merge Join challenge that i am failing to figure out how to approach and solve.
From the above Data Flow Task, Excel Source Sort 1 has columns CategoryName, Date, City while OLE DB Source Sort 2 has columns CategoryName, Date. At Merge join 1, i want to merge on 3 columns i.e CategoryName, Date, City but the problem is City does not exist in Sort 2.
The problem is that CategoryName & Date combined can still have duplicate values and the business rule is ok with that but as shown in screen shoot, 1,144 rows get out of the merger join as 1,300 rows which is not desired.
The only way i can get unique rows at Merge Join 1 is if i had the City column in Sort 2.
However in Sort 2, a rows city value can be determine by knowing its CategoryName and Date values.
So i was wondering if there might be away i can first match rows in Sort 1 with Rows in Sort 1 on CategoryName & Date columns which would give me the value of City in Sort 2 such that by the time i do a merge at Merge Join 1, both Sort 1 and Sort 2 have all three columns needed to uniquely identify a row.
All i am looking for is how to get only 1,144 rows from the Merge Join but not 1,300 rows as is currently the case.
An alternative way of achieving the same is also welcome.
All i am looking for is how to get only 1,144 rows from the Merge Join but not 1,300
Change Join Type of Merge Join 1 to LEFT or FULL. Other solution will be probably not to use Merge Join element and use lookup for OLE DB rows... it would be faster without blocking elements... – Justin