I am building a staging area that gets data from informatica cdc. Now for example lets say I am replication two tables for incremental load. I have to delete the processed data from the staging tables after each load. I join these two tables to populate my target dimension. Problem is change can happen on only one source and not the other in a particular load.
Example:
Employee
---------
ID NAME
1 PETER
EmployeeSal
------------
EMPID SAL
1 2000
If the above is replicated in my first load, I join the two tables and load them thats fine.
Now lets say the salary of peter is updated frrom 2000 to 3000. As I have delete my staging tables after each load, I will have the following for current load.
Employee
---------
ID NAME
EmployeeSal
-----------
EMPID SAL
1 3000
Here is my problem ss I have to populate the whole row of the dimension which is TYPE2.
I have to join back to the source to get the other attributes of employee table ( This is just a lame example, In reality it might be 10 tables and hundreds of thousand of changes). Is it recommended to go back to the source?
I can join the target table to this mix and populate the missing attributes.
Is this even recommended as I have to do lot of case statements , nullhandlings etc if a particular staging table has no change for a dimension record. My question is is it even common that target table is joined in a ETL transformation?