I have data in 200+ tables in an on-premises database which I need to move to Azure Data Lake. I am not happy with the idea of having 200+ separate pipelines (or one pipeline with 200+ activities, or some other partitioning of pipelines/activities) in Azure Data Factory, one for each table. Are there better/more scalable ways to do this? In SSIS I would likely create a template package and wrap it in a for-loop container that iterates through a list of tables to do the work, or, to take it a step further, have a few lists with each package working on its own list independently for greater parallelism. Are there ways to do the same in Data Factory?
1 Answer
Really? SSIS is terrible as far as dynamic schemas are concerned. As per this question, Azure Data Factory (ADF) is an ELT tool, not an ETL tool, and is fundamentally different in that way. However, it wouldn't take that long to set this up using the Copy Wizard.
Alternatively, you will have to learn to use the .NET SDK and/or PowerShell with ADF to work with it more dynamically. In theory you would need input and output datasets for each table, which could all run under one pipeline.
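For example, a minimal PowerShell sketch of that approach might look like the one below. It assumes an ADF (v1) factory already exists, a dataset JSON template with a placeholder table name, and a plain-text list of tables; the resource group, factory name, file paths and the __TABLE__ token are all hypothetical, and the exact AzureRM.DataFactories cmdlet parameters should be checked against your module version.

```powershell
# Rough sketch (names and paths are placeholders): generate a dataset JSON per
# table from a template and deploy each one with the AzureRM.DataFactories cmdlets.
$resourceGroup = "MyResourceGroup"             # assumed resource group name
$dataFactory   = "MyDataFactory"               # assumed data factory name
$tables        = Get-Content ".\tables.txt"    # one table name per line

# Folder for the generated dataset definitions
New-Item -ItemType Directory -Path ".\generated" -Force | Out-Null

foreach ($table in $tables) {
    # Substitute the table name into a dataset definition template
    $json = (Get-Content ".\DatasetTemplate.json" -Raw) -replace "__TABLE__", $table
    $file = ".\generated\$table.json"
    $json | Set-Content $file

    # Create or update the dataset in the factory
    # (verify parameter names against your AzureRM module version)
    New-AzureRmDataFactoryDataset -ResourceGroupName $resourceGroup `
        -DataFactoryName $dataFactory -File $file -Force
}
```

The same loop could just as well emit pipeline JSON with copy activities referencing those datasets, so a single generated pipeline (or a handful, for parallelism) covers all 200+ tables without hand-authoring each one.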
Lastly, I don't think this is a great pattern (mainly because you would need some IaaS to host it), but as a last resort, SSIS now has integration with Azure Data Lake via the 2016 Feature Pack.