I'm new to Azure Data Factory and working on a proof of concept for my organisation. I'm finding it hard to get good information on fairly basic things, and I'm hoping someone can point me to some good resources for my use case.
I know this question is quite general, but any help would be useful. I'm going around in circles at the moment and feel like I'm wasting a lot of time; something that would take me a few minutes in SSIS has taken hours of research so far, and I still haven't progressed much.
Here's the use case:
- A gzip archive arrives in blob storage every hour. It contains several .tsv files, but I only want to extract one of them, which holds web clickstream data.
- I want to pull that one .tsv file out of the archive, append the datetime to its name, and save it to Azure Data Lake Storage (I've sketched this step in Python after the list).
- I want this to happen each time a new gzip archive arrives.
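To make the requirement concrete, here's the transformation I'm after, written as plain Python. The member and output names are made up, and I'm assuming the archive is really a .tar.gz, since a bare .gz can only hold a single file. This is the step I can't work out how to express in ADF:

```python
import tarfile
from datetime import datetime, timezone
from io import BytesIO

# Hypothetical member name; the real archive layout may differ.
WANTED_MEMBER = "clickstream.tsv"

def extract_clickstream(archive_bytes: bytes) -> tuple[str, bytes]:
    """Pull the one .tsv out of the (assumed) tar.gz archive and
    return (timestamped_name, file_contents)."""
    with tarfile.open(fileobj=BytesIO(archive_bytes), mode="r:gz") as tar:
        data = tar.extractfile(WANTED_MEMBER).read()
    # e.g. clickstream_20240101_130000.tsv
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    return f"clickstream_{stamp}.tsv", data
```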
So far I have:
- An Azure Data Factory V2 instance set up
- A linked service set up to the blob container
- A linked service set up to Data Lake Storage Gen1
- All the permissions and firewall issues sorted out, I think, so ADF can access storage (verified with the quick SDK check below)
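For reference, this is the check I ran with the Python management SDK (azure-mgmt-datafactory) to confirm the linked services exist; the resource names are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Placeholder names; substitute your own subscription, resource group and factory.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "my-rg"
FACTORY_NAME = "my-adf-poc"

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# List every linked service in the factory with its type
# (e.g. AzureBlobStorage, AzureDataLakeStore).
for ls in client.linked_services.list_by_factory(RESOURCE_GROUP, FACTORY_NAME):
    print(ls.name, "->", ls.properties.type)
```

Both linked services show up, so I believe that side of the plumbing is in place.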
Is Azure Data Factory the right tool for this job? If so, where do I go from here? How do I build the datasets and pipeline to achieve the use case, and how do I schedule the pipeline to run each time a new archive arrives? I've included my best guess at an event-based trigger below in case that's the right direction.
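From the docs, an event-based trigger looks like the mechanism for the "run when a new archive lands" part. Below is my best guess at defining one, continuing with the `client` from the snippet above; the pipeline name, container path, and storage account are all hypothetical, and I haven't got this working yet, so corrections are welcome:

```python
from azure.mgmt.datafactory.models import (
    BlobEventsTrigger,
    PipelineReference,
    TriggerPipelineReference,
    TriggerResource,
)

# Hypothetical names: the pipeline I still need to build, and the
# storage account the hourly archives land in.
PIPELINE_NAME = "ExtractClickstreamPipeline"
STORAGE_SCOPE = (
    "/subscriptions/<subscription-id>/resourceGroups/my-rg"
    "/providers/Microsoft.Storage/storageAccounts/mystorageacct"
)

# Fire on every new .gz blob in the incoming container.
trigger = BlobEventsTrigger(
    events=["Microsoft.Storage.BlobCreated"],
    scope=STORAGE_SCOPE,
    blob_path_begins_with="/incoming/blobs/",
    blob_path_ends_with=".gz",
    pipelines=[
        TriggerPipelineReference(
            pipeline_reference=PipelineReference(reference_name=PIPELINE_NAME)
        )
    ],
)

client.triggers.create_or_update(
    RESOURCE_GROUP, FACTORY_NAME, "NewArchiveTrigger",
    TriggerResource(properties=trigger),
)
# Triggers are created in a stopped state; start it so it fires on new blobs.
client.triggers.begin_start(RESOURCE_GROUP, FACTORY_NAME, "NewArchiveTrigger").result()
```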