I have around 65,000 small XML files (roughly 1 to 3 KB each) landing in Azure Blob Storage every hour. It is telemetry data, and a new batch arrives in the blob container each hour. I want to combine them into large XML files (one file per hour or so), convert those into a large CSV file, and then copy the result into Azure Data Lake Store so that I can do further analysis on it with U-SQL scripts. Please tell me what the correct way to do this is.
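To make the goal concrete, here is a rough sketch of the hourly merge step I have in mind, written in Python with the azure-storage-blob package. The connection string, container name, hour prefix, and the assumption that each small file holds one flat record are all simplifications of my setup, not the real schema:

```python
import csv
import io
import xml.etree.ElementTree as ET
from azure.storage.blob import ContainerClient

CONN_STR = "<storage-connection-string>"   # placeholder
HOUR_PREFIX = "2017/01/01/12/"             # hypothetical folder layout per hour

container = ContainerClient.from_connection_string(CONN_STR, container_name="telemetry")

# Collect one row per small XML file for the given hour.
rows = []
for blob in container.list_blobs(name_starts_with=HOUR_PREFIX):
    data = container.download_blob(blob.name).readall()
    root = ET.fromstring(data)
    # Assumes each file contains one flat record; adjust to the real XML schema.
    rows.append({child.tag: child.text for child in root})

# Write the hourly CSV and upload it back to blob storage.
# (A copy step, e.g. Azure Data Factory or AdlCopy, would then move it to ADLS.)
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=sorted({k for r in rows for k in r}))
writer.writeheader()
writer.writerows(rows)
container.upload_blob(f"hourly/{HOUR_PREFIX.rstrip('/')}.csv", buf.getvalue(), overwrite=True)
```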
Right now we are using the Azure Batch service for this: all file names are kept in Azure Table Storage, and each Batch task reads its entries from Table Storage and updates them once it has finished processing.
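For reference, this is roughly how a task tracks its files today. The sketch below uses the azure-data-tables package; the table name, partition scheme, and Status column are simplified illustrations rather than our actual schema:

```python
from azure.data.tables import TableClient, UpdateMode

table = TableClient.from_connection_string("<storage-connection-string>",
                                            table_name="FileTracking")

# Each Batch task queries the entities for its hour and marks them done as it goes.
pending = table.query_entities("PartitionKey eq '2017010112' and Status eq 'Pending'")
for entity in pending:
    blob_name = entity["RowKey"]
    # ... download and process blob_name ...
    entity["Status"] = "Done"
    table.update_entity(entity, mode=UpdateMode.MERGE)
```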
I think we can do better than this using the WebJobs SDK and Azure Service Bus topics and subscriptions, with one topic and one subscriber per hour (see the sketch below).
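A rough sketch of the flow I am imagining, using the azure-servicebus Python package purely to illustrate (the actual worker would be a WebJob, which is .NET; the topic and subscription names here are made up):

```python
from azure.servicebus import ServiceBusClient, ServiceBusMessage

SB_CONN_STR = "<servicebus-connection-string>"   # placeholder
TOPIC = "telemetry-files"                         # hypothetical topic
SUBSCRIPTION = "hour-2017010112"                  # hypothetical per-hour subscription

client = ServiceBusClient.from_connection_string(SB_CONN_STR)

# Producer side: enqueue one message per blob as the files arrive.
with client.get_topic_sender(TOPIC) as sender:
    sender.send_messages(ServiceBusMessage("2017/01/01/12/file-0001.xml"))

# Consumer side: the worker drains the subscription and appends each file's
# record to the hourly aggregate before completing the message.
with client.get_subscription_receiver(TOPIC, SUBSCRIPTION) as receiver:
    for msg in receiver.receive_messages(max_message_count=100, max_wait_time=5):
        blob_name = str(msg)
        # ... download blob_name, parse the XML, append a row to the hourly CSV ...
        receiver.complete_message(msg)
```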
Let me know whether my thinking is correct, or whether there is a better technique for doing this.