
I'm looking for advice on what the best practice is with regard to process orchestration. To give some context, I have the following tasks to orchestrate:

  1. Scale up Azure Batch Pool to provide adequate nodes
  2. Execute custom .Net code which calls a server to retrieve a list of tasks. These tasks change on a daily basis. Queue these tasks onto the Batch Pool.
  3. Execute each task (custom .Net code) on the Batch Pool. Each task creates data within an Azure storage account.
  4. Scale down the batch pool as it is no longer required.
  5. Start / scale up the Data Warehouse
  6. Bulk Import the data into Data Warehouse (expect to be using a combination of PolyBase and BCP).
  7. Aggregate the data and produce output to an Azure Storage account.
  8. Pause / scale down the Data Warehouse
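To make the ordering concrete, this is roughly how I picture the flow as a plain Python sketch (every function name here is a placeholder I've invented, not a real SDK call). The try/finally blocks reflect the main concern: the expensive resources (Batch pool, Data Warehouse) should be released even if a step fails part-way through.

```python
# Sketch of the daily flow. Each step would wrap an Azure management/REST
# call in practice; here the steps dict supplies optional stub callables.

def run_daily_pipeline(steps):
    """Run the eight steps in order, ensuring the Batch pool and the
    Data Warehouse are scaled down / paused even if a step fails."""
    log = []

    def do(name):
        log.append(name)
        steps.get(name, lambda: None)()  # no-op unless a stub is provided

    try:
        do("scale_up_batch_pool")       # 1. grow the Batch pool
        do("queue_daily_tasks")         # 2. fetch today's task list, enqueue
        do("run_batch_tasks")           # 3. tasks write to Azure Storage
    finally:
        do("scale_down_batch_pool")     # 4. always release Batch nodes
    try:
        do("resume_data_warehouse")     # 5. start / scale up the DW
        do("bulk_import")               # 6. PolyBase / BCP load
        do("aggregate_and_export")      # 7. aggregate, output to Storage
    finally:
        do("pause_data_warehouse")      # 8. always pause the DW
    return log
```

Calling `run_daily_pipeline({})` runs the steps as no-ops and returns them in order, which is the shape of pipeline I'm trying to get an orchestrator to own.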

I'm currently comparing Data Factory to Runbooks to perform the above.

I find Runbooks are very primitive in terms of their visualisation during design and run time.

I find that Data Factory is much more visually appealing. However, the data slicing seems like massive overkill. I simply want the process to execute at, say, 8am each morning. I don't want it to attempt to execute for days in the past (if I amend the template, for example). I also expect Data Factory will handle failure/resume better along the pipeline of activities.

Are there any other approaches I should consider here / recommendations?

Thanks David


1 Answer


That's a fairly broad question so I'll offer a broad-ish answer...

Azure Data Factory (ADF) can certainly do most of what you need from the list above, with a few exceptions/tweaks, as below.

The Batch compute pool scaling: that would need to be handled in the service itself, using the auto-scale functionality or by passing it a resize command. There isn't an activity in ADF to set that directly.
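As a rough illustration, an auto-scale formula attached to the pool itself could look something like the following. This uses the standard Batch autoscale variables ($PendingTasks, $TargetDedicatedNodes, $NodeDeallocationOption); the cap of 10 nodes and the 5-minute sample window are placeholder values, and on older API versions the target variable is named $TargetDedicated.

```
pendingTaskSamplePercent = $PendingTasks.GetSamplePercent(TimeInterval_Minute * 5);
pendingTasks = pendingTaskSamplePercent < 70 ? max(0, $PendingTasks.GetSample(1)) : avg($PendingTasks.GetSample(TimeInterval_Minute * 5));
$TargetDedicatedNodes = min(10, pendingTasks);
$NodeDeallocationOption = taskcompletion;
```

The idea is that the pool grows with the pending-task queue and shrinks back towards zero on its own once the day's tasks drain, so ADF doesn't need a scaling activity at all.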

The custom .Net code you mention in points 2 and 3: you can write these as ADF custom activities that get passed to the Batch service for execution. So allow ADF to handle these DLLs etc., rather than having something else create the batch tasks and ADF just execute them. ADF will handle all of this.

More info on creating custom activities here: https://www.purplefrogsystems.com/paul/2016/11/creating-azure-data-factory-custom-activities/
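For illustration, in ADF (v1) a custom activity is declared in the pipeline JSON along these lines. The assembly, entry point, linked service and package names below are all placeholder values:

```json
{
    "name": "QueueDailyTasks",
    "type": "DotNetActivity",
    "typeProperties": {
        "assemblyName": "MyTasks.dll",
        "entryPoint": "MyTasks.QueueTasksActivity",
        "packageLinkedService": "AzureStorageLinkedService",
        "packageFile": "packages/MyTasks.zip"
    },
    "linkedServiceName": "AzureBatchLinkedService"
}
```

The linked service points at your Batch account, so ADF ships the zipped DLLs to the pool and runs them as Batch tasks for you.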

For the Data Warehouse, ADF has out-of-the-box functionality to execute your queries and allows passing parameters to stored procedures etc.
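As a sketch, the (v1) activity type for that is SqlServerStoredProcedure; the procedure and parameter names below are placeholders:

```json
{
    "name": "AggregateDailyData",
    "type": "SqlServerStoredProcedure",
    "typeProperties": {
        "storedProcedureName": "dbo.usp_AggregateDaily",
        "storedProcedureParameters": {
            "RunDate": "$$Text.Format('{0:yyyy-MM-dd}', SliceStart)"
        }
    }
}
```

The $$Text.Format expression is how ADF v1 passes the slice date into the procedure, which suits a once-a-day 8am run.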

Lastly, for the DW scaling and pausing, I think you'll need to use Azure Automation here. I'm not aware of anything in ADF that can offer that level of control, unless you break out the .Net again.
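A minimal Automation runbook sketch for the pause, using the AzureRM.Sql cmdlets (resource group, server and database names are placeholders):

```powershell
# Pause the SQL Data Warehouse at the end of the run (placeholder names).
Suspend-AzureRmSqlDatabase `
    -ResourceGroupName "my-rg" `
    -ServerName "my-sqlserver" `
    -DatabaseName "my-dw"

# Scaling works the same way via the service objective, e.g.:
# Set-AzureRmSqlDatabase -ResourceGroupName "my-rg" -ServerName "my-sqlserver" `
#     -DatabaseName "my-dw" -RequestedServiceObjectiveName "DW400"
```

Schedule that runbook after your load completes (or trigger it from the pipeline) and the DW only bills while you're actually using it.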

Hope this gives you a steer on how to progress.