Use case
We have an on-premises Hadoop setup and we use Power BI as our BI visualization tool. To get data into Power BI, we currently do the following:
- Copy data from on-premises to Azure Blob Storage (our on-premises scheduler does this once the data is ready in Hive)
- Copy the data from Azure Blob Storage to Azure SQL Data Warehouse / Azure SQL Database
- Refresh the cube on Azure Analysis Services (AAS); AAS pulls the data from Azure SQL Data Warehouse / Azure SQL Database
For steps 2 and 3 we currently run a web server on Azure, and its endpoints are configured to take a few parameters such as the table name, the Azure file location, cube information, and so on.
Sample HTTP request (host and parameter values are placeholders):
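```
POST https://<our-web-server>/copydata?from=azureblob&fromloc=<azure-file-path>&to=azuredw&totable=<target-table>
```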
Here the web server extracts the values of the parameters (from, fromloc, to, totable) and then performs the copy activity. We did it this way because we have a lot of tables, and they can all reuse the same function.
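In rough shape the handler looks like the sketch below (our server is not necessarily Python/Flask, and `run_copy` is a hypothetical stand-in for our copy logic):

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/copydata", methods=["POST"])
def copydata():
    # One generic handler: every table reuses the same copy function.
    src = request.args["from"]           # e.g. "azureblob"
    src_loc = request.args["fromloc"]    # Azure file path
    dst = request.args["to"]             # e.g. "azuredw"
    dst_table = request.args["totable"]  # target table name
    run_copy(src, src_loc, dst, dst_table)  # hypothetical helper doing step 2/3
    return "ok"
```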
Now we have requirements piling up (retries, control flows, email alerts, monitoring), and we are looking for a cloud alternative to do the scheduling for us; we would still like to hit an HTTP endpoint like the one above.
One of the alternatives we have looked at so far is Azure Data Factory (ADF), where we would create pipelines to achieve the steps above and trigger them over HTTP.
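Roughly, what we are after is starting a parameterized run with a single HTTP call, along these lines (a sketch against the ADF createRun REST API; the subscription, resource group, factory, and pipeline names are placeholders, and token acquisition is assumed to happen elsewhere):

```python
import requests

# All names below are placeholders; assumes an Azure AD bearer token with
# rights on this Data Factory has already been obtained.
SUBSCRIPTION = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
FACTORY = "<factory-name>"
PIPELINE = "CopyBlobToDw"  # one generic pipeline reused for every table
TOKEN = "<aad-bearer-token>"

url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
    f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.DataFactory"
    f"/factories/{FACTORY}/pipelines/{PIPELINE}/createRun"
    "?api-version=2018-06-01"
)

# The POST body carries the pipeline parameters (the same values our
# current web server takes on its query string).
body = {
    "from": "azureblob",
    "fromloc": "<azure-file-path>",
    "to": "azuredw",
    "totable": "<target-table>",
}

resp = requests.post(url, json=body, headers={"Authorization": f"Bearer {TOKEN}"})
resp.raise_for_status()
run_id = resp.json()["runId"]  # needed later to check the run's outcome
print(run_id)
```

The `runId` in the response could presumably be used afterwards to check how the run went, which ties into the alerting question below.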
Problems
- How can we take parameters from the HTTP POST call and make them available as custom variables [1] within the pipeline, so that we can still write one function for each of steps 2 and 3 and have the function take these parameters? We don't want to create a separate ADF pipeline for each table. (See the first sketch after this list.)
- How can we detect failures in ADF steps and send email alerts when they occur? (See the second sketch after this list.)
- Apart from ADF, what other options are there in Azure to do this?
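On the first question: from what we have read, ADF v2 pipelines can declare parameters, the createRun body shown above supplies their values, and inside the pipeline they are referenced with expressions of the form:

```
@pipeline().parameters.totable
```

so a single generic pipeline could in principle serve all tables, but we would like confirmation that this is the intended pattern.

On the second question: lacking a built-in hook, the fallback we can picture is polling the run status over the same REST API and mailing on failure. A minimal sketch, reusing the constants from the trigger sketch above (the SMTP host and addresses are placeholders):

```python
import smtplib
import time
from email.message import EmailMessage

import requests

def wait_and_alert(run_id: str, token: str) -> None:
    """Poll an ADF pipeline run until it finishes; email if it failed."""
    # SUBSCRIPTION, RESOURCE_GROUP, and FACTORY as in the trigger sketch above.
    url = (
        f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
        f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.DataFactory"
        f"/factories/{FACTORY}/pipelineruns/{run_id}?api-version=2018-06-01"
    )
    while True:
        run = requests.get(url, headers={"Authorization": f"Bearer {token}"}).json()
        if run["status"] in ("Succeeded", "Failed", "Cancelled"):
            break
        time.sleep(30)  # still Queued/InProgress; keep polling

    if run["status"] == "Failed":
        msg = EmailMessage()
        msg["Subject"] = f"ADF pipeline run {run_id} failed"
        msg["From"] = "alerts@example.com"   # placeholder addresses
        msg["To"] = "team@example.com"
        msg.set_content(run.get("message", "(no error message on the run)"))
        with smtplib.SMTP("smtp.example.com") as smtp:  # placeholder SMTP host
            smtp.send_message(msg)
```

Hand-rolling this kind of glue is exactly what we are trying to get rid of, so pointers to something built in would be welcome.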
[1] https://docs.microsoft.com/en-us/azure/data-factory/control-flow-system-variables