0 votes

I have two Azure Function apps: one written in C# and one written in Python. The C# app receives JSON files via API calls and drops those files into an Azure Blob Storage container. Later, the Python Azure Function is activated by a timer trigger for ETL: it picks the files up from Blob Storage, manipulates the data with pandas, and then pushes the data to an Azure SQL database.

I'm wondering whether there is a more efficient way to complete this process using Durable Functions. Is it possible to orchestrate the C# application to communicate with the Python application using Durable Functions, so the files are processed in near real-time instead of relying on a timer trigger? Also, is this approach sustainable long-term as my user base grows, or should I look into other solutions such as Azure Batch, Logic Apps, or Azure Service Bus?


2 Answers

1 vote

I think your solution is pretty solid; however, IMO long polling with a timer trigger may not be the most efficient approach. I don't know the details of your solution/implementation, but I would consider using queues instead (Azure offers a few options, e.g. Storage Queues or Service Bus queues) and implementing an Azure Function that is triggered when a message arrives on the queue.

In my opinion, the queue's native message handling (FIFO message delivery) can help you improve performance later on: if needed, you can add more instances to process the data, or gain more flexibility of implementation. It should also give you a more 'real-time' solution without the need to 'hit and check your blob' every X minutes.
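For illustration, below is a minimal sketch of what the queue-triggered side could look like with the Python v2 programming model; the queue name "incoming-files", the container "uploads", and the assumption that each message body is just the blob name are all hypothetical:

import json
import logging

import azure.functions as func
import pandas as pd

app = func.FunctionApp()

@app.queue_trigger(arg_name="msg", queue_name="incoming-files",
                   connection="AzureWebJobsStorage")
@app.blob_input(arg_name="blobdata", path="uploads/{queueTrigger}",
                connection="AzureWebJobsStorage")
def process_file(msg: func.QueueMessage, blobdata: str) -> None:
    # The queue message carries the blob name; the blob input binding loads its contents.
    logging.info("Processing %s", msg.get_body().decode())
    frame = pd.json_normalize(json.loads(blobdata))
    # ... transform with pandas and push the result to Azure SQL here ...

Your C# app would then enqueue the blob name right after writing the file, so the ETL runs as soon as the message arrives instead of on a schedule.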

1 vote

Yes, you could use the function chaining pattern in Durable Functions to do this, since it only requires the chained functions to live in the same function app and references them by function name.
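To illustrate the pattern (just a sketch of the chaining idea, not your exact setup), a minimal Python example with the Durable Functions v2 programming model could look like this; the activity names "ingest" and "transform_and_load" are hypothetical placeholders:

import logging

import azure.functions as func
import azure.durable_functions as df

app = df.DFApp(http_auth_level=func.AuthLevel.FUNCTION)

@app.orchestration_trigger(context_name="context")
def etl_orchestrator(context: df.DurableOrchestrationContext):
    blob_name = context.get_input()
    # Function chaining: each activity completes before the next one starts.
    raw = yield context.call_activity("ingest", blob_name)
    yield context.call_activity("transform_and_load", raw)

@app.activity_trigger(input_name="blobname")
def ingest(blobname: str) -> str:
    # Placeholder: read the JSON blob from storage and return its contents.
    return blobname

@app.activity_trigger(input_name="payload")
def transform_and_load(payload: str) -> None:
    # Placeholder: pandas transform + write to Azure SQL.
    logging.info("Loading %s", payload)

You would still need a client function (for example, a blob-created or queue trigger) to start the orchestration.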

However, you can achieve your end goal without Durable Functions by switching your second function (which is currently triggered by a timer) to an Event Grid trigger instead.

Azure Event Grid allows you to easily build applications with event-based architectures. First, select the Azure resource you would like to subscribe to, and then give the event handler or WebHook endpoint to send the event to. Event Grid has built-in support for events coming from Azure services, like storage blobs and resource groups. Event Grid also has support for your own events, using custom topics.

Yes, you could use the Blob Storage trigger, but it does not scale well and uses a polling mechanism behind the scenes, as explained in this video, whereas the events from Event Grid use a push mechanism.

  • Check out the video description for sample source code.

From the official docs:

The Event Grid trigger also has built-in support for blob events. Use Event Grid instead of the Blob storage trigger for the following scenarios:

Blob-only storage accounts: Blob-only storage accounts are supported for blob input and output bindings but not for blob triggers.

High-scale: High scale can be loosely defined as containers that have more than 100,000 blobs in them or storage accounts that have more than 100 blob updates per second.

Minimizing latency: If your function app is on the Consumption plan, there can be up to a 10-minute delay in processing new blobs if a function app has gone idle. To avoid this latency, you can switch to an App Service plan with Always On enabled. You can also use an Event Grid trigger with your Blob storage account. For an example, see the Event Grid tutorial.
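On the function side, a minimal sketch of the Event Grid-triggered Python function (v2 programming model) might look like the following; credential handling and the pandas/SQL steps are placeholders rather than a full implementation:

import json
import logging

import azure.functions as func
import pandas as pd
from azure.storage.blob import BlobClient

app = func.FunctionApp()

@app.event_grid_trigger(arg_name="event")
def on_blob_created(event: func.EventGridEvent) -> None:
    # For Microsoft.Storage.BlobCreated events the blob URL is in the event payload.
    blob_url = event.get_json()["url"]
    logging.info("New blob: %s", blob_url)
    # Assumes the function can access the blob (add a credential or SAS as needed).
    blob = BlobClient.from_blob_url(blob_url)
    frame = pd.json_normalize(json.loads(blob.download_blob().readall()))
    # ... transform with pandas and push the result to Azure SQL here ...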

Essentially, you need to configure an Event Grid subscription against your storage account (there is a guide here). You can filter to specific containers and virtual directories using the syntax outlined here:

To match events from blobs created in a specific container sharing a blob name prefix, use a subjectBeginsWith filter like:

/blobServices/default/containers/containername/blobs/blobprefix

I also found this post really helpful when setting this up.