I have an Azure Durable Functions project where I'm working with relatively large CSV files. I have two trigger functions:
- One trigger function needs to get a file (over 100 MB) from Azure Blob Storage, split it into smaller chunks, and put those chunks back into Azure Blob Storage (in a different blob directory).
This is the orchestration trigger function:
[FunctionName(nameof(FileChunkerOrchestration))]
public async Task Run(
    [OrchestrationTrigger] IDurableOrchestrationContext context,
    ILogger log)
{
    var blobName = context.GetInput<string>();
    var chunkCounter = await context.CallActivityAsync<int>(nameof(FileChunkerActivity), blobName);
}
This is the activity function:
[StorageAccount("AzureWebJobsStorage")]
[FunctionName(nameof(FileChunkerActivity))]
public async Task<int> Run(
    [ActivityTrigger] IDurableActivityContext context,
    string fileName,
    [Blob("vessel-container/csv-files/{fileName}.csv", FileAccess.Read)] TextReader blob,
    [Blob("vessel-container/csv-chunks", FileAccess.Write)] CloudBlobContainer container)
{
    // Uses the TextReader blob to create the chunk files, then uploads them
    // chunk by chunk to the CloudBlobContainer container (each chunk is
    // uploaded as soon as it has been created). A rough sketch follows below.
}
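For reference, here is a minimal sketch of what the elided chunking logic could look like, assuming the chunks are split by line count and that the container binding points at the csv-chunks location. The linesPerChunk value and the "{fileName}-chunk-{n}.csv" naming scheme are assumptions made for illustration, not part of the original code, and it assumes using System.Text; plus the Microsoft.Azure.Storage.Blob namespace.

const int linesPerChunk = 100_000; // assumed chunk size, tune as needed
var chunkCounter = 0;
var chunkBuilder = new StringBuilder();
var linesInChunk = 0;

string line;
while ((line = await blob.ReadLineAsync()) != null)
{
    chunkBuilder.AppendLine(line);
    linesInChunk++;

    if (linesInChunk == linesPerChunk)
    {
        // Upload the chunk as soon as it is complete, then reset the buffer.
        var chunkBlob = container.GetBlockBlobReference($"{fileName}-chunk-{chunkCounter}.csv");
        await chunkBlob.UploadTextAsync(chunkBuilder.ToString());
        chunkBuilder.Clear();
        linesInChunk = 0;
        chunkCounter++;
    }
}

// Flush the last, partially filled chunk (CSV header handling is omitted for brevity).
if (linesInChunk > 0)
{
    var chunkBlob = container.GetBlockBlobReference($"{fileName}-chunk-{chunkCounter}.csv");
    await chunkBlob.UploadTextAsync(chunkBuilder.ToString());
    chunkCounter++;
}

return chunkCounter;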
- The second trigger function gets triggered for each and every chunk that has been created.
[StorageAccount("AzureWebJobsStorage")]
[FunctionName(nameof(FileTrigger))]
public async Task Run(
    [DurableClient] IDurableClient starter,
    [BlobTrigger("vessel-container/csv-chunks/{name}.csv")] Stream blob,
    string name,
    ILogger log)
{
    // Here the processing of the chunk files starts (see the sketch below).
}
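For context, the elided body presumably hands each chunk off to a new orchestration instance, roughly like the sketch below; "FileProcessorOrchestration" is a hypothetical orchestrator name, not part of the original code. This per-chunk hand-off is exactly what produces one parallel run per chunk.

// Hypothetical hand-off: one orchestration instance is started per chunk blob.
var instanceId = await starter.StartNewAsync("FileProcessorOrchestration", input: name);
log.LogInformation("Started orchestration {InstanceId} for chunk {Name}.", instanceId, name);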
The problem I'm having is that the second function fires for each and every CSV chunk file and the runs execute in parallel, which makes my project use too much of the available RAM.
I need to fix this so that my second function (it's an orchestration) processes the files one by one.
Please share any ideas on how to overcome this problem, thanks in advance.
Comment from juunas: "…await Task.WhenAll() on each group in succession."
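Building on that comment, here is a minimal sketch of one way to do it: an orchestrator fans out to a small group of processing activities, awaits Task.WhenAll() on that group, and only then starts the next group, so at most batchSize chunks are processed at a time. The orchestrator name, the ProcessChunkActivity activity, how the chunk list is obtained, and the batch size are all assumptions made for illustration.

// A sketch only: assumes the usual usings (System.Collections.Generic, System.Linq,
// System.Threading.Tasks, Microsoft.Azure.WebJobs, Microsoft.Azure.WebJobs.Extensions.DurableTask).
[FunctionName("ChunkBatchOrchestration")]
public async Task RunBatches(
    [OrchestrationTrigger] IDurableOrchestrationContext context)
{
    // The list of chunk blob names is assumed to be passed in as the orchestration input.
    var chunkNames = context.GetInput<List<string>>();
    const int batchSize = 5; // assumed value; lower it to reduce memory pressure

    for (var i = 0; i < chunkNames.Count; i += batchSize)
    {
        // Fan out to at most batchSize activities at once.
        var batch = chunkNames
            .Skip(i)
            .Take(batchSize)
            .Select(chunk => context.CallActivityAsync("ProcessChunkActivity", chunk));

        // Wait for the whole group before starting the next one, so the groups
        // run in succession rather than all chunks running in parallel.
        await Task.WhenAll(batch);
    }
}

For this to bound concurrency, the chunks need to be dispatched from a single orchestration (for example by the chunker itself) rather than by one blob-triggered run per chunk; with batchSize set to 1 it becomes strictly file-by-file processing.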