2
votes

I've written my first durable function and I want to know whether it is possible, and good practice, to use a foreach loop within an orchestrator function.

The first activity in the orchestration returns a list of project ids and I want to loop through the list and execute a series of activities for each project id, using a sub orchestration.

I've created a test function and it seems to work. The only behaviour I observed was that each time the orchestrator replays and reaches the foreach loop, it iterates through the whole list until it reaches the current item, and only then executes the activities.

Any advice / opinions would be appreciated.

Thanks

3
I've posted an answer, but seeing your actual code (or a simplification) would be helpful :). - Marc

3 Answers

2
votes

As long as your code in the orchestration is deterministic you are OK. More info on the code constraints in the docs.
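As a rough illustration of what "deterministic" means in practice, the sketch below contrasts wall-clock and random values with the replay-safe equivalents exposed on the orchestration context. The function name DeterminismSketch and the RecordRun activity are hypothetical and not part of the question; the sketch assumes Durable Functions 2.x.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class DeterminismSketch
{
    [FunctionName("DeterminismSketch")]
    public static async Task Run(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // Avoid these in an orchestrator: they produce a different value on every replay.
        // var now = DateTime.UtcNow;
        // var id = Guid.NewGuid();

        // Replay-safe equivalents: the same values are produced on every replay.
        DateTime now = context.CurrentUtcDateTime;
        Guid id = context.NewGuid();

        // "RecordRun" is a hypothetical activity, assumed to be defined elsewhere.
        await context.CallActivityAsync("RecordRun", new { id, now });
    }
}
```

A foreach loop does not affect any of this, provided the collection it iterates over comes from the orchestration input or from an activity result.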

You mention you retrieve these IDs using an activity function. As long as you call the functions/sub-orchestrations with the same arguments, you should be fine: during replay, Durable Functions recognizes that the function has been called before and returns the persisted output instead of re-executing it.
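A minimal sketch of the scenario from the question, assuming Durable Functions 2.x. All function names here (ProcessAllProjects, ProcessProject, GetProjectIds, LoadProject, UpdateProject) are hypothetical, and the three activities are assumed to be defined elsewhere.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class ProjectOrchestrations
{
    [FunctionName("ProcessAllProjects")]
    public static async Task ProcessAllProjects(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // The list comes from an activity, so its result is persisted and
        // every replay sees the same ids in the same order.
        var projectIds = await context.CallActivityAsync<List<int>>("GetProjectIds", null);

        foreach (var projectId in projectIds)
        {
            // On replay, sub-orchestrations that already completed return
            // their stored result instead of running again.
            await context.CallSubOrchestratorAsync("ProcessProject", projectId);
        }
    }

    [FunctionName("ProcessProject")]
    public static async Task ProcessProject(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        var projectId = context.GetInput<int>();

        // The series of activities for a single project, run in sequence.
        await context.CallActivityAsync("LoadProject", projectId);
        await context.CallActivityAsync("UpdateProject", projectId);
    }
}
```

This version processes the projects one after another. Collecting the tasks in a list and awaiting Task.WhenAll would run the sub-orchestrations concurrently.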

4
votes

Looking at your example, this is a very standard fan-out/fan-in case. You can run the activities for the loop items in parallel, but make sure you do it asynchronously: start each task without awaiting it, then await them all together. You can find the use case and an example here.

https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-concepts#fan-in-out

Based on comments

This is exactly how the orchestrator is supposed to work. Orchestration uses the event sourcing pattern. When the orchestrator schedules an activity it goes to sleep, and when the activity finishes it wakes up. Every time the orchestrator wakes up, it replays from the start and checks the execution history to see whether it has already completed a given activity; if so, it moves on. So in the case of a loop, it schedules the activities, goes to sleep, and when it wakes up it replays from the start to see which tasks have completed. I highly recommend watching the following clip from Jeff Hollan of Microsoft; I am sure you will have a very clear idea after that.

How Orchestration works
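The replay behaviour described above can be imitated with a toy model. This is not the Durable Functions runtime, only a simplified sketch under the assumption that each awaited activity ends one orchestrator episode; ReplayDemo and its members are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

// Toy model of orchestrator replay. NOT the real Durable Functions runtime.
public static class ReplayDemo
{
    // One orchestrator episode: run the loop from the top, skip items whose
    // result is already in the history, execute the first missing one, then stop.
    // Returns true once the whole loop completes from history alone.
    private static bool RunEpisode(int[] projectIds, List<string> history, ref int executions)
    {
        for (int i = 0; i < projectIds.Length; i++)
        {
            if (i < history.Count)
            {
                continue; // already completed: the stored result is reused
            }

            executions++; // the activity really runs only here
            history.Add($"processed {projectIds[i]}");
            return false; // the orchestrator "goes to sleep" until woken again
        }
        return true;
    }

    public static (int Episodes, int Executions) RunToCompletion(int[] projectIds)
    {
        var history = new List<string>();
        int episodes = 0, executions = 0;
        bool done = false;
        while (!done)
        {
            episodes++;
            done = RunEpisode(projectIds, history, ref executions);
        }
        return (episodes, executions);
    }

    public static void Main()
    {
        var (episodes, executions) = RunToCompletion(new[] { 10, 20, 30 });
        Console.WriteLine($"episodes={episodes}, activity executions={executions}");
        // prints "episodes=4, activity executions=3"
    }
}
```

The loop body is walked several times per item, which matches what the question observed, but each activity runs only once.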

0
votes

A key concept for managing foreach in Durable Functions, whether in Function Chaining or Fan-out/Fan-in, is that the data to iterate over is returned from an Activity and that each data item is also processed within an Activity.

This pattern helps keep your logic deterministic. Don't rely on the absence of a NonDeterministicOrchestrationException as proof that the logic is deterministic: it is commonly raised when a replay operation sends a different input than was expected, and it may not directly or initially inform you of non-deterministic logic.

Any call to a database, an external service or another HTTP endpoint should be considered non-deterministic, so wrap code like that inside an Activity. That way, when the Orchestrator replays, it retrieves the results of the previously completed call to that activity from the underlying store.

  1. This can help improve performance, because the logic in the activity is only evaluated once for the durable lifetime.
  2. This also protects you from transient errors that could occur if the underlying provider were momentarily unavailable during a replay attempt.

In the following simple example, a rollover function needs to be performed on many facilities. We can use fan-out to perform the tasks for each facility concurrently, or chaining to perform them sequentially:

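using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

[FunctionName("RolloverBot")]
public static async Task<bool> RolloverBot(
    [OrchestrationTrigger] IDurableOrchestrationContext context)
{
    // example of how to get data to iterate against in a deterministic paradigm
    var facilityIds = await context.CallActivityAsync<int[]>(nameof(GetAllFacilities), null);

    // The two regions below are alternatives; keep only one,
    // otherwise every facility is rolled over twice.
    #region Fan-Out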
    var tasks = new List<Task>();
    foreach (var facilityId in facilityIds)
    {
        tasks.Add(context.CallActivityAsync(nameof(RolloverFacility), facilityId));
    }
    
    // Fan back in ;)
    await Task.WhenAll(tasks);
    #endregion Fan-Out
  
    #region Chaining / Iterating Sequentially

    foreach (var facilityId in facilityIds)
    {
        await context.CallActivityAsync(nameof(RolloverFacility), facilityId);
    }

    #endregion Chaining / Iterating Sequentially

    return true;
}

/// <summary>
/// Return a list of all FacilityIds to operate on
/// </summary>
/// <param name="context"></param>
/// <returns></returns>
[FunctionName("GetAllFacilities")]
public static async Task<int[]> GetAllFacilities([ActivityTrigger] IDurableActivityContext context)
{
    var db = await Globals.GetDataContext();
    var data = await db.Facilities.AddQueryOption("$select", "Id").ExecuteAsync();
    // Project to the id before Distinct() so that duplicate ids are removed
    return data.Where(x => x.Id.HasValue).Select(x => x.Id.Value).Distinct().ToArray();
}

[FunctionName("RolloverFacility")]
public static async Task<bool> RolloverFacility(
    [ActivityTrigger] IDurableActivityContext context)
{
    int facilityId = context.GetInput<int>();
    bool result = false;

    // ... insert rollover logic here

    result = true;
    return result;
}

In this way, even if your Activity logic uses System.Random, Guid.NewGuid or DateTimeOffset.Now to determine the facilityIds to return, the durable function itself is still considered deterministic and will replay correctly.

As a rule, I would still recommend passing IDurableOrchestrationContext.CurrentUtcDateTime from the orchestration function to the activity if your activity logic is time dependent. It makes it more obvious that the Orchestrator is controlling the tasks, and not the other way around. There can also be a minuscule lag, due to the function implementation, between the scheduling of CallActivityAsync and its actual execution.
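One way this recommendation might look in code, as a sketch only. The RolloverInput class and the RolloverBotWithTimestamp and RolloverFacilityAsOf functions are hypothetical names introduced for illustration; GetAllFacilities is the activity from the example above.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

// Hypothetical input type carrying the orchestrator's timestamp to the activity.
public class RolloverInput
{
    public int FacilityId { get; set; }
    public DateTime AsOfUtc { get; set; }
}

public static class TimestampedRollover
{
    [FunctionName("RolloverBotWithTimestamp")]
    public static async Task RolloverBotWithTimestamp(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        var facilityIds = await context.CallActivityAsync<int[]>("GetAllFacilities", null);

        // Replay-safe: the same value is produced on every replay.
        DateTime asOfUtc = context.CurrentUtcDateTime;

        foreach (var facilityId in facilityIds)
        {
            var input = new RolloverInput { FacilityId = facilityId, AsOfUtc = asOfUtc };
            await context.CallActivityAsync("RolloverFacilityAsOf", input);
        }
    }

    [FunctionName("RolloverFacilityAsOf")]
    public static bool RolloverFacilityAsOf(
        [ActivityTrigger] IDurableActivityContext context)
    {
        var input = context.GetInput<RolloverInput>();

        // Time-dependent logic would use input.AsOfUtc here instead of DateTime.UtcNow.
        return input.AsOfUtc <= DateTime.MaxValue;
    }
}
```

Every facility then sees the same reference time, regardless of when its activity actually starts.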