
I have a scenario where one activity function retrieves a set of records, which can be anywhere from 1,000 to a million, and stores them in an object. This object is then used by the next activity function to send messages in parallel to Service Bus.

Currently I am using a for loop over this object to send each record to Service Bus. Please let me know if there is a better pattern where the object's contents (wherever they are stored) are drained to Service Bus and the function scales out automatically, without restricting the processing to a for loop.

  • Used a for loop in the orchestrator function to call activity functions for the records in the object.
  • Looked at the scaling of the activity function: for a set of 18,000 records it scaled up to 15 instances and processed the whole set in 4 minutes.
  • The function app is currently on the Consumption plan. Checked that only this function app uses the plan and that it is not shared.
  • The topic the messages are sent to has another service listening to it to read the messages.
  • The instance counts for both the orchestrator and the activity function are the defaults.
    for (int i = 0; i < number_messages; i++)
    {
        taskList[i] = context.CallActivityAsync<string>(
            "Sendtoservicebus",
            (messages[i], runId, CorrelationId, Code));
    }

    try
    {
        await Task.WhenAll(taskList);
    }
    catch (AggregateException ae)
    {
        // Flatten() returns the flattened exception; capture it instead of discarding it
        var flattened = ae.Flatten();
    }

The requirement is for the messages to be sent to Service Bus quickly, with the activity functions scaling out appropriately.


1 Answer


I would suggest sending the messages in batches.

The Azure Service Bus client supports sending messages in batches (the SendBatch and SendBatchAsync methods of QueueClient and TopicClient). However, the size of a single batch must stay below 256 KB, otherwise the whole batch is rejected.
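For illustration, here is a minimal sketch of a plain batched send with the classic Microsoft.ServiceBus.Messaging client; the connection string and topic name are placeholders:

using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.ServiceBus.Messaging;

public static class NaiveBatchSend
{
    // Only safe when the total serialized size is known to stay below 256 KB
    public static async Task SendAsync(IEnumerable<string> payloads)
    {
        // Placeholder connection string and topic name
        var client = TopicClient.CreateFromConnectionString(
            "Endpoint=sb://...", "mytopic");

        var batch = payloads.Select(p => new BrokeredMessage(p));
        await client.SendBatchAsync(batch);
    }
}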

We will start with a simple use case: the size of each message is known to us, defined by a hypothetical Func<T, long> getSize function. Here is a helpful extension method that splits an arbitrary collection based on a metric function and a maximum chunk size:

public static List<List<T>> ChunkBy<T>(this IEnumerable<T> source, Func<T, long> metric, long maxChunkSize)
{
    return source
        .Aggregate(
            new
            {
                Sum = 0L,                      // running size of the current chunk
                Current = (List<T>)null,       // chunk currently being filled
                Result = new List<List<T>>()   // all completed chunks
            },
            (agg, item) =>
            {
                var value = metric(item);

                // Start a new chunk when there is none yet, or when adding
                // this item would exceed the maximum chunk size
                if (agg.Current == null || agg.Sum + value > maxChunkSize)
                {
                    var current = new List<T> { item };
                    agg.Result.Add(current);
                    return new { Sum = value, Current = current, agg.Result };
                }

                // Otherwise keep accumulating into the current chunk
                agg.Current.Add(item);
                return new { Sum = agg.Sum + value, agg.Current, agg.Result };
            })
        .Result;
}
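For a quick illustration of what ChunkBy produces, using string length as the metric (the values are purely illustrative):

var words = new[] { "alpha", "beta", "gamma", "delta" };

// Group words so the total character count per chunk stays <= 10
var chunks = words.ChunkBy(w => (long)w.Length, 10);
// Result: [ ["alpha", "beta"], ["gamma", "delta"] ]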

Now, the implementation of SendBigBatchAsync is simple:

public async Task SendBigBatchAsync<T>(IEnumerable<T> messages, Func<T, long> getSize)
{
    var chunks = messages.ChunkBy(getSize, MaxServiceBusMessage);
    foreach (var chunk in chunks)
    {
        var brokeredMessages = chunk.Select(m => new BrokeredMessage(m));
        await client.SendBatchAsync(brokeredMessages);
    }
}

private const long MaxServiceBusMessage = 256000;
private readonly QueueClient client;
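A hypothetical call site might look as follows; sender stands for an instance of the class above, and the 200-byte header allowance is just a guess for illustration:

var payloads = Enumerable.Range(0, 18000).Select(i => $"record-{i}");

// Estimate each message as its UTF-8 body size plus a guessed header allowance
await sender.SendBigBatchAsync(payloads, p => Encoding.UTF8.GetByteCount(p) + 200);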

But how do we determine the size of each message? How do we implement the getSize function?

The BrokeredMessage class exposes a Size property, so it might be tempting to rewrite our method the following way:

public async Task SendBigBatchAsync<T>(IEnumerable<T> messages)
{
    var brokeredMessages = messages.Select(m => new BrokeredMessage(m));
    var chunks = brokeredMessages.ChunkBy(bm => bm.Size, MaxServiceBusMessage);
    foreach (var chunk in chunks)
    {
        await client.SendBatchAsync(chunk);
    }
}

Unfortunately, Size is only fully reliable after a message has actually been sent: before that it reflects the body but not the final serialized headers, so batches sized close to the limit can still be rejected. The last possibility I want to consider is to deliberately allow violating the max batch size, but then handle the exception, retry the send operation, and adjust future calculations based on the actual measured size of the failed messages. The size is known after a SendBatch attempt, even if the operation failed, so we can use this information.

// Sender is reused across requests
public class BatchSender
{
    private readonly QueueClient queueClient;
    private long batchSizeLimit = 262000;
    private long headerSizeEstimate = 54; // start with the smallest header possible

    public BatchSender(QueueClient queueClient)
    {
        this.queueClient = queueClient;
    }

    public async Task SendBigBatchAsync<T>(IEnumerable<T> messages)
    {
        var packets = (from m in messages
                       let bm = new BrokeredMessage(m)
                       select new { Source = m, Brokered = bm, BodySize = bm.Size }).ToList();
        var chunks = packets.ChunkBy(p => this.headerSizeEstimate + p.Brokered.Size, this.batchSizeLimit);
        foreach (var chunk in chunks)
        {
            try
            {
                await this.queueClient.SendBatchAsync(chunk.Select(p => p.Brokered));
            }
            catch (MessageSizeExceededException)
            {
                // Only the failed chunk's messages have measured sizes from the send attempt
                var maxHeader = chunk.Max(p => p.Brokered.Size - p.BodySize);
                if (maxHeader > this.headerSizeEstimate)
                {
                    // If failed messages had bigger headers, remember this header size 
                    // as max observed and use it in future calculations
                    this.headerSizeEstimate = maxHeader;
                }
                else
                {
                    // Reduce max batch size to 95% of current value
                    this.batchSizeLimit = (long)(this.batchSizeLimit * .95);
                }

                // Re-send just the failed chunk, rebuilt with the adjusted estimates
                await this.SendBigBatchAsync(chunk.Select(p => p.Source));
            }
        }
    }
}
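To tie this back to your orchestration, here is a rough sketch of fanning out one activity call per chunk of records rather than per record. It assumes Durable Functions v1 and string records; the function names, queue, connection string, size metric, and 100 KB threshold are assumptions, not your actual setup:

using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.ServiceBus.Messaging;

public static class BatchedFanOut
{
    // Shared sender, reused across activity invocations (placeholders below)
    private static readonly BatchSender batchSender =
        new BatchSender(QueueClient.CreateFromConnectionString("Endpoint=sb://...", "myqueue"));

    [FunctionName("Orchestrator")]
    public static async Task RunOrchestrator(
        [OrchestrationTrigger] DurableOrchestrationContext context)
    {
        var records = context.GetInput<List<string>>();

        // Fan out one activity call per chunk instead of one per record
        var tasks = records
            .ChunkBy(r => (long)r.Length, 100000) // assumed size metric and threshold
            .Select(chunk => context.CallActivityAsync("SendToServiceBusBatch", chunk));

        await Task.WhenAll(tasks);
    }

    [FunctionName("SendToServiceBusBatch")]
    public static Task SendChunk([ActivityTrigger] List<string> chunk)
    {
        return batchSender.SendBigBatchAsync(chunk);
    }
}

Each activity then sends an entire chunk in one (or a few) batched calls, which should cut the per-message overhead considerably compared to one activity invocation per record.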

You can use this blog for further reference. Hope it helps.