0
votes

I've got a loop that needs to run in parallel, as each iteration is slow and processor-intensive, but I also need to call an async method as part of each iteration.

I've seen questions on how to handle an async method in the loop but not a combination of async and synchronous, which is what I've got.

My (simplified) code is as follows - I know this won't work properly due to the async action being passed to foreach.

protected IDictionary<int, ReportData> GetReportData()
{
    var results = new ConcurrentDictionary<int, ReportData>();
      
    Parallel.ForEach(requestData, async data =>
    {
        // process data synchronously
        var processedData = ProcessData(data);

        // get some data async
        var reportRequest = await BuildRequestAsync(processedData);

        // synchronous building
        var report = reportRequest.BuildReport();

        results.TryAdd(data.ReportId, report);
     });

     // This needs to be populated before returning
     return results;
}

Is there any way to execute the action in parallel when the action has to be async in order to await the single async call?

It's not a practical option to convert the synchronous functions to async.

I don't want to split the action up and have a Parallel.ForEach followed by the async calls with a WhenAll and another Parallel.ForEach as the speed of each stage can vary greatly between different iterations so splitting it would be inefficient as the faster ones would be waiting for the slower ones before continuing.

I did wonder if a PLINQ ForAll could be used instead of the Parallel.ForEach, but I have never used PLINQ and am not sure whether it would wait for all of the iterations to complete before returning, i.e. would the Tasks still be running at the end of the process?

1
Does this answer your question? Parallel.ForEach and async-await - Fildor
The Parallel.ForEach is not async-friendly, and neither is PLINQ. AFAIK the ideal tool for processing mixed sync-async workloads is the TPL Dataflow library. You can see an example here. Bear in mind that TPL Dataflow has a smallish learning curve. If you don't have time for that, you can just pack the ThreadPool with lots of threads and process everything synchronously. - Theodor Zoulias
Can you convert the async call to a synchronous one? Using async/await is mostly for hiding the latency of IO operations; when running in parallel, that might not be useful. - JonasH
@JonasH - while it is true that the async is not useful here, the same method is used elsewhere and the async is useful there. The method ultimately calls a library method that is async only, so it would not be simple to change. - Mog0

1 Answer

4
votes

Is there any way to execute the action in parallel when the action has to be async in order to await the single async call?

Yes, but you'll need to understand what Parallel gives you that you lose when you take alternative approaches. Specifically, Parallel will automatically determine the appropriate number of threads and adjust based on usage.

It's not a practical option to convert the synchronous functions to async.

For CPU-bound methods, you shouldn't convert them.

I don't want to split the action up and have a Parallel.ForEach followed by the async calls with a WhenAll and another Parallel.ForEach as the speed of each stage can vary greatly between different iterations so splitting it would be inefficient as the faster ones would be waiting for the slower ones before continuing.

The first recommendation I would make is to look into TPL Dataflow. It allows you to define a "pipeline" of sorts that keeps the data flowing through while limiting the concurrency at each stage.
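
As a rough illustration of that pipeline idea, here is a minimal sketch. It assumes the System.Threading.Tasks.Dataflow NuGet package is referenced, and it uses hypothetical stand-ins (ints and strings, plus dummy ProcessData, BuildRequestAsync and BuildReport methods) in place of the real types from the question. The concurrency limits are placeholder values you would need to tune, not recommendations.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // assumed: System.Threading.Tasks.Dataflow package

public static class DataflowSketch
{
    // Hypothetical stand-ins for the question's methods.
    static int ProcessData(int id) => id * 2;                  // synchronous, CPU-bound

    static async Task<string> BuildRequestAsync(int processed) // asynchronous call
    {
        await Task.Delay(10);
        return $"request-{processed}";
    }

    static string BuildReport(string request) => request.ToUpperInvariant(); // synchronous

    public static async Task<IDictionary<int, string>> RunAsync(IEnumerable<int> requestData)
    {
        var results = new ConcurrentDictionary<int, string>();

        // Separate concurrency limits per stage (placeholder values).
        var cpuBound = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
        var ioBound = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 };

        // Stage 1: synchronous processing.
        var process = new TransformBlock<int, (int Id, int Processed)>(
            id => (id, ProcessData(id)), cpuBound);

        // Stage 2: the asynchronous call; the block awaits the returned task.
        var request = new TransformBlock<(int Id, int Processed), (int Id, string Request)>(
            async x => (x.Id, await BuildRequestAsync(x.Processed)), ioBound);

        // Stage 3: synchronous report building and result collection.
        var build = new ActionBlock<(int Id, string Request)>(
            x => { results.TryAdd(x.Id, BuildReport(x.Request)); }, cpuBound);

        // Link the stages so that completion flows down the pipeline.
        var link = new DataflowLinkOptions { PropagateCompletion = true };
        process.LinkTo(request, link);
        request.LinkTo(build, link);

        foreach (var id in requestData)
            process.Post(id);

        process.Complete();     // no more input
        await build.Completion; // wait until every item has left the last stage

        return results;
    }

    public static async Task Main()
    {
        var reports = await RunAsync(Enumerable.Range(1, 5));
        Console.WriteLine(reports.Count);
    }
}
```

Each item moves to the next stage as soon as it is ready, so a slow item in one stage does not hold up faster items in the others.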

I did wonder if a PLINQ ForAll could be used instead of the Parallel.ForEach

No. PLINQ is very similar to Parallel in how it works. There are a few differences in how aggressive they are at CPU utilization, and some API differences - e.g., if you have a collection of results coming out the end, PLINQ is usually cleaner than Parallel - but at a high-level view they're very similar. Both only work on synchronous code.

However, you could use a simple Task.Run with Task.WhenAll as such:

protected async Task<IDictionary<int, ReportData>> GetReportDataAsync()
{
  var tasks = requestData.Select(data => Task.Run(async () =>
  {
    // process data synchronously
    var processedData = ProcessData(data);

    // get some data async
    var reportRequest = await BuildRequestAsync(processedData);

    // synchronous building
    var report = reportRequest.BuildReport();

    return (Key: data.ReportId, Value: report);
  })).ToList();
  var results = await Task.WhenAll(tasks);
  return results.ToDictionary(x => x.Key, x => x.Value);
}

You may need to apply a concurrency limit (which Parallel would have done for you). In the asynchronous world, this would look like:

protected async Task<IDictionary<int, ReportData>> GetReportDataAsync()
{
  var throttle = new SemaphoreSlim(10);
  var tasks = requestData.Select(data => Task.Run(async () =>
  {
    await throttle.WaitAsync();
    try
    {
      // process data synchronously
      var processedData = ProcessData(data);

      // get some data async
      var reportRequest = await BuildRequestAsync(processedData);

      // synchronous building
      var report = reportRequest.BuildReport();

      return (Key: data.ReportId, Value: report);
    }
    finally
    {
      throttle.Release();
    }
  })).ToList();
  var results = await Task.WhenAll(tasks);
  return results.ToDictionary(x => x.Key, x => x.Value);
}