23
votes

TL;DR: Using Task.ConfigureAwait(continueOnCapturedContext: false) may introduce redundant thread switching. I'm looking for a consistent solution to that.

Long version. The major design goal behind ConfigureAwait(false) is to reduce redundant SynchronizationContext.Post continuation callbacks for await, where possible. This usually means less thread switching and less work on the UI threads. However, it isn't always how it works.

For example, there is a 3rd party library implementing SomeAsyncApi API. Note that ConfigureAwait(false) is not used anywhere in this library, for some reason:

using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// some library, SomeClass class
public static async Task<int> SomeAsyncApi()
{
    TaskExt.Log("X1");

    // await Task.Delay(1000) without ConfigureAwait(false);
    // WithCompletionLog only shows the actual Task.Delay completion thread
    // and doesn't change the awaiter behavior

    await Task.Delay(1000).WithCompletionLog(step: "X1.5");

    TaskExt.Log("X2");

    return 42;
}

// logging helpers
public static partial class TaskExt
{
    public static void Log(string step)
    {
        Debug.WriteLine(new { step, thread = Environment.CurrentManagedThreadId });
    }

    public static Task WithCompletionLog(this Task anteTask, string step)
    {
        return anteTask.ContinueWith(
            _ => Log(step),
            CancellationToken.None,
            TaskContinuationOptions.ExecuteSynchronously,
            TaskScheduler.Default);
    }
}

Now, let's say there's some client code running on a WinForms UI thread and using SomeAsyncApi:

// another library, AnotherClass class
public static async Task MethodAsync()
{
    TaskExt.Log("B1");
    await SomeClass.SomeAsyncApi().ConfigureAwait(false);
    TaskExt.Log("B2");
}

// ... 
// a WinForms app
private async void Form1_Load(object sender, EventArgs e)
{
    TaskExt.Log("A1");
    await AnotherClass.MethodAsync();
    TaskExt.Log("A2");
}

The output:

{ step = A1, thread = 9 }
{ step = B1, thread = 9 }
{ step = X1, thread = 9 }
{ step = X1.5, thread = 11 }
{ step = X2, thread = 9 }
{ step = B2, thread = 11 }
{ step = A2, thread = 9 }

Here, the logical execution flow goes through 4 thread switches. 2 of them are redundant and caused by SomeAsyncApi().ConfigureAwait(false). It happens because ConfigureAwait(false) pushes the continuation to ThreadPool from a thread with synchronization context (in this case, the UI thread).

In this particular case, MethodAsync is better off without ConfigureAwait(false). Then it only takes 2 thread switches vs 4:

{ step = A1, thread = 9 }
{ step = B1, thread = 9 }
{ step = X1, thread = 9 }
{ step = X1.5, thread = 11 }
{ step = X2, thread = 9 }
{ step = B2, thread = 9 }
{ step = A2, thread = 9 }

However, the author of MethodAsync uses ConfigureAwait(false) with all good intentions, following the best practices, and she knows nothing about the internal implementation of SomeAsyncApi. It wouldn't be a problem if ConfigureAwait(false) were used "all the way" (i.e., inside SomeAsyncApi too), but that's beyond her control.

That's how it goes with WindowsFormsSynchronizationContext (or DispatcherSynchronizationContext), where we might not care about extra thread switches at all. However, a similar situation could happen in ASP.NET, where AspNetSynchronizationContext.Post essentially does this:

Task newTask = _lastScheduledTask.ContinueWith(_ => SafeWrapCallback(action));
_lastScheduledTask = newTask;
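
To make the cost of each Post more concrete, here is a minimal sketch of a context built on the same chaining idea. ChainingContext and its LastScheduledTask helper are hypothetical names made up for this illustration; this is not the actual ASP.NET implementation, which does more (for example, wrapping the callback as the SafeWrapCallback call above suggests).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical context mimicking the two lines above: every Post is chained
// onto the previously posted callback, so callbacks run one at a time, in order.
sealed class ChainingContext : SynchronizationContext
{
    private readonly object _gate = new object();
    private Task _lastScheduledTask = Task.CompletedTask;

    public override void Post(SendOrPostCallback d, object state)
    {
        lock (_gate)
        {
            // Each Post becomes one more queued continuation on the thread pool.
            _lastScheduledTask = _lastScheduledTask.ContinueWith(
                _ => d(state),
                CancellationToken.None,
                TaskContinuationOptions.None,
                TaskScheduler.Default);
        }
    }

    // Demo helper only (an assumption of this sketch): the tail of the chain.
    public Task LastScheduledTask
    {
        get { lock (_gate) return _lastScheduledTask; }
    }
}

static class Program
{
    static void Main()
    {
        var context = new ChainingContext();
        var order = new List<int>();

        for (int i = 1; i <= 3; i++)
        {
            int step = i; // copy the loop variable for the callback
            context.Post(_ => order.Add(step), null);
        }

        // Wait for the last chained callback, then show the execution order.
        context.LastScheduledTask.Wait();
        Console.WriteLine(string.Join(", ", order));
    }
}
```

Because each callback is a continuation of the previous one, the callbacks run strictly in posting order, and every redundant Post adds one more scheduled continuation to that chain.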

The whole thing may look like a contrived issue, but I did see a lot of production code like this, both client-side and server-side. Another questionable pattern I came across: await TaskCompletionSource.Task.ConfigureAwait(false) with SetResult being called on the same synchronization context as that captured for the former await. Again, the continuation was redundantly pushed to ThreadPool. The reasoning behind this pattern was that "it helps to avoid deadlocks".
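
The TaskCompletionSource case can be reproduced in a console app. This is a sketch under assumptions: FakeUiContext is a hypothetical stand-in for a UI context (all that matters here is that it is a type derived from SynchronizationContext), and the helper names are made up for the example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a UI context. It inherits the base Post behavior,
// which is never used here because ConfigureAwait(false) does not capture it.
sealed class FakeUiContext : SynchronizationContext { }

static class Program
{
    // Awaits with ConfigureAwait(false) and reports the continuation's thread.
    static async Task<int> ContinuationThreadAsync(Task task)
    {
        await task.ConfigureAwait(false);
        return Environment.CurrentManagedThreadId;
    }

    // Completes a TaskCompletionSource on the calling thread and returns true
    // if the awaiting continuation ran on that same thread.
    static bool ContinuationStaysOnCaller(SynchronizationContext context)
    {
        SynchronizationContext previous = SynchronizationContext.Current;
        SynchronizationContext.SetSynchronizationContext(context);
        try
        {
            var tcs = new TaskCompletionSource<bool>();
            Task<int> continuation = ContinuationThreadAsync(tcs.Task);
            tcs.SetResult(true); // completion happens on this thread
            int continuationThread = continuation.GetAwaiter().GetResult();
            return continuationThread == Environment.CurrentManagedThreadId;
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }

    static void Main()
    {
        Console.WriteLine(new { context = "none", sameThread = ContinuationStaysOnCaller(null) });
        Console.WriteLine(new { context = "custom", sameThread = ContinuationStaysOnCaller(new FakeUiContext()) });
    }
}
```

On the runtimes I'm aware of, I'd expect the continuation to stay on the calling thread when no context is installed, and to move to a pool thread when SetResult is called under the custom context. That matches the redundant push described above, but treat it as observed behavior rather than a documented contract.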

The question: In light of the described behavior of ConfigureAwait(false), I'm looking for an elegant way of using async/await while still minimizing redundant thread/context switching. Ideally, something that would work with existing 3rd party libraries.

What I've looked at, so far:

  • Offloading an async lambda with Task.Run is not ideal as it introduces at least one extra thread switch (although it can potentially save many others):

    await Task.Run(() => SomeAsyncApi()).ConfigureAwait(false);
    
  • One other hackish solution might be to temporarily remove synchronization context from the current thread, so it won't be captured by any subsequent awaits in the inner chain of calls (I previously mentioned it here):

    async Task MethodAsync()
    {
        TaskExt.Log("B1");
        await TaskExt.WithNoContext(() => SomeAsyncApi()).ConfigureAwait(false);
        TaskExt.Log("B2");
    }
    
    { step = A1, thread = 8 }
    { step = B1, thread = 8 }
    { step = X1, thread = 8 }
    { step = X1.5, thread = 10 }
    { step = X2, thread = 10 }
    { step = B2, thread = 10 }
    { step = A2, thread = 8 }
    
    public static Task<TResult> WithNoContext<TResult>(Func<Task<TResult>> func)
    {
        Task<TResult> task;
        var sc = SynchronizationContext.Current;
        try
        {
            SynchronizationContext.SetSynchronizationContext(null);
            // do not await the task here, so the SC is restored right after
            // the execution point hits the first await inside func
            task = func();
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(sc);
        }
        return task;
    }
    

    This works, but I don't like the fact that it tampers with the thread's current synchronization context, albeit for a very short scope. Moreover, there's another implication here: in the absence of SynchronizationContext on the current thread, an ambient TaskScheduler.Current will be used for await continuations. To account for this, WithNoContext could possibly be altered like below, which would make this hack even more exotic:

    // task = func();
    var task2 = new Task<Task<TResult>>(() => func());
    task2.RunSynchronously(TaskScheduler.Default); 
    task = task2.Unwrap();
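
To compare the options above outside of a UI app, one can count how many continuations get posted back to the context. A minimal sketch, assuming a hypothetical CountingContext that records Post calls and runs them on the thread pool; LibraryAsync stands in for SomeAsyncApi, and WithNoContext is a condensed version of the helper above.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical context: counts Post calls, then runs the callback on the thread pool.
sealed class CountingContext : SynchronizationContext
{
    private int _postCount;
    public int PostCount { get { return Volatile.Read(ref _postCount); } }

    public override void Post(SendOrPostCallback d, object state)
    {
        Interlocked.Increment(ref _postCount);
        ThreadPool.QueueUserWorkItem(_ => d(state));
    }
}

static class Program
{
    // Stand-in for a library method that awaits without ConfigureAwait(false).
    static async Task<int> LibraryAsync()
    {
        await Task.Delay(100);
        return 42;
    }

    // Condensed WithNoContext: hide the context while func runs up to its first await.
    static Task<T> WithNoContext<T>(Func<Task<T>> func)
    {
        SynchronizationContext sc = SynchronizationContext.Current;
        SynchronizationContext.SetSynchronizationContext(null);
        try { return func(); }
        finally { SynchronizationContext.SetSynchronizationContext(sc); }
    }

    // Starts one variant on a thread that has a CountingContext installed and
    // returns how many continuations were posted back to that context.
    static int CountPosts(Func<Task<int>> start)
    {
        var context = new CountingContext();
        SynchronizationContext previous = SynchronizationContext.Current;
        SynchronizationContext.SetSynchronizationContext(context);
        try
        {
            // Blocking is safe here only because Post never needs this thread.
            start().GetAwaiter().GetResult();
            return context.PostCount;
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }

    static void Main()
    {
        Console.WriteLine(new { variant = "direct", posts = CountPosts(() => LibraryAsync()) });
        Console.WriteLine(new { variant = "Task.Run", posts = CountPosts(() => Task.Run(() => LibraryAsync())) });
        Console.WriteLine(new { variant = "WithNoContext", posts = CountPosts(() => WithNoContext(() => LibraryAsync())) });
    }
}
```

I'd expect the direct call to post once (the library's own await captures the context), while the Task.Run and WithNoContext variants should not post at all, since the library method never sees the context. The sketch only counts posts; it does not measure the extra hop that Task.Run itself introduces.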
    

I'd appreciate any other ideas.

Updated, to address @i3arnon's comment:

I would say that it's the other way around because as Stephen said in his answer "The purpose of ConfigureAwait(false) is not to induce a thread switch (if necessary), but rather to prevent too much code running on a particular special context." which you disagree with and is the root of your complaint.

As your answer has been edited, here is your statement I disagreed with, for clarity:

ConfigureAwait(false) goal is to reduce, as much as possible, the work the "special" (e.g. UI) threads need to process in spite of the thread switches it requires.

I also disagree with your current version of that statement. I'll refer you to the primary source, Stephen Toub's blog post:

Avoid Unnecessary Marshaling

If at all possible, make sure the async implementation you’re calling doesn’t need the blocked thread in order to complete the operation (that way, you can just use normal blocking mechanisms to wait synchronously for the asynchronous work to complete elsewhere). In the case of async/await, this typically means making sure that any awaits inside of the asynchronous implementation you’re calling are using ConfigureAwait(false) on all await points; this will prevent the await from trying to marshal back to the current SynchronizationContext. As a library implementer, it’s a best practice to always use ConfigureAwait(false) on all of your awaits, unless you have a specific reason not to; this is good not only to help avoid these kinds of deadlock problems, but also for performance, as it avoids unnecessary marshaling costs.

It does say that the goal is to avoid unnecessary marshaling costs, for performance. A thread switch (which flows the ExecutionContext, among other things) is a big marshaling cost.

Now, it doesn't say anywhere that the goal is to reduce the amount of work which is done on "special" threads or contexts.

While this may make certain sense for UI threads, I still don't think it is the major goal behind ConfigureAwait. There are other - more structured - ways to minimize work on UI threads, like using chunks of await Task.Run(work).

Moreover, it doesn't make sense at all to minimize work on AspNetSynchronizationContext, which itself flows from thread to thread, unlike with a UI thread. Quite the opposite: once you're on AspNetSynchronizationContext, you want to do as much work as possible there, to avoid unnecessary switching in the middle of handling the HTTP request. Nevertheless, it still makes perfect sense to use ConfigureAwait(false) in ASP.NET: if used correctly, it again reduces the server-side thread switching.

3
The TPL team must have had a very hard decision to make when they needed to define the default behavior for await. Only bad choices. Either await fails by default in GUI apps, or all libraries do the wrong thing by default. This is probably the nastiest aspect of await. - usr
@Pingpong, I did my best on the summary while answering your question. TL;DR, my take on this: don't use ConfigureAwait(false) and, if absolutely necessary, use Task.Run(() => SomethingAsync()) to hop off the synchronization context. - noseratio
Very interesting! I am trying now to prove the opposite case, by creating a scenario where omitting the ConfigureAwait(false) could cause more thread switching. :-) - Theodor Zoulias
Yeap, I like the idea of the TaskScheduler.SwitchTo() concept too. Btw I gave up trying to create the counterexample. It's not easy, and may not even be possible. :-) - Theodor Zoulias

3 Answers

21
votes

When you're dealing with asynchronous operations, the overhead of a thread switch is way too small to care about (generally speaking). The purpose of ConfigureAwait(false) is not to induce a thread switch (if necessary), but rather to prevent too much code running on a particular special context.

The reasoning behind this pattern was that "it helps to avoid deadlocks".

And stack dives.

But I do think this is a non-problem in the general case. When I encounter code that doesn't properly use ConfigureAwait, I just wrap it in a Task.Run and move on. The overhead of thread switches isn't worth worrying about.

7
votes

The major design goal behind ConfigureAwait(false) is to reduce redundant SynchronizationContext.Post continuation callbacks for await, where possible. This usually means less thread switching and less work on the UI threads.

I disagree with your premise. The goal of ConfigureAwait(false) is to reduce, as much as possible, the work that needs to be marshalled back to "special" (e.g. UI) contexts, in spite of the thread switches it may require off of that context.

If the goal were to reduce thread switches, you could just remain in the same special context throughout all the work, and then no other threads would be required.

To achieve that you should be using ConfigureAwait everywhere you don't care about the thread executing the continuation. If you take your example and use ConfigureAwait appropriately you would only get a single switch (instead of 2 without it):

private async void Button_Click(object sender, RoutedEventArgs e)
{
    TaskExt.Log("A1");
    await AnotherClass.MethodAsync().ConfigureAwait(false);
    TaskExt.Log("A2");
}

public class AnotherClass
{
    public static async Task MethodAsync()
    {
        TaskExt.Log("B1");
        await SomeClass.SomeAsyncApi().ConfigureAwait(false);
        TaskExt.Log("B2");
    }
}

public class SomeClass
{
    public static async Task<int> SomeAsyncApi()
    {
        TaskExt.Log("X1");
        await Task.Delay(1000).WithCompletionLog(step: "X1.5").ConfigureAwait(false);
        TaskExt.Log("X2");
        return 42;
    }
}

Output:

{ step = A1, thread = 9 }
{ step = B1, thread = 9 }
{ step = X1, thread = 9 }
{ step = X1.5, thread = 11 }
{ step = X2, thread = 11 }
{ step = B2, thread = 11 }
{ step = A2, thread = 11 }

Now, where you do care about the continuation's thread (e.g. when you use UI controls) you "pay" by switching to that thread, by posting the relevant work to that thread. You've still gained from all the work that didn't require that thread.

If you want to take it even further and remove the synchronous work of these async methods from the UI thread you only need to use Task.Run once, and add another switch:

private async void Button_Click(object sender, RoutedEventArgs e)
{
    TaskExt.Log("A1");
    await Task.Run(() => AnotherClass.MethodAsync()).ConfigureAwait(false);
    TaskExt.Log("A2");
}

Output:

{ step = A1, thread = 9 }
{ step = B1, thread = 10 }
{ step = X1, thread = 10 }
{ step = X1.5, thread = 11 }
{ step = X2, thread = 11 }
{ step = B2, thread = 11 }
{ step = A2, thread = 11 }

This guideline to use ConfigureAwait(false) is directed at library developers because that's where it actually matters, but the point is to use it whenever you can and in that case you reduce the work on these special contexts while keeping thread switching at a minimum.


Using WithNoContext has exactly the same outcome as using ConfigureAwait(false) everywhere. The downside, however, is that it messes with the thread's SynchronizationContext and that you aren't aware of that inside the async method. ConfigureAwait directly affects the current await, so you have the cause and effect together.

Using Task.Run, as I've pointed out, also has exactly the same outcome as using ConfigureAwait(false) everywhere, with the added value of offloading the synchronous parts of the async method to the ThreadPool. If this is needed, then Task.Run is appropriate; otherwise ConfigureAwait(false) is enough.


Now, if you're dealing with a buggy library where ConfigureAwait(false) isn't used appropriately, you can hack around it by removing the SynchronizationContext, but using Task.Run is much simpler and clearer, and offloading work to the ThreadPool has negligible overhead.

2
votes

Apparently the behavior of the built-in ConfigureAwait(false) is to invoke the continuation of the await on the ThreadPool. The reason for this, I assume, is to prevent a situation where multiple asynchronous workflows are awaiting the same incomplete task, and then their continuations are invoked on the same thread, in a serialized fashion. This scenario could potentially lead to deadlocks, in case the continuation of one workflow blocked, and waited for a signal from another workflow. The other workflow would never have a chance to send the signal, because its continuation would be sitting in the waiting queue of the same (blocked) thread.

If you don't anticipate this scenario occurring in your application (if you are sure that a task can never be awaited by two workflows), then you could try using the custom ConfigureAwait2 method below:

public static ConfiguredTaskAwaitable2 ConfigureAwait2(this Task task,
    bool continueOnCapturedContext)
    => new ConfiguredTaskAwaitable2(task, continueOnCapturedContext);

public struct ConfiguredTaskAwaitable2 : INotifyCompletion
{
    private readonly Task _task;
    private readonly bool _continueOnCapturedContext;

    public ConfiguredTaskAwaitable2(Task task, bool continueOnCapturedContext)
    {
        _task = task; _continueOnCapturedContext = continueOnCapturedContext;
    }
    public ConfiguredTaskAwaitable2 GetAwaiter() => this;
    public bool IsCompleted { get { return _task.IsCompleted; } }
    public void GetResult() { _task.GetAwaiter().GetResult(); }
    public void OnCompleted(Action continuation)
    {
        var capturedContext = _continueOnCapturedContext ?
            SynchronizationContext.Current : null;
        _ = _task.ContinueWith(_ =>
        {
            if (capturedContext != null)
                capturedContext.Post(_ => continuation(), null);
            else
                continuation();
        }, default, TaskContinuationOptions.ExecuteSynchronously,
            TaskScheduler.Default);
    }
}

I substituted the .ConfigureAwait(false) with .ConfigureAwait2(false) in your example (inside the method MethodAsync), and I got this output:

{ step = A1, thread = 1 }
{ step = B1, thread = 1 }
{ step = X1, thread = 1 }
{ step = X1.5, thread = 4 }
{ step = X2, thread = 1 }
{ step = B2, thread = 1 }
{ step = A2, thread = 1 }