131
votes

I recently created a simple application for testing the HTTP call throughput that can be generated in an asynchronous manner vs a classical multithreaded approach.

The application is a able to perform a predefined number of HTTP calls and at the end it displays the total time needed to perform them. During my tests, all HTTP calls were made to my local IIS sever and they retrieved a small text file (12 bytes in size).

The most important part of the code for the asynchronous implementation is listed below:

public async void TestAsync()
{
    this.TestInit();
    HttpClient httpClient = new HttpClient();

    for (int i = 0; i < NUMBER_OF_REQUESTS; i++)
    {
        ProcessUrlAsync(httpClient);
    }
}

private async void ProcessUrlAsync(HttpClient httpClient)
{
    HttpResponseMessage httpResponse = null;

    try
    {
        Task<HttpResponseMessage> getTask = httpClient.GetAsync(URL);
        httpResponse = await getTask;

        Interlocked.Increment(ref _successfulCalls);
    }
    catch (Exception ex)
    {
        Interlocked.Increment(ref _failedCalls);
    }
    finally
    { 
        if(httpResponse != null) httpResponse.Dispose();
    }

    lock (_syncLock)
    {
        _itemsLeft--;
        if (_itemsLeft == 0)
        {
            _utcEndTime = DateTime.UtcNow;
            this.DisplayTestResults();
        }
    }
}

The most important part of the multithreading implementation is listed below:

public void TestParallel2()
{
    this.TestInit();
    ServicePointManager.DefaultConnectionLimit = 100;

    for (int i = 0; i < NUMBER_OF_REQUESTS; i++)
    {
        Task.Run(() =>
        {
            try
            {
                this.PerformWebRequestGet();
                Interlocked.Increment(ref _successfulCalls);
            }
            catch (Exception ex)
            {
                Interlocked.Increment(ref _failedCalls);
            }

            lock (_syncLock)
            {
                _itemsLeft--;
                if (_itemsLeft == 0)
                {
                    _utcEndTime = DateTime.UtcNow;
                    this.DisplayTestResults();
                }
            }
        });
    }
}

private void PerformWebRequestGet()
{ 
    HttpWebRequest request = null;
    HttpWebResponse response = null;

    try
    {
        request = (HttpWebRequest)WebRequest.Create(URL);
        request.Method = "GET";
        request.KeepAlive = true;
        response = (HttpWebResponse)request.GetResponse();
    }
    finally
    {
        if (response != null) response.Close();
    }
}

Running the tests revealed that the multithreaded version was faster. It took it around 0.6 seconds to complete for 10k requests, while the async one took around 2 seconds to complete for the same amount of load. This was a bit of a surprise, because I was expecting the async one to be faster. Maybe it was because of the fact that my HTTP calls were very fast. In a real world scenario, where the server should perform a more meaningful operation and where there should also be some network latency, the results might be reversed.

However, what really concerns me is the way HttpClient behaves when the load is increased. Since it takes it around 2 seconds to deliver 10k messages, I thought it would take it around 20 seconds to deliver 10 times the number of messages, but running the test showed that it needs around 50 seconds to deliver the 100k messages. Furthermore, it usually takes it more than 2 minutes to deliver 200k messages and often, a few thousands of them (3-4k) fail with the following exception:

An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.

I checked the IIS logs and operations that failed never got to the server. They failed within the client. I ran the tests on a Windows 7 machine with the default range of ephemeral ports of 49152 to 65535. Running netstat showed that around 5-6k ports were being used during tests, so in theory there should have been many more available. If the lack of ports was indeed the cause of the exceptions it means that either netstat didn't properly report the situation or HttClient only uses a maximum number of ports after which it starts throwing exceptions.

By contrast, the multithread approach of generating HTTP calls behaved very predictable. I took it around 0.6 seconds for 10k messages, around 5.5 seconds for 100k messages and as expected around 55 seconds for 1 million messages. None of the messages failed. Further more, while it ran, it never used more than 55 MB of RAM (according to Windows Task Manager). The memory used when sending messages asynchronously grew proportionally with the load. It used around 500 MB of RAM during the 200k messages tests.

I think there are two main reasons for the above results. The first one is that HttpClient seems to be very greedy in creating new connections with the server. The high number of used ports reported by netstat means that it probably doesn't benefit much from HTTP keep-alive.

The second is that HttpClient doesn't seem to have a throttling mechanism. In fact this seems to be a general problem related to async operations. If you need to perform a very large number of operations they will all be started at once and then their continuations will be executed as they are available. In theory this should be ok, because in async operations the load is on external systems but as proved above this is not entirely the case. Having a big number of requests started at once will increase the memory usage and slow down the entire execution.

I managed to obtain better results, memory and execution time wise, by limiting the maximum number of asynchronous requests with a simple but primitive delay mechanism:

public async void TestAsyncWithDelay()
{
    this.TestInit();
    HttpClient httpClient = new HttpClient();

    for (int i = 0; i < NUMBER_OF_REQUESTS; i++)
    {
        if (_activeRequestsCount >= MAX_CONCURENT_REQUESTS)
            await Task.Delay(DELAY_TIME);

        ProcessUrlAsyncWithReqCount(httpClient);
    }
}

It would be really useful if HttpClient included a mechanism for limiting the number of concurrent requests. When using the Task class (which is based on the .Net thread pool) throttling is automatically achieved by limiting the number of concurrent threads.

For a complete overview, I have also created a version of the async test based on HttpWebRequest rather than HttpClient and managed to obtain much better results. For a start, it allows setting a limit on the number of concurrent connections (with ServicePointManager.DefaultConnectionLimit or via config), which means that it never ran out of ports and never failed on any request (HttpClient, by default, is based on HttpWebRequest, but it seems to ignore the connection limit setting).

The async HttpWebRequest approach was still about 50 - 60% slower than the multithreading one, but it was predictable and reliable. The only downside to it was that it used a huge amount of memory under big load. For example it needed around 1.6 GB for sending 1 million requests. By limiting the number of concurrent requests (like I did above for HttpClient) I managed to reduce the used memory to just 20 MB and obtain an execution time just 10% slower than the multithreading approach.

After this lengthy presentation, my questions are: Is the HttpClient class from .Net 4.5 a bad choice for intensive load applications? Is there any way to throttle it, which should fix the problems I mention about? How about the async flavor of HttpWebRequest?

Update (thanks @Stephen Cleary)

As it turns out, HttpClient, just like HttpWebRequest (on which it is based by default), can have its number of concurrent connections on the same host limited with ServicePointManager.DefaultConnectionLimit. The strange thing is that according to MSDN, the default value for the connection limit is 2. I also checked that on my side using the debugger which pointed that indeed 2 is the default value. However, it seems that unless explicitly setting a value to ServicePointManager.DefaultConnectionLimit, the default value will be ignored. Since I didn't explicitly set a value for it during my HttpClient tests I thought it was ignored.

After setting ServicePointManager.DefaultConnectionLimit to 100 HttpClient became reliable and predictable (netstat confirms that only 100 ports are used). It is still slower than async HttpWebRequest (by about 40%), but strangely, it uses less memory. For the test which involves 1 million requests, it used a maximum of 550 MB, compared to 1.6 GB in the async HttpWebRequest.

So, while HttpClient in combination ServicePointManager.DefaultConnectionLimit seem to ensure reliability (at least for the scenario where all the calls are being made towards the same host), it still looks like its performance is negatively impacted by the lack of a proper throttling mechanism. Something that would limit the concurrent number of requests to a configurable value and put the rest in a queue would make it much more suitable for high scalability scenarios.

3
HttpClient should respect ServicePointManager.DefaultConnectionLimit.Stephen Cleary
Your observations seem worth to be investigated. One thing is bothering me though: I think it is highly contrived to issue thousands of async IOs at once. I would never do this in production. The fact that you're async does not mean that you can go nuts consuming various resources. (Microsofts official samples are a little misleading in that regard as well.)usr
Don't throttle with time delays, though. Throttle on a fixed concurrency level that you empirically determine. A simple solution would be SemaphoreSlim.WaitAsync although that would also not be suitable for arbitrarily large amounts of tasks.usr
@FlorinDumitrescu For throttling, you can use SemaphoreSlim, as already mentioned, or ActionBlock<T> from TPL Dataflow.svick
@svick, thanks for your suggestions. I am not interested in manually implementing a mechanism for throttling / concurrency limitation. As mentioned, the implementation included in my question was only for testing and for validating a theory. I am not trying to improve it, since it will not get to production. What I am interested in is if the .Net framework offers a built in mechanism for limiting the concurrency of async IO operations (HttpClient included).Florin Dumitrescu

3 Answers

64
votes

Besides the tests mentioned in the question, I recently created some new ones involving much fewer HTTP calls (5000 compared to 1 million previously) but on requests that took much longer to execute (500 milliseconds compared to around 1 millisecond previously). Both tester applications, the synchronously multithreaded one (based on HttpWebRequest) and asynchronous I/O one (based on HTTP client) produced similar results: about 10 seconds to execute using around 3% of the CPU and 30 MB of memory. The only difference between the two testers was that the multithreaded one used 310 threads to execute, while the asynchronous one just 22. So in an application that would have combined both I/O bound and CPU bound operations the asynchronous version would have produced better results because there would have been more CPU time available for the threads performing CPU operations, which are the ones that actually need it (threads waiting for I/O operations to complete are just wasting).

As a conclusion to my tests, asynchronous HTTP calls are not the best option when dealing with very fast requests. The reason behind that is that when running a task that contains an asynchronous I/O call, the thread on which the task is started is quit as soon the as the asynchronous call is made and the rest of the task is registered as a callback. Then, when the I/O operation completes, the callback is queued for execution on the first available thread. All this creates an overhead, which makes fast I/O operations to be more efficient when executed on the thread that started them.

Asynchronous HTTP calls are a good option when dealing with long or potentially long I/O operations because it doesn't keep any threads busy on waiting for the I/O operations to complete. This decreases the overall number of threads used by an application allowing more CPU time to be spent by CPU bound operations. Furthermore, on applications that only allocate a limited number of threads (like it is the case with web applications), asynchronous I/O prevents thread pool thread depletion, which can happen if performing I/O calls synchronously.

So, async HttpClient is not a bottleneck for intensive load applications. It is just that by its nature it is not very well suited for very fast HTTP requests, instead it is ideal for long or potentially long ones, especially inside applications that only have a limited number of threads available. Also, it is a good practice to limit concurrency via ServicePointManager.DefaultConnectionLimit with a value that high enough to ensure a good level of parallelism, but low enough to prevent ephemeral port depletion. You can find more details on the tests and conclusions presented for this question here.

27
votes

One thing to consider that might be affecting your results is that with the HttpWebRequest you are not getting the ResponseStream and consuming that stream. With HttpClient, by default it will copy the network stream into a memory stream. In order to use HttpClient in the same way that you are currently using HttpWebRquest you would need to do

var requestMessage = new HttpRequestMessage() {RequestUri = URL};
Task<HttpResponseMessage> getTask = httpClient.SendAsync(requestMessage, HttpCompletionOption.ResponseHeadersRead);

The other thing is that I'm not really sure what the real difference, from a threading perspective, you are actually testing. If you dig into the depths of HttpClientHandler it simply does Task.Factory.StartNew in order to perform an async request. The threading behaviour is delegated to the synchronization context in exactly the same way as your example with HttpWebRequest example is done.

Undoubtedly, HttpClient add some overhead as by default it uses HttpWebRequest as its transport library. So you will always be able to get better perf with a HttpWebRequest directly whilst using HttpClientHandler. The benefits that HttpClient brings is with the standard classes like HttpResponseMessage, HttpRequestMessage, HttpContent and all the strongly typed headers. In itself it is not a perf optimization.

17
votes

While this does not directly answer the 'async' part of the OP's question, this addresses an error in the implementation he is using.

If you want your application to scale, avoid using instance-based HttpClients. The difference is HUGE! Depending on the load, you will see very different performance numbers. The HttpClient was designed to be re-used across requests. This was confirmed by guys on the BCL team who wrote it.

A recent project I had was to help a very large and well-known online computer retailer scale out for Black Friday/holiday traffic for some new systems. We ran into some performance issues around the usage of HttpClient. Since it implements IDisposable, the devs did what you would normally do by creating an instance and placing it inside of a using() statement. Once we started load testing the app brought the server to its knees - yes, the server not just the app. The reason is that every instance of HttpClient opens an I/O Completion Port on the server. Because of non-deterministic finalization of GC and the fact that you are working with computer resources that span across multiple OSI layers, closing network ports can take a while. In fact Windows OS itself can take up to 20 secs to close a port (per Microsoft). We were opening ports faster than they could be closed - server port exhaustion which hammered the CPU to 100%. My fix was to change the HttpClient to a static instance which solved the problem. Yes, it is a disposable resource, but any overhead is vastly outweighed by the difference in performance. I encourage you to do some load testing to see how your app behaves.

Also answered at link below:

What is the overhead of creating a new HttpClient per call in a WebAPI client?

https://www.asp.net/web-api/overview/advanced/calling-a-web-api-from-a-net-client