617
votes

I'd like to comprehensively understand the run-time performance cost of a Docker container. I've found references to networking anecdotally being ~100µs slower.

I've also found references to the run-time cost being "negligible" and "close to zero" but I'd like to know more precisely what those costs are. Ideally I'd like to know what Docker is abstracting with a performance cost and things that are abstracted without a performance cost. Networking, CPU, memory, etc.

Furthermore, if there are abstraction costs, are there ways to get around the abstraction cost. For example, perhaps I can mount a disk directly vs. virtually in Docker.

3
@GoloRoden that question is similar but not exactly the same. I'm looking for latency costs with reasons like "networking is being passed through an extra layer" whereas that question's accepted answer is more about measuring the costs of the container + app.Luke Hoersten
Okay, that's right. I retracted my close vote.Golo Roden
I'm glad you posted it though. That question didn't come up in my search. The measurement/metrics article is super useful: blog.docker.io/2013/10/gathering-lxc-docker-containers-metricsLuke Hoersten
This is a good session titled "Linux Containers - NextGen Virtualization for Cloud" telling performance metrics by comparing docker, KVM VM and bare metal: youtube.com/watch?v=a4oOAVhNLjUshawnzhu

3 Answers

548
votes

An excellent 2014 IBM research paper “An Updated Performance Comparison of Virtual Machines and Linux Containers” by Felter et al. provides a comparison between bare metal, KVM, and Docker containers. The general result is: Docker is nearly identical to native performance and faster than KVM in every category.

The exception to this is Docker’s NAT — if you use port mapping (e.g., docker run -p 8080:8080), then you can expect a minor hit in latency, as shown below. However, you can now use the host network stack (e.g., docker run --net=host) when launching a Docker container, which will perform identically to the Native column (as shown in the Redis latency results lower down).

Docker NAT overhead

They also ran latency tests on a few specific services, such as Redis. You can see that above 20 client threads, highest latency overhead goes Docker NAT, then KVM, then a rough tie between Docker host/native.

Docker Redis Latency Overhead

Just because it’s a really useful paper, here are some other figures. Please download it for full access.

Taking a look at Disk I/O:

Docker vs. KVM vs. Native I/O Performance

Now looking at CPU overhead:

Docker CPU Overhead

Now some examples of memory (read the paper for details, memory can be extra tricky):

Docker Memory Comparison

134
votes

Docker isn't virtualization, as such -- instead, it's an abstraction on top of the kernel's support for different process namespaces, device namespaces, etc.; one namespace isn't inherently more expensive or inefficient than another, so what actually makes Docker have a performance impact is a matter of what's actually in those namespaces.


Docker's choices in terms of how it configures namespaces for its containers have costs, but those costs are all directly associated with benefits -- you can give them up, but in doing so you also give up the associated benefit:

  • Layered filesystems are expensive -- exactly what the costs are vary with each one (and Docker supports multiple backends), and with your usage patterns (merging multiple large directories, or merging a very deep set of filesystems will be particularly expensive), but they're not free. On the other hand, a great deal of Docker's functionality -- being able to build guests off other guests in a copy-on-write manner, and getting the storage advantages implicit in same -- ride on paying this cost.
  • DNAT gets expensive at scale -- but gives you the benefit of being able to configure your guest's networking independently of your host's and have a convenient interface for forwarding only the ports you want between them. You can replace this with a bridge to a physical interface, but again, lose the benefit.
  • Being able to run each software stack with its dependencies installed in the most convenient manner -- independent of the host's distro, libc, and other library versions -- is a great benefit, but needing to load shared libraries more than once (when their versions differ) has the cost you'd expect.

And so forth. How much these costs actually impact you in your environment -- with your network access patterns, your memory constraints, etc -- is an item for which it's difficult to provide a generic answer.

26
votes

Here's some more benchmarks for Docker based memcached server versus host native memcached server using Twemperf benchmark tool https://github.com/twitter/twemperf with 5000 connections and 20k connection rate

Connect time overhead for docker based memcached seems to agree with above whitepaper at roughly twice native speed.

Twemperf Docker Memcached

Connection rate: 9817.9 conn/s
Connection time [ms]: avg 341.1 min 73.7 max 396.2 stddev 52.11
Connect time [ms]: avg 55.0 min 1.1 max 103.1 stddev 28.14
Request rate: 83942.7 req/s (0.0 ms/req)
Request size [B]: avg 129.0 min 129.0 max 129.0 stddev 0.00
Response rate: 83942.7 rsp/s (0.0 ms/rsp)
Response size [B]: avg 8.0 min 8.0 max 8.0 stddev 0.00
Response time [ms]: avg 28.6 min 1.2 max 65.0 stddev 0.01
Response time [ms]: p25 24.0 p50 27.0 p75 29.0
Response time [ms]: p95 58.0 p99 62.0 p999 65.0

Twemperf Centmin Mod Memcached

Connection rate: 11419.3 conn/s
Connection time [ms]: avg 200.5 min 0.6 max 263.2 stddev 73.85
Connect time [ms]: avg 26.2 min 0.0 max 53.5 stddev 14.59
Request rate: 114192.6 req/s (0.0 ms/req)
Request size [B]: avg 129.0 min 129.0 max 129.0 stddev 0.00
Response rate: 114192.6 rsp/s (0.0 ms/rsp)
Response size [B]: avg 8.0 min 8.0 max 8.0 stddev 0.00
Response time [ms]: avg 17.4 min 0.0 max 28.8 stddev 0.01
Response time [ms]: p25 12.0 p50 20.0 p75 23.0
Response time [ms]: p95 28.0 p99 28.0 p999 29.0

Here's bencmarks using memtier benchmark tool

memtier_benchmark docker Memcached

4         Threads
50        Connections per thread
10000     Requests per thread
Type        Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
------------------------------------------------------------------------
Sets       16821.99          ---          ---      1.12600      2271.79
Gets      168035.07    159636.00      8399.07      1.12000     23884.00
Totals    184857.06    159636.00      8399.07      1.12100     26155.79

memtier_benchmark Centmin Mod Memcached

4         Threads
50        Connections per thread
10000     Requests per thread
Type        Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
------------------------------------------------------------------------
Sets       28468.13          ---          ---      0.62300      3844.59
Gets      284368.51    266547.14     17821.36      0.62200     39964.31
Totals    312836.64    266547.14     17821.36      0.62200     43808.90