Why is emplace_back faster than push_back?

Question

I thought that emplace_back would be the winner when doing something like this:

v.push_back(myClass(arg1, arg2));

because emplace_back would construct the object directly in the vector, while push_back would first construct an anonymous temporary object and then copy it (or, since C++11, move it) into the vector. For more, see this question.

Google also turns up these two questions.

I decided to compare them on a vector filled with integers.

Here is the experiment code:

#include <iostream>
#include <vector>
#include <ctime>
#include <ratio>
#include <chrono>

using namespace std;
using namespace std::chrono;

int main() {

  vector<int> v1;

  const size_t N = 100000000;

  high_resolution_clock::time_point t1 = high_resolution_clock::now();
  for(size_t i = 0; i < N; ++i)
    v1.push_back(i);
  high_resolution_clock::time_point t2 = high_resolution_clock::now();

  duration<double> time_span = duration_cast<duration<double>>(t2 - t1);

  std::cout << "push_back took me " << time_span.count() << " seconds.";
  std::cout << std::endl;

  vector<int> v2;

  t1 = high_resolution_clock::now();
  for(size_t i = 0; i < N; ++i)
    v2.emplace_back(i);
  t2 = high_resolution_clock::now();
  time_span = duration_cast<duration<double>>(t2 - t1);
  std::cout << "emplace_back took me " << time_span.count() << " seconds.";
  std::cout << std::endl;

  return 0;
}

The result is that emplace_back is faster.

push_back took me 2.76127 seconds.
emplace_back took me 1.99151 seconds.

Why? The answer to the first linked question clearly says that there will be no performance difference.

I also tried other timing methods, but got the same results.

[EDIT] Commenters said that testing with ints tells us nothing, and that push_back takes a reference.

I repeated the test from the code above, but with a class A instead of int:

class A {
 public:
  A(int a) : a(a) {}
 private:
  int a;
};

Result:

push_back took me 6.92313 seconds.
emplace_back took me 6.1815 seconds.

[EDIT.2]

As denlan suggested, I also swapped the order of the two operations; in both cases (int and class A), emplace_back was again the winner.

[SOLUTION]

I was running the code in debug mode, which makes the measurements invalid. For benchmarking, always run the code in release mode.

So does that mean you do not see these performance differences in release mode? Do you have numbers for that? — talekeDskobeDa
Hey @talekeDskobeDa, I was executing this code in my old dead laptop, so no, sorry. — gsamaras
I understand that in principle emplace_back would be faster, or certainly not slower. What I don't understand (and can't find anywhere) is why, even in the simple case you have, the optimizer doesn't produce the same code. Any ideas? — Ben
Not really, @Ben, but if you'd like, you could post a new question that links to mine and asks exactly that (if you do, please share it with me). :) — gsamaras

Kerrek SB · Accepted Answer · 2014-05-17T23:47:00

Your test case isn't very helpful. push_back takes a container element and copies/moves it into the container. emplace_back takes arbitrary arguments and constructs a new container element from them. But if you pass emplace_back a single argument that is already of the element type, you'll just use the copy/move constructor anyway.

Here's a better comparison:

Foo x; Bar y; Zip z;

v.push_back(T(x, y, z));  // make temporary, push it back
v.emplace_back(x, y, z);  // no temporary, directly construct T(x, y, z) in place

The key difference, however, is that emplace_back performs explicit conversions:

std::vector<std::unique_ptr<Foo>> v;
v.emplace_back(new Foo(1, 'x', true));  // constructor is explicit!

This example will become mildly contrived in the future, when you should write v.push_back(std::make_unique<Foo>(1, 'x', true)) instead. However, other constructions are very nice with emplace, too:

std::vector<std::thread> threads;
threads.emplace_back(do_work, 10, "foo");    // call do_work(10, "foo")
threads.emplace_back(&Foo::g, x, 20, false);  // call g(20, false) on a copy of x
