While optimizing performance critical code, I noticed iterating over a std::set was a bit slow.
I then wrote a benchmarker and tested the speeds of iteration over a vector by iterator (auto it : vector
), iterating over a set by iterator, and iterating over a vector by index (int i = 0; i < vector.size(); ++i
).
The containers are constructed identically, with 1024 random ints. (Of course, each int is unique since we're working with sets). Then, for each run, we loop through the container and sum their ints into a long int. Each run has 1000 iterations doing the sum, and the test was averaged over 1000 runs.
Here are my results:
Testing vector by iterator
✓
Maximum duration: 0.012418
Minimum duration: 0.007971
Average duration: 0.008354
Testing vector by index
✓
Maximum duration: 0.002881
Minimum duration: 0.002094
Average duration: 0.002179
Testing set by iterator
✓
Maximum duration: 0.021862
Minimum duration: 0.014278
Average duration: 0.014971
As we can see, iterating over a set by iterator is 1.79x slower than by vector, and a whopping 6.87x slower than vector by index.
What is going on here? Isn't a set just a structured vector that checks whether each item is unique upon insertion? Why should it be so much slower?
Edit: Thank you for your replies! Good explanations. By request, here is the code of the benchmark.
#include <chrono>
#include <random>
#include <string>
#include <functional>
#include <set>
#include <vector>
void benchmark(const char* name, int runs, int iterations, std::function<void(int)> func) {
printf("Testing %s\n", name);
std::chrono::duration<double> min = std::chrono::duration<double>::max();
std::chrono::duration<double> max = std::chrono::duration<double>::min();
std::chrono::duration<double> run = std::chrono::duration<double>::zero();
std::chrono::duration<double> avg = std::chrono::duration<double>::zero();
std::chrono::high_resolution_clock::time_point t1;
std::chrono::high_resolution_clock::time_point t2;
// [removed] progress bar code
for (int i = 0; i < runs; ++i) {
t1 = std::chrono::high_resolution_clock::now();
func(iterations);
t2 = std::chrono::high_resolution_clock::now();
run = std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1);
// [removed] progress bar code
if (run < min) min = run;
if (run > max) max = run;
avg += run / 1000.0;
}
// [removed] progress bar code
printf("Maximum duration: %f\n", max.count());
printf("Minimum duration: %f\n", min.count());
printf("Average duration: %f\n", avg.count());
printf("\n");
}
int main(int argc, char const *argv[]) {
const unsigned int arrSize = 1024;
std::vector<int> vector; vector.reserve(arrSize);
std::set<int> set;
for (int i = 0; i < arrSize; ++i) {
while (1) {
int entry = rand() - (RAND_MAX / 2);
auto ret = set.insert(entry);
if (ret.second) {
vector.push_back(entry);
break;
}
}
}
printf("Created vector of size %lu, set of size %lu\n", vector.size(), set.size());
benchmark("vector by iterator", 1000, 1000, [vector](int runs) -> void {
for (int i = 0; i < runs; ++i) {
long int sum = 0;
for (auto it : vector) {
sum += it;
}
}
});
benchmark("vector by index", 1000, 1000, [vector, arrSize](int runs) -> void {
for (int i = 0; i < runs; ++i) {
long int sum = 0;
for (int j = 0; j < arrSize; ++j) {
sum += vector[j];
}
}
});
benchmark("set by iterator", 1000, 1000, [set](int runs) -> void {
for (int i = 0; i < runs; ++i) {
long int sum = 0;
for (auto it : set) {
sum += it;
}
}
});
return 0;
}
I'm working on posting the results using O2, but I'm trying to get the compiler to avoid optimizing away the sum.
boost::flat_set
boost.org/doc/libs/1_70_0/doc/html/boost/container/… – alfCvector
by index or by iterator should be equally performant, but without optimizations turned on the iterator approach may not inline a bunch of stuff that definitely should be inlined.set
s are almost certainly slower, but you're going to have a hard time saying how much slower they are in practice when compiling without optimization. – ShadowRanger