1
votes

I got curious with the difference between a for(;;) and for(:), particularly with the speed between the two. So I ran a little test by having a vector of 10 million integers and adding them all together in a for. I found that the for(:) was 1.3 slower.

What would cause the for(:) to be that much slower!?

EDIT: It seems like the for(:) uses the iterator of the vector unlike the for(;;) making it longer.

/Yu"stdafx.h" /GS /analyze- /W3 /Zc:wchar_t /ZI /Gm /Od /sdl /Fd"Debug\vc120.pdb" /fp:precise /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_LIB" /D "_UNICODE" /D "UNICODE" /errorReport:prompt /WX- /Zc:forScope /RTC1 /Gd /Oy- /MDd /Fa"Debug\" /EHsc /nologo /Fo"Debug\" /Fp"Debug\forvsForLoop.pch"

#include "stdafx.h"
#include <vector>
#include <iostream>
#include <chrono>

void init(std::vector<int> &array){
    srand(20);
    for (int x = 0; x < 10000000; x++)
        array.push_back(rand());
    return;
}

unsigned long testForLoop(std::vector<int> &array){
    unsigned long result = 0;
    for (int x = 0; x < array.size(); x++)
        result += array[x];
    return result;
}
unsigned long testFor(std::vector<int> &array){
    unsigned long result = 0;
    for (const int &element : array)
        result += element;
    return result;
}
int _tmain(int argc, _TCHAR* argv[])
{
    std::vector<int> testingArray;

    init(testingArray);

    //Warm up
    std::cout << "warming up \n";
    testForLoop(testingArray);
    testFor(testingArray);
    testForLoop(testingArray);
    testFor(testingArray);
    testForLoop(testingArray);
    testFor(testingArray);
    std::cout << "starting \n";

    auto start = std::chrono::high_resolution_clock::now();
    testForLoop(testingArray);
    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "ForLoop took: " <<  std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() << std::endl;


    start = std::chrono::high_resolution_clock::now();
    testFor(testingArray);
    end = std::chrono::high_resolution_clock::now();
    std::cout << "For---- took: " << std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() << std::endl;

    system("pause");
    return 0;

}
4
In addition to posing code, make sure your benchmarks are run with optimizations on.Michael Burr
Yes, the range-based for loop is using iterators, just like you can do with a non-range-base for loop. See e.g. this reference for a typical implementation. If you don't use iterators in your other for loop, then the tests are not equal.Some programmer dude
You're benchmarking non-optimised code (/Od, /D_DEBUG etc.). That's like determining the fastest runner by measuring who can read maps best. Turn optimisations on and try again.Angew is no longer proud of SO
That optimizes the loops out and both take 0 nanoseconds to complete.Joshua Waring
I used the returns of the functions to give them a purpose so they aren't removed and the for(;;) took: 6006000ns and 8001200ns for the for(:)Joshua Waring

4 Answers

2
votes

To make sure that the test are not optimized out I printed out the result:

 auto x = testForLoop(......

 // ^^^
 ......nd - start).count() << "  R: " << x << std::endl;

                          //  ^^^^^^^^^^^^^^^^

Normal mode: (approx half speed)

> g++ -std=c++11 v.cpp
> ./a.out
warming up
starting
ForLoop took: 33262788  R: 10739647121123056
For---- took: 51263111   R: 10739647121123056

Optimized: (practically identical)

> g++ -O3 -std=c++11 v.cpp
> ./a.out
warming up
starting
ForLoop took: 4861314  R: 10739647121123056
For---- took: 4997957   R: 10739647121123056
3
votes

If you are using:

for ( auto x : ... )

Then each x is a copy. Less overhead might be:

for ( const auto & x : ... )
3
votes

The standard doesn't say anything about performance or implementation. Both loops should work properly and also the performance should be equal in normal situations. No one can say why it's too slow in MSVC++ unless he claim it's a bug or bad implementation. Maybe you should change the optimization settings correctly.

I've tested your code in MSVC++, GCC and Clang.

GCC output

ForLoop took: 7879773
For---- took: 5786831

Clang output

ForLoop took: 6537441
For---- took: 6743614

and MSVC++ output

ForLoop took: 77786200
For---- took: 249612200

Both GCC and Clang have reasonable results and two loops are close to each other as expected. But the MSVC++'s result is vague and unrealistic. I call it a bug or regression. Or, your bad configuration to compile, try another optimization settings.

1
votes

The answer is a guess, being subjective to the exact code and optimization used. Also the underlying platform can change how code behavior can work.

There are essentialkly two "low level" ways to manage an iteration: one is based on a "reassignable pointer", the other is based on a "constant pointer and an offset".

In pseudocode

loop { *a = *b; ++a; ++b; }

vs

loop { a[i] = b[i]; ++i; }

Depending on the processor architecture, the two are not the same having a different behavior respect to the use of registers, address locality and caches: the first has two sums with a memory-held constant, the second has two sums with a register, and a register increment. (and both have a memory copy)

On x86 platforms, the second is preferable, since does less memory access, and uses instructions requiring less memory fetches.

Now, an iterator based loop applied to a vector (whose iterators wrap pointers) lead to the first form, while a traditional index based loop leads to the second form.

Now for(a: v) { .... } is the same as for(auto i=v.begin(); i!=v.end(); ++i) { auto& a=*i; ... }

It works with whatever form of container (also not memory sequential) but cannot reduce to index based. Unless compiler optimization are so good to find out that the iterator is actually a pointer moving by constant increments.