1
votes

I was reading this blog post: https://developerinsider.co/why-is-one-loop-so-much-slower-than-two-loops/ and decided to check it myself using C++ and Xcode. I wrote the simple program below, and when I ran it I was surprised by the result: the second function was slower than the first, contrary to what the article states. Can anyone please help me figure out why this is the case?

#include <iostream>
#include <vector>
#include <chrono>
    
using namespace std::chrono;
    
void function1() {
    const int n=100000;
            
    int a1[n], b1[n], c1[n], d1[n];
            
    for(int j=0;j<n;j++){
        a1[j] = 0;
        b1[j] = 0;
        c1[j] = 0;
        d1[j] = 0;
    }
            
    auto start = high_resolution_clock::now();
        
    for(int j=0;j<n;j++){
        a1[j] += b1[j];
        c1[j] += d1[j];
    }
            
    auto stop = high_resolution_clock::now();
    auto duration = duration_cast<microseconds>(stop - start);
        
    std::cout << duration.count() << " Microseconds." << std::endl;  
}
    
void function2() {
    const int n=100000;
            
    int a1[n], b1[n], c1[n], d1[n];
            
    for(int j=0; j<n; j++){
        a1[j] = 0;
        b1[j] = 0;
        c1[j] = 0;
        d1[j] = 0;
    }
            
    auto start = high_resolution_clock::now();
            
    for(int j=0; j<n; j++){
        a1[j] += b1[j];
    }
    
    for(int j=0;j<n;j++){
        c1[j] += d1[j];
    }
            
    auto stop = high_resolution_clock::now();
    auto duration = duration_cast<microseconds>(stop - start);
        
    std::cout << duration.count() << " Microseconds." << std::endl;
}
        
int main(int argc, const char * argv[]) {
    function1();
    function2();
    
    return 0;
}
2
Are you using optimised code? What times are you seeing? - Alan Birtles
What was the difference in timings? Could it be random fluctuation? One way to make this easier to show is to run each loop 1000 times within the timer, to see if the first one is consistently slower than the other. - Korosia
@Korosia it was consistently around 300 microseconds over 10 iterations. - Sai Sankalp

2 Answers

1
votes

The second function iterates twice as many times as the first, which means double the conditional branches (which are still quite expensive on modern CPUs), which in turn makes it slower. Moreover, the second function needs an additional loop counter variable, and it has to increment a loop counter twice as many times.

There is also one major difference between your code and the code demonstrated in the article: your code allocates its arrays on the stack, whereas the article's code allocates its arrays on the heap. This can have serious implications for how the arrays perform.
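To compare like with like, the arrays can be moved to the heap. The following is only a minimal sketch, assuming `std::vector` is an acceptable stand-in for the article's heap allocation; `add_fused`, `add_split`, and `time_on_heap` are hypothetical names, not anything from the article or the question.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// One loop updating both array pairs (mirrors the question's function1).
void add_fused(std::vector<int>& a, const std::vector<int>& b,
               std::vector<int>& c, const std::vector<int>& d) {
    for (std::size_t j = 0; j < a.size(); ++j) {
        a[j] += b[j];
        c[j] += d[j];
    }
}

// Two separate loops (mirrors the question's function2).
void add_split(std::vector<int>& a, const std::vector<int>& b,
               std::vector<int>& c, const std::vector<int>& d) {
    for (std::size_t j = 0; j < a.size(); ++j) {
        a[j] += b[j];
    }
    for (std::size_t j = 0; j < c.size(); ++j) {
        c[j] += d[j];
    }
}

// Hypothetical helper: allocates four heap-backed arrays of n ints,
// times a single call of `kernel`, and returns the elapsed microseconds.
template <typename Kernel>
long long time_on_heap(std::size_t n, Kernel kernel) {
    if (n == 0) {
        return 0;  // nothing to measure
    }
    std::vector<int> a(n, 0), b(n, 1), c(n, 0), d(n, 1);

    auto start = std::chrono::steady_clock::now();
    kernel(a, b, c, d);
    auto stop = std::chrono::steady_clock::now();

    // Read a result afterwards so the optimizer is less likely
    // to discard the loops as dead code.
    volatile int sink = a[n / 2] + c[n / 2];
    (void)sink;

    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Calling `time_on_heap(100000, add_fused)` and `time_on_heap(100000, add_split)` from `main` gives two numbers to compare. A single run is noisy, so treat any one pair of values with caution.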

The article also mentions that the behavior may not be uniform across different systems or for varying array sizes. It centers specifically on the implications of CPU caching, which may or may not come into play in your code.
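Since the result may depend on array size and on the machine, one way to check is to sweep over several sizes and repeat each measurement, keeping the fastest run. This is a rough sketch under those assumptions; `best_of`, `Sample`, and `sweep` are hypothetical names, and `runs` is assumed to be at least 1.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: calls `work` `runs` times (runs >= 1 assumed)
// and returns the fastest wall-clock time in nanoseconds.
template <typename Work>
long long best_of(int runs, Work work) {
    long long best = std::numeric_limits<long long>::max();
    for (int r = 0; r < runs; ++r) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        long long ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
        best = std::min(best, ns);
    }
    return best;
}

// One row of results for a given array size.
struct Sample {
    std::size_t n;       // number of ints per array
    long long fused_ns;  // best time for the single-loop version
    long long split_ns;  // best time for the two-loop version
    int check;           // value read back so the work is observable
};

// Times both loop shapes for every size in `sizes`.
std::vector<Sample> sweep(const std::vector<std::size_t>& sizes, int runs) {
    std::vector<Sample> out;
    for (std::size_t n : sizes) {
        std::vector<int> a(n, 0), b(n, 1), c(n, 0), d(n, 1);

        long long fused = best_of(runs, [&] {
            for (std::size_t j = 0; j < n; ++j) {
                a[j] += b[j];
                c[j] += d[j];
            }
        });

        long long split = best_of(runs, [&] {
            for (std::size_t j = 0; j < n; ++j) a[j] += b[j];
            for (std::size_t j = 0; j < n; ++j) c[j] += d[j];
        });

        int check = (n > 0) ? a[n - 1] + c[n - 1] : 0;
        out.push_back({n, fused, split, check});
    }
    return out;
}
```

Printing the rows returned by something like `sweep({1000, 100000, 10000000}, 10)` shows whether the gap between the two versions changes with size on a given machine. The sizes and the run count here are arbitrary choices, not values from the article.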

1
votes

The reason the second version can be faster (I do not think this holds on every machine) is better CPU caching. At the point at which your CPU has enough cache to store the arrays, the stuff your OS requires, and so on, the second function will probably be much slower than the first from a performance standpoint. I doubt that the two-loop code will give better performance if enough other programs are running as well, because the second function is obviously less efficient than the first, and if enough other stuff is cached, the performance lead gained through caching will be eliminated.