Using one loop vs two loops

Question

I was reading this blog: https://developerinsider.co/why-is-one-loop-so-much-slower-than-two-loops/, and I decided to test it in C++ with Xcode. I wrote the simple program below, and when I ran it I was surprised by the result: contrary to what the article states, the second function (two loops) was slower than the first (one loop). Can anyone help me figure out why?

#include <iostream>
#include <vector>
#include <chrono>
    
using namespace std::chrono;
    
void function1() {
    const int n=100000;
            
    int a1[n], b1[n], c1[n], d1[n];
            
    for(int j=0;j<n;j++){
        a1[j] = 0;
        b1[j] = 0;
        c1[j] = 0;
        d1[j] = 0;
    }
            
    auto start = high_resolution_clock::now();
        
    for(int j=0;j<n;j++){
        a1[j] += b1[j];
        c1[j] += d1[j];
    }
            
    auto stop = high_resolution_clock::now();
    auto duration = duration_cast<microseconds>(stop - start);
        
    std::cout << duration.count() << " Microseconds." << std::endl;  
}
    
void function2() {
    const int n=100000;
            
    int a1[n], b1[n], c1[n], d1[n];
            
    for(int j=0; j<n; j++){
        a1[j] = 0;
        b1[j] = 0;
        c1[j] = 0;
        d1[j] = 0;
    }
            
    auto start = high_resolution_clock::now();
            
    for(int j=0; j<n; j++){
        a1[j] += b1[j];
    }
    
    for(int j=0;j<n;j++){
        c1[j] += d1[j];
    }
            
    auto stop = high_resolution_clock::now();
    auto duration = duration_cast<microseconds>(stop - start);
        
    std::cout << duration.count() << " Microseconds." << std::endl;
}
        
int main(int argc, const char * argv[]) {
    function1();
    function2();
    
    return 0;
}

What was the difference in timings? Could it be random fluctuation? One way to make this easier to show is to run each loop 1000 times within the timer, to see if the first one is consistently slower than the other. — Korosia
@Korosia it was consistently around 300 microseconds over 10 iterations. — Sai Sankalp

ComedicChimera · Accepted Answer · 2020-06-27T07:22:30

The second function executes twice as many loop iterations as the first, which means twice as many loop-condition checks and conditional branches (which are still not free on modern CPUs), and that in turn makes it slower. Moreover, the second function needs a second loop counter, and it increments a counter twice as many times in total.

There is also one major difference between your code and the code demonstrated in the article: your code allocates its arrays on the stack, whereas the article's code allocates them on the heap. This can significantly change where the arrays sit in memory relative to each other, and therefore how they behave in the cache.

The article also mentions that the behavior may not be consistent across different systems or array sizes. It specifically centers on the effects of CPU caching, which may or may not come into play in your code.
