This is a question that came to mind while reading Mysticial's brilliant answer to: Why is processing a sorted array faster than processing an unsorted array?
Context for the types involved:
const unsigned arraySize = 32768;
int data[arraySize];
long long sum = 0;
In his answer he explains that the Intel Compiler (ICC) optimizes this:
for (int i = 0; i < 100000; ++i)
    for (int c = 0; c < arraySize; ++c)
        if (data[c] >= 128)
            sum += data[c];
...into something equivalent to this:
for (int c = 0; c < arraySize; ++c)
    if (data[c] >= 128)
        for (int i = 0; i < 100000; ++i)
            sum += data[c];
The optimizer recognizes that these are equivalent and therefore exchanges the loops, moving the branch outside the inner loop. Very clever!
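For anyone who wants to reproduce the effect, the snippets above assemble into a minimal, self-contained benchmark along these lines (the initialization of data with random values in [0, 256) is my assumption based on the linked question; it isn't shown in this excerpt):

#include <cstdlib>

const unsigned arraySize = 32768;
int data[arraySize];
long long sum = 0;

int main()
{
    // Assumed setup: the linked question fills data with values in [0, 256).
    for (unsigned c = 0; c < arraySize; ++c)
        data[c] = std::rand() % 256;

    // The loop nest that ICC interchanges: the branch sits inside the
    // inner loop and is evaluated 100000 * arraySize times.
    for (int i = 0; i < 100000; ++i)
        for (int c = 0; c < arraySize; ++c)
            if (data[c] >= 128)
                sum += data[c];

    return 0;
}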
But why doesn't it do this?
for (int c = 0; c < arraySize; ++c)
    if (data[c] >= 128)
        sum += 100000 * data[c];
Hopefully Mysticial (or anyone else) can give an equally brilliant answer. I had never learned about the optimizations discussed in that other question, so I'm really grateful for this.
If sum were declared volatile, then the loop interchange would be an invalid optimization as well. – Ben Voigt
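To make that comment concrete, here is a sketch (mine, not Ben Voigt's) of the volatile variant: every read-modify-write of sum then counts as an observable side effect, so the compiler must perform all of them in source order and can neither interchange the loops nor fold them into a multiplication:

#include <cstdlib>

const unsigned arraySize = 32768;
int data[arraySize];
volatile long long sum = 0;  // volatile: each access to sum is observable

int main()
{
    for (unsigned c = 0; c < arraySize; ++c)
        data[c] = std::rand() % 256;  // assumed initialization, as above

    // With sum volatile, the compiler must emit every += below exactly as
    // written; collapsing or reordering them would change the program's
    // observable behavior.
    for (int i = 0; i < 100000; ++i)
        for (int c = 0; c < arraySize; ++c)
            if (data[c] >= 128)
                sum += data[c];

    return 0;
}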