13
votes

I was reading the optimization options for GCC when I found the option -funroll-all-loops.

Its description reads:

Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This usually makes programs run more slowly. '-funroll-all-loops' implies the same options as '-funroll-loops'

How can the compiler unroll a loop if its number of iterations is unknown at compile time? Doesn't the compiler need this information to unroll it? What corresponding C code does it generate, and in what contexts could this be useful if it usually makes the programs run more slowly?

4
@Olaf how is this not a specific question? OP's clearly asking how gcc (a specific C compiler) is able to know how to unroll a loop if its bounds are unknown. - Ricky Mutschlechner
@Olaf How is it too broad if I want to know what a specific compiler option does? - PC Luddite
@paulsm4 I don't think so. If the compiler knows the bounds already, then the decision comes down to how much to actually unroll it. This question isn't asking that. - Ricky Mutschlechner
You may find the discussion about Duff's Device in my answer to Switch case weird scoping helpful. (Other posters are mentioning Duff's Device, too.) - Joshua Taylor

4 Answers

3
votes

in what contexts could this be useful if it usually makes the programs run more slowly?

Well they are assuming if you choose this option you know what you are doing, if you don't you should not use this option.

what is gcc going to do, well I used this sample program:

#include <stdio.h>

void f(int j )
{
  for( int k = 0; k < j; ++k )
  {
    printf( "%d\n", k ) ;
  }
}

and tested it with godbolt and it generates a jump table based on the number of iterations left (see it live):

cmpl    $1, %ebp
movl    $1, %ebx
je  .L1
testl   %r12d, %r12d
je  .L27
cmpl    $1, %r12d
je  .L28
cmpl    $2, %r12d
je  .L29
cmpl    $3, %r12d
je  .L30
cmpl    $4, %r12d
je  .L31
cmpl    $5, %r12d
je  .L32
cmpl    $6, %r12d
je  .L33
3
votes

Here's some C code showing how to do it:

int iterations = 100;
int unrollValue = 8;

while (iterations%unrollvalue)
{
   // insert loop code here
   iterations--;
}

while (iterations)
{
   // insert unrollValue copies of loop code here
   iterations-= unrollValue;
}

The compiler will replace the first loop with a relative jump, but that's not easy to represent in C. Note that unrolling by a power of 2 allows the compiler to use a mask instead of the (expensive) divide operation.

2
votes

It can do something like:

while(n >= 8){
  foo(); foo(); foo(); foo(); foo(); foo(); foo(); foo(); 
  n -= 8;
}
while(n > 0){
  foo();
  n--;
}

Of course, Duff's Device would save having to write the second loop.

Why do it? That's up to the user. If foo() spends more than a few cycles, or if the original loop takes less that 5% of overall wall-clock time, or if n is typically small, it's probably not worth the trouble.

0
votes

You can't assume that there's such a thing as corresponding C code for a compiler's intermediate representation. But in this case, I'd expect the nearest equivalent to be something like Duff's Device, which is a sequence (usually in a loop) that can be entered at a computed location.