I have a question about C compiler optimization and when/how loops in inline functions are unrolled.
I am developing a numerical code which does something like the example below. Basically, `my_for()` would compute some kind of stencil and call `op()` to do something with the data in `my_type *arg` for each `i`. Here, `my_func()` wraps `my_for()`, creating the argument and passing a pointer to `my_op()`, whose job is to modify the `i`th double in each of the `arg->n` double arrays `arg->dest[j]`.

    typedef struct my_type {
        int const n;
        double *dest[16];
        double const *src[16];
    } my_type;

    static inline void my_for( void (*op)(my_type *,int), my_type *arg, int N ) {
        int i;
        for( i=0; i<N; ++i )
            op( arg, i );
    }

    static inline void my_op( my_type *arg, int i ) {
        int j;
        int const n = arg->n;
        for( j=0; j<n; ++j )
            arg->dest[j][i] += arg->src[j][i];
    }

    void my_func( double *dest0, double *dest1, double const *src0, double const *src1, int N ) {
        my_type Arg = {
            .n = 2,
            .dest = { dest0, dest1 },
            .src = { src0, src1 }
        };
        my_for( &my_op, &Arg, N );
    }

This works fine. The functions are being inlined as they should be, and the code is (almost) as efficient as writing everything in a single function with the `j` loop unrolled by hand and no `my_type Arg` at all.
Here's the confusion: if I set `int const n = 2;` rather than `int const n = arg->n;` in `my_op()`, then the code becomes as fast as the unrolled single-function version. So the question is: why? If everything is being inlined into `my_func()`, why doesn't the compiler see that I am literally defining `Arg.n = 2`? Furthermore, there is no improvement when I make the bound of the `j` loop `arg->n` directly, which after inlining should look just like the faster `int const n = 2;`. I also tried using `my_type const` everywhere to really signal this const-ness to the compiler, but it just doesn't want to unroll the loop.
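One way to check whether the problem is the constant not propagating out of the struct is to pass the bound as an ordinary scalar parameter, so that after inlining the literal `2` reaches the `j` loop directly, just as `int const n = 2;` does. This is a minimal sketch of that idea; `my_op_n`, `my_for_n` and `my_func_n` are hypothetical names, and whether icc 12.1 actually unrolls this version is an assumption to test, not a measured result.

```c
/* Same layout as my_type in the question. */
typedef struct my_type {
    int const n;
    double *dest[16];
    double const *src[16];
} my_type;

/* Hypothetical variant of my_op(): the j-loop bound arrives as a plain
 * int parameter instead of being loaded from arg->n. */
static inline void my_op_n( my_type *arg, int i, int n ) {
    int j;
    for( j=0; j<n; ++j )
        arg->dest[j][i] += arg->src[j][i];
}

/* Hypothetical variant of my_for(): forwards n unchanged to op(). */
static inline void my_for_n( void (*op)(my_type *,int,int), my_type *arg,
                             int N, int n ) {
    int i;
    for( i=0; i<N; ++i )
        op( arg, i, n );
}

void my_func_n( double *dest0, double *dest1,
                double const *src0, double const *src1, int N ) {
    my_type Arg = {
        .n = 2,
        .dest = { dest0, dest1 },
        .src = { src0, src1 }
    };
    /* The literal 2 is passed as a scalar, so once both calls are inlined
     * the j-loop trip count is a compile-time constant, the same situation
     * as writing int const n = 2; inside my_op(). */
    my_for_n( &my_op_n, &Arg, N, 2 );
}
```

Calling `my_func_n( d0, d1, s0, s1, N )` computes the same result as `my_func()`; only the way the bound reaches the loop changes.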
In my numerical code, this amounts to about a 15% performance hit. If it matters, there `n = 4`, and these `j` loops appear in a couple of conditional branches in an `op()`.
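Since the real code has `n = 4` and probably only a few possible values, a common workaround (swapped in here, not taken from the question) is to branch on `n` once, outside the `i` loop, and give each branch a literal bound. This is a minimal sketch assuming the same `my_type`; `add_rows`, `my_for_dispatch` and `my_func4` are hypothetical names, and the generic `op` function pointer is dropped for brevity.

```c
/* Same layout as my_type in the question. */
typedef struct my_type {
    int const n;
    double *dest[16];
    double const *src[16];
} my_type;

/* Hypothetical helper: the question's j loop with the bound as a parameter. */
static inline void add_rows( my_type *arg, int i, int n ) {
    int j;
    for( j=0; j<n; ++j )
        arg->dest[j][i] += arg->src[j][i];
}

/* Read arg->n once, before the i loop, and give each common value its own
 * loop with a literal bound; other values fall back to a runtime bound. */
static void my_for_dispatch( my_type *arg, int N ) {
    int i;
    int const n = arg->n;
    switch( n ) {
    case 2:
        for( i=0; i<N; ++i ) add_rows( arg, i, 2 );
        break;
    case 4:
        for( i=0; i<N; ++i ) add_rows( arg, i, 4 );
        break;
    default:
        for( i=0; i<N; ++i ) add_rows( arg, i, n );
        break;
    }
}

/* Hypothetical n = 4 counterpart of my_func(). */
void my_func4( double *d0, double *d1, double *d2, double *d3,
               double const *s0, double const *s1,
               double const *s2, double const *s3, int N ) {
    my_type Arg = {
        .n = 4,
        .dest = { d0, d1, d2, d3 },
        .src  = { s0, s1, s2, s3 }
    };
    my_for_dispatch( &Arg, N );
}
```

The cost is one extra branch per call and some duplicated loop code; the benefit is that each case presents the compiler with a constant trip count that it is free to unroll fully.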
I am compiling with icc (ICC) 12.1.5 20120612. I also tried `#pragma unroll`. Here are my compiler options (did I miss any good ones?):

    -O3 -ipo -static -unroll-aggressive -fp-model precise -fp-model source -openmp -std=gnu99 -Wall -Wextra -Wno-unused -Winline -pedantic

Thanks!
Comments:

… `n` as an explicit function parameter might improve the odds. – molbdnilo

… `inline` functions are being inlined by the compiler? By the way, I thought this article interesting, though a bit old: Dr. Dobbs - The New C: Inline Functions, describing some of the compiler actions. – Richard Chambers