Why does the compiler not optimize this initialization?

Question

Consider the following C code:

extern void foo(int* ip);

void myfunc(void)
{
    int arr[15] = {0};
    for (int i=0; i<10; i++)
    {
        arr[i] = 42;
    }

    foo(arr);
}

I tried with gcc and clang, with -O3 and -Os. In all cases the compiled assembly writes all 15 zeroes before overwriting 10 of them with 42.

I suppose it could just be that no optimization has been written for this case yet, but it seems like a fairly obvious and common case to me. Is there something that prevents the optimization?

I am on x86-32 Linux and used these commands:

gcc -std=c99 -S -O3 hello.c
clang -std=c99 -S -O3 hello.c

It's always possible to defeat the compiler by writing idiotic code - you'd need an infinitely complex compiler to handle all possible inefficiencies that an inept programmer might introduce. — Paul R
This is a deliberately simplified example, but I think it is fairly common to first zero-initialize a struct and then write to some of the fields. — Tor Klingberg
Why was this question out on hold? I am not asking for any opinions. — Tor Klingberg
@TorKlingberg Any answer to this question would probably be based on an individual opinion of why gcc/clang don't use this optimization. — Daniel Kleinstein
array is a very conservative data type i think. if you announce int arr[100000000]; are you expecting compiler to keep track of all the elements? — Jason Hu

Basile Starynkevitch Basile Starynkevitch · Accepted Answer · 2015-03-11T19:30:15

It is not a very scientific explanation, but just an intuition (however, I do happen to know some of the internals of GCC).

To reliably do the optimization that you want, the compiler would have to manage sub-arrays or slices. Then it is becoming very complex and error-prone. A compiler optimizing that much is likely to consume a lot of memory (for the symbolic representations of sub-arrays) and a lot of compiler time. This is usually not worth the effort (which would be better spent inside the compiler to optimize loops).

BTW, GCC has a plugin framework and the MELT extension (MELT is a lispy domain specific language to extend GCC, and I am the main author of MELT). So you could try to add a new optimization pass (thru a MELT extension or some C++ plugin) doing the work. You'll soon realize that your pass would be either extraordinarily specific or would require handling a big lot of GCC internal representations, and is likely to blow up compilation time and memory for very few gains.

Notice that both GCC and Clang are cleverly unrolling the two loops (and that matters a lot performance-wise).

BTW, the Frama-C (a static analyzer for C programs developed by colleagues) value analyzer seems able to infer good properties about your arr

So, feel free to add that optimization to GCC. If you don't know (or don't have the time - many months or years) how to add it, feel free to pay a company or an organization able to enhance GCC for your needs. It is probably a million euro (or US dollars) / 3 years project to get that optimization working on interesting cases.

^{If you are serious about spending such amount of money, contact me by email.}

A compiler which has such optimization would need some heuristics to disable them (e.g. if arr was a million-member array, and you was coding some sieve of Erasthothenes, it is probably not worth the compiler effort to keep all the unions of sub-slices of composite indexes at compile-time).

BTW, would you accept a twenty-times slower optimizing compiler (slower at compile time) for a gain (probably of a fraction of percent at run-time) which happens rarely in practice and is not very important? At last, I don't believe that this is a common case for optimizations. YMMV.

You might be perhaps interested by source to source transformers like PIPS4U

Why does the compiler not optimize this initialization?

1 Answers