In C, I have a task where I must do multiplication, inversion, transposition, addition, etc. with huge matrices allocated as 2-dimensional arrays (arrays of arrays).
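For concreteness, here is a minimal sketch of this kind of storage. It assumes `double` elements and uses an array of row pointers, each pointing to its own separately allocated row. The helper names `matrix_alloc`, `matrix_free` and `matrix_add` are hypothetical and do not come from any library.

```c
#include <stdlib.h>

/* Allocate a rows x cols matrix as an array of row pointers,
   each row a separate zero-initialised array of doubles.
   Returns NULL if any allocation fails. */
double **matrix_alloc(size_t rows, size_t cols)
{
    double **m = malloc(rows * sizeof *m);
    if (m == NULL)
        return NULL;
    for (size_t i = 0; i < rows; i++) {
        m[i] = calloc(cols, sizeof **m);
        if (m[i] == NULL) {
            /* Free the rows allocated so far, then the row table. */
            while (i > 0)
                free(m[--i]);
            free(m);
            return NULL;
        }
    }
    return m;
}

/* Release every row, then the array of row pointers. */
void matrix_free(double **m, size_t rows)
{
    if (m == NULL)
        return;
    for (size_t i = 0; i < rows; i++)
        free(m[i]);
    free(m);
}

/* c = a + b element by element: the typical two nested loops. */
void matrix_add(double **c, double **a, double **b,
                size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            c[i][j] = a[i][j] + b[i][j];
}
```

The inner `j` loop of `matrix_add` is the kind of loop the unrolling options act on. The outer loop only runs once per row.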
I have found the gcc flag -funroll-all-loops. If I understand correctly, this will unroll all loops automatically without any effort by the programmer.
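To make the idea concrete, below is a hand-written 4x unrolling of a simple loop. It is roughly the transformation the unrolling flags perform automatically, although GCC chooses the actual unroll factor with its own heuristics. `sum_rolled` and `sum_unrolled4` are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Plain loop: one addition plus one loop test and increment per element. */
double sum_rolled(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Same loop unrolled by 4: four element updates per iteration, so the
   loop test and increment run about a quarter as often. A remainder
   loop handles n values that are not a multiple of 4. */
double sum_unrolled4(const double *x, size_t n)
{
    double s = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += x[i];
        s += x[i + 1];
        s += x[i + 2];
        s += x[i + 3];
    }
    for (; i < n; i++)   /* remainder: 0 to 3 leftover elements */
        s += x[i];
    return s;
}
```

The additions stay in their original order, so both functions return bit-identical results. GCC likewise does not reorder floating-point additions when unrolling unless options such as -ffast-math allow it.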
My questions:
a) Does gcc include this kind of optimization in the various optimization flags such as -O1, -O2, etc.?
b) Do I have to use any pragmas inside my code to take advantage of loop unrolling, or are loops identified automatically?
c) Why is this option not enabled by default if unrolling increases performance?
d) What are the recommended gcc optimization flags to compile the program in the best way possible? (The program only has to run optimized for a single CPU family, the same one as the machine where I compile the code. Currently I use the -march=native and -O2 flags.)
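Regarding b), GCC 8 and later also provide a per-loop `#pragma GCC unroll n`, placed immediately before the loop it applies to. Below is a minimal sketch, assuming a GCC new enough to support the pragma. `matrix_scale` is a hypothetical helper, and the factor 4 is an arbitrary choice.

```c
#include <stddef.h>

/* Multiply every element of a rows x cols array-of-arrays matrix by k.
   The pragma requests that GCC unroll only the inner loop 4 times; it
   is intended to take effect when optimizing, even without
   -funroll-loops. Older GCC versions may warn about an unknown pragma. */
void matrix_scale(double **m, size_t rows, size_t cols, double k)
{
    for (size_t i = 0; i < rows; i++) {
#pragma GCC unroll 4
        for (size_t j = 0; j < cols; j++)
            m[i][j] *= k;
    }
}
```

Per-loop control like this limits the code-size growth to the loops where unrolling is actually wanted. The global flags, by contrast, apply to every eligible loop in the file.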
EDIT
It seems there is some controversy about unrolling, since in some cases it may slow down performance. In my situation there are various functions that do simple math operations in 2 nested for loops iterating over the matrix elements, for a huge number of elements. In this scenario, how could unrolling slow down or increase performance?
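Whether unrolling helps such nested loops is easiest to settle by measuring. Below is a minimal timing sketch. It assumes the CPU-time resolution of `clock()` is adequate for the chosen size. `add_kernel` and `time_add_kernel` are hypothetical names, and the row pointers are carved out of a single buffer only to keep the sketch short.

```c
#include <stdlib.h>
#include <time.h>

/* c = a + b over n x n array-of-arrays matrices. Only the inner j loop
   is a useful unrolling target; the outer loop runs just n times. */
static void add_kernel(double **c, double **a, double **b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            c[i][j] = a[i][j] + b[i][j];
}

/* Returns the CPU seconds spent on `reps` calls of add_kernel,
   0.0 for empty input, or -1.0 if allocation fails. */
double time_add_kernel(size_t n, int reps)
{
    if (n == 0 || reps <= 0)
        return 0.0;
    double *buf = malloc(3 * n * n * sizeof *buf);
    double **rows = malloc(3 * n * sizeof *rows);
    if (buf == NULL || rows == NULL) {
        free(buf);
        free(rows);
        return -1.0;
    }
    for (size_t k = 0; k < 3 * n; k++)
        rows[k] = buf + k * n;      /* row k starts k*n doubles into buf */
    double **a = rows, **b = rows + n, **c = rows + 2 * n;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            a[i][j] = (double)i;
            b[i][j] = (double)j;
        }

    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        add_kernel(c, a, b, n);
    clock_t end = clock();

    volatile double sink = c[n - 1][n - 1];  /* keep the result "used" */
    (void)sink;
    free(rows);
    free(buf);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Build the same file twice, for example with -O2 -march=native and with -O2 -march=native -funroll-loops. Then call `time_add_kernel` from a small `main` with a large `n` and compare the two builds. For a simple element-wise loop over huge matrices, the run time is often dominated by memory traffic rather than loop overhead. So a large difference should not be assumed in either direction.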
-funroll-all-loops: "...This usually makes programs run more slowly." You can hit instruction cache misses and your code size will increase. It's not an automatic benefit. – Ed S.
-O options add -funroll-loops or -funroll-all-loops. – IllusiveBrian