
For gcc, the manual explains what -O3, -Os, etc. translate to in terms of specific optimisation arguments (-funswitch-loops, -fcompare-elim, etc.).

I'm looking for the same info for clang.

I've looked online and in man clang, which only gives general information (-O2 optimises more aggressively than -O1, -Os optimises for size, ...). I also looked here on Stack Overflow and found this, but I haven't found anything relevant in the cited source files.

Edit: I found an answer, but I'm still interested if anyone has a link to a user manual documenting all optimisation passes and the passes selected by -Ox. So far I have only found this list of passes, with nothing on optimisation levels.


3 Answers


I found this related question.

To sum up, you can list the optimization passes that opt runs for a given level with:

llvm-as < /dev/null | opt -O3 -disable-output -debug-pass=Arguments

As pointed out in Geoff Nixon's answer (+1), clang additionally runs some higher-level optimizations, which we can retrieve with:

echo 'int;' | clang -xc -O3 - -o /dev/null -\#\#\#

Documentation of individual passes is available here.

You can compare the effect of changing high-level flags such as -O like this:

diff -wy --suppress-common-lines  \
  <(echo 'int;' | clang -xc     - -o /dev/null -\#\#\# 2>&1 | tr " " "\n" | grep -v /tmp) \
  <(echo 'int;' | clang -xc -O0 - -o /dev/null -\#\#\# 2>&1 | tr " " "\n" | grep -v /tmp)
# will tell you that -O0 is indeed the default.
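The "adds"/"drops" summaries below amount to set differences between the pass (or flag) lists of two levels. As a minimal sketch, assuming a POSIX shell with awk and sort, a hypothetical `flag_diff` helper (not part of clang or LLVM) can compute them from two single-line lists. It ignores ordering and repeated passes, so it only tells you *what* changed, not where in the pipeline.

```shell
#!/bin/sh
# flag_diff BASE CURRENT  (hypothetical helper, not a clang/LLVM tool)
# BASE and CURRENT are single-line, whitespace-separated lists, e.g. the
# passes opt reports for two -O levels, or the cc1 flags printed by -###.
# Prints "+ FLAG" for each flag only in CURRENT (what the level adds) and
# "- FLAG" for each flag only in BASE (what it drops), sorted bytewise.
flag_diff() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    NR == 1 { for (i = 1; i <= NF; i++) base[$i] = 1 }
    NR == 2 { for (i = 1; i <= NF; i++) cur[$i] = 1 }
    END {
      for (f in cur)  if (!(f in base)) print "+ " f
      for (f in base) if (!(f in cur))  print "- " f
    }' | LC_ALL=C sort
}

# Abbreviated version of the 3.2 -O1 -> -O2 change listed below.
flag_diff "-sroa -early-cse -always-inline" "-sroa -early-cse -inline -globaldce -constmerge"
# prints:
# + -constmerge
# + -globaldce
# + -inline
# - -always-inline
```

With an opt that still honors -debug-pass=Arguments (as the versions listed here do), each argument can be produced by joining its output onto one line, e.g. `llvm-as < /dev/null | opt -O2 -disable-output -debug-pass=Arguments 2>&1 | sed 's/^Pass Arguments://' | tr '\n' ' '`.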

With version 6.0, the passes are as follows:

baseline (-O0):

  • opt sets: -tti -verify -ee-instrument -targetlibinfo -assumption-cache-tracker -profile-summary-info -forceattrs -basiccg -always-inline -barrier

  • clang adds: -mdisable-fp-elim -mrelax-all

-O1 is based on -O0:

  • opt adds: -targetlibinfo -tti -tbaa -scoped-noalias -assumption-cache-tracker -profile-summary-info -forceattrs -inferattrs -ipsccp -called-value-propagation -globalopt -domtree -mem2reg -deadargelim -basicaa -aa -loops -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -simplifycfg -basiccg -globals-aa -prune-eh -always-inline -functionattrs -sroa -memoryssa -early-cse-memssa -speculative-execution -lazy-value-info -jump-threading -correlated-propagation -libcalls-shrinkwrap -branch-prob -block-freq -pgo-memop-opt -tailcallelim -reassociate -loop-simplify -lcssa-verification -lcssa -scalar-evolution -loop-rotate -licm -loop-unswitch -indvars -loop-idiom -loop-deletion -loop-unroll -memdep -memcpyopt -sccp -demanded-bits -bdce -dse -postdomtree -adce -barrier -rpo-functionattrs -globaldce -float2int -loop-accesses -loop-distribute -loop-vectorize -loop-load-elim -alignment-from-assumptions -strip-dead-prototypes -loop-sink -instsimplify -div-rem-pairs -verify -ee-instrument -early-cse -lower-expect

  • clang adds: -momit-leaf-frame-pointer

  • clang drops: -mdisable-fp-elim -mrelax-all

-O2 is based on -O1:

  • opt adds: -inline -mldst-motion -gvn -elim-avail-extern -slp-vectorizer -constmerge

  • opt drops: -always-inline

  • clang adds: -vectorize-loops -vectorize-slp

-O3 is based on -O2:

  • opt adds: -callsite-splitting -argpromotion

-Ofast is based on -O3 (valid in clang but not in opt):

  • clang adds: -fno-signed-zeros -freciprocal-math -ffp-contract=fast -menable-unsafe-fp-math -menable-no-nans -menable-no-infs -mreassociate -fno-trapping-math -ffast-math -ffinite-math-only

-Os is similar to -O2:

  • opt drops: -libcalls-shrinkwrap -pgo-memop-opt

-Oz is based on -Os:

  • opt drops: -slp-vectorizer


With version 3.8, the passes are as follows:

baseline (-O0):

  • opt sets: -targetlibinfo -tti -verify

  • clang adds: -mdisable-fp-elim -mrelax-all

-O1 is based on -O0:

  • opt adds: -globalopt -demanded-bits -branch-prob -inferattrs -ipsccp -dse -loop-simplify -scoped-noalias -barrier -adce -deadargelim -memdep -licm -globals-aa -rpo-functionattrs -basiccg -loop-idiom -forceattrs -mem2reg -simplifycfg -early-cse -instcombine -sccp -loop-unswitch -loop-vectorize -tailcallelim -functionattrs -loop-accesses -memcpyopt -loop-deletion -reassociate -strip-dead-prototypes -loops -basicaa -correlated-propagation -lcssa -domtree -always-inline -aa -block-freq -float2int -lower-expect -sroa -loop-unroll -alignment-from-assumptions -lazy-value-info -prune-eh -jump-threading -loop-rotate -indvars -bdce -scalar-evolution -tbaa -assumption-cache-tracker

  • clang adds: -momit-leaf-frame-pointer

  • clang drops: -mdisable-fp-elim -mrelax-all

-O2 is based on -O1:

  • opt adds: -elim-avail-extern -mldst-motion -slp-vectorizer -gvn -inline -globaldce -constmerge

  • opt drops: -always-inline

  • clang adds: -vectorize-loops -vectorize-slp

-O3 is based on -O2:

  • opt adds: -argpromotion

-Ofast is based on -O3 (valid in clang but not in opt):

  • clang adds: -fno-signed-zeros -freciprocal-math -ffp-contract=fast -menable-unsafe-fp-math -menable-no-nans -menable-no-infs

-Os is the same as -O2.

-Oz is based on -Os:

  • opt drops: -slp-vectorizer

  • clang drops: -vectorize-loops


With version 3.7, the passes are as follows (parsed output of the command above):

default (-O0): -targetlibinfo -verify -tti

-O1 is based on -O0:

  • adds: -sccp -loop-simplify -float2int -lazy-value-info -correlated-propagation -bdce -lcssa -deadargelim -loop-unroll -loop-vectorize -barrier -memcpyopt -loop-accesses -assumption-cache-tracker -reassociate -loop-deletion -branch-prob -jump-threading -domtree -dse -loop-rotate -ipsccp -instcombine -scoped-noalias -licm -prune-eh -loop-unswitch -alignment-from-assumptions -early-cse -inline-cost -simplifycfg -strip-dead-prototypes -tbaa -sroa -no-aa -adce -functionattrs -lower-expect -basiccg -loops -loop-idiom -tailcallelim -basicaa -indvars -globalopt -block-freq -scalar-evolution -memdep -always-inline

-O2 is based on -O1:

  • adds: -elim-avail-extern -globaldce -inline -constmerge -mldst-motion -gvn -slp-vectorizer

  • removes: -always-inline

-O3 is based on -O2:

  • adds: -argpromotion -verif

-Os is identical to -O2.

-Oz is based on -Os:

  • removes: -slp-vectorizer


For version 3.6, the passes are as documented in GYUNGMIN KIM's post.


With version 3.5, the passes are as follows (parsed output of the command above):

default (-O0): -targetlibinfo -verify -verify-di

-O1 is based on -O0:

  • adds: -correlated-propagation -basiccg -simplifycfg -no-aa -jump-threading -sroa -loop-unswitch -ipsccp -instcombine -memdep -memcpyopt -barrier -block-freq -loop-simplify -loop-vectorize -inline-cost -branch-prob -early-cse -lazy-value-info -loop-rotate -strip-dead-prototypes -loop-deletion -tbaa -prune-eh -indvars -loop-unroll -reassociate -loops -sccp -always-inline -basicaa -dse -globalopt -tailcallelim -functionattrs -deadargelim -notti -scalar-evolution -lower-expect -licm -loop-idiom -adce -domtree -lcssa

-O2 is based on -O1:

  • adds: -gvn -constmerge -globaldce -slp-vectorizer -mldst-motion -inline

  • removes: -always-inline

-O3 is based on -O2:

  • adds: -argpromotion

-Os is identical to -O2.

-Oz is based on -Os:

  • removes: -slp-vectorizer


With version 3.4, the passes are as follows (parsed output of the command above):

-O0: -targetlibinfo -preverify -domtree -verify

-O1 is based on -O0:

  • adds: -adce -always-inline -basicaa -basiccg -correlated-propagation -deadargelim -dse -early-cse -functionattrs -globalopt -indvars -inline-cost -instcombine -ipsccp -jump-threading -lazy-value-info -lcssa -licm -loop-deletion -loop-idiom -loop-rotate -loop-simplify -loop-unroll -loop-unswitch -loops -lower-expect -memcpyopt -memdep -no-aa -notti -prune-eh -reassociate -scalar-evolution -sccp -simplifycfg -sroa -strip-dead-prototypes -tailcallelim -tbaa

-O2 is based on -O1:

  • adds: -barrier -constmerge -domtree -globaldce -gvn -inline -loop-vectorize -preverify -slp-vectorizer -targetlibinfo -verify

  • removes: -always-inline

-O3 is based on -O2:

  • adds: -argpromotion

-Os is identical to -O2.

-Oz is based on -O2:

  • removes: -barrier -loop-vectorize -slp-vectorizer


With version 3.2, the passes are as follows (parsed output of the command above):

-O0: -targetlibinfo -preverify -domtree -verify

-O1 is based on -O0:

  • adds: -sroa -early-cse -lower-expect -no-aa -tbaa -basicaa -globalopt -ipsccp -deadargelim -instcombine -simplifycfg -basiccg -prune-eh -always-inline -functionattrs -simplify-libcalls -lazy-value-info -jump-threading -correlated-propagation -tailcallelim -reassociate -loops -loop-simplify -lcssa -loop-rotate -licm -loop-unswitch -scalar-evolution -indvars -loop-idiom -loop-deletion -loop-unroll -memdep -memcpyopt -sccp -dse -adce -strip-dead-prototypes

-O2 is based on -O1:

  • adds: -inline -globaldce -constmerge

  • removes: -always-inline

-O3 is based on -O2:

  • adds: -argpromotion

-Os is identical to -O2.

-Oz is identical to -Os.


Edit [March 2014] removed duplicates from lists.

Edit [April 2014] added documentation link and options for 3.4

Edit [September 2014] added options for 3.5

Edit [December 2015] added options for 3.7 and mentioned the existing answer for 3.6

Edit [May 2016] added options for 3.8, for both opt and clang, and mentioned the existing answer for clang (versus opt)

Edit [Nov 2018] added options for 6.0


@Antoine's answer (and the other question linked) accurately describes the LLVM optimizations that are enabled, but there are a few other Clang-specific options (i.e., those that affect lowering to the AST) that are affected by the -O[0|1|2|3|fast] flags.

You can take a look at these with:

echo 'int;' | clang -xc -O0 - -o /dev/null -\#\#\#

echo 'int;' | clang -xc -O1 - -o /dev/null -\#\#\#

echo 'int;' | clang -xc -O2 - -o /dev/null -\#\#\#

echo 'int;' | clang -xc -O3 - -o /dev/null -\#\#\#

echo 'int;' | clang -xc -Ofast - -o /dev/null -\#\#\#

For example, -O0 enables -mrelax-all, -O1 enables -vectorize-loops and -vectorize-slp, and -Ofast enables -menable-no-infs, -menable-no-nans, -menable-unsafe-fp-math, -ffp-contract=fast and -ffast-math.
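Comparing five full -### dumps by eye is tedious. A minimal sketch, assuming a POSIX shell and that the -### output prints the -cc1 command with each argument in double quotes: a hypothetical `cc1_flags` filter (not a clang feature) that keeps only the frontend arguments, one per line. Because it matches the line containing "-cc1", the separate linker command that -### also prints is skipped.

```shell
#!/bin/sh
# cc1_flags (hypothetical helper): reads `clang ... -###` output on stdin and
# prints the arguments of the "-cc1" frontend invocation, one per line.
# The double quotes around each argument are stripped, and /tmp paths
# (temporary output files) are dropped so two runs can be diffed cleanly.
# Limitation: an argument that itself contains spaces gets split apart.
cc1_flags() {
  grep -F '"-cc1"' |
    tr ' ' '\n' |
    tr -d '"' |
    grep -v -e '^$' -e '^/tmp'
}

# Canned, abbreviated input (not real clang output) to show the filtering.
printf '%s\n' 'clang version 6.0.0' \
  ' "/usr/bin/clang" "-cc1" "-O2" "-vectorize-loops" "-o" "/tmp/x.o"' |
  cc1_flags
# prints:
# /usr/bin/clang
# -cc1
# -O2
# -vectorize-loops
# -o
```

With a real compiler, the input comes from e.g. `echo 'int;' | clang -xc -O2 - -o /dev/null -\#\#\# 2>&1 | cc1_flags`; the 2>&1 is needed because -### writes to stderr.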


@Techogrebo:

Yes, you don't necessarily need the other LLVM tools. Try:

echo 'int;' | clang -xc - -o /dev/null -mllvm -print-all-options

Also, there are a lot more detailed options you can examine/modify with Clang alone... you just need to know how to get to them!

Try a few of these:

clang -help

clang -cc1 -help

clang -cc1 -mllvm -help

clang -cc1 -mllvm -help-list-hidden

clang -cc1as -help


LLVM 3.6 -O1

Pass Arguments: -targetlibinfo -no-aa -tbaa -scoped-noalias -assumption-cache-tracker -basicaa -notti -verify-di -ipsccp -globalopt -deadargelim -domtree -instcombine -simplifycfg -basiccg -prune-eh -inline-cost -always-inline -functionattrs -sroa -domtree -early-cse -lazy-value-info -jump-threading -correlated-propagation -simplifycfg -domtree -instcombine -tailcallelim -simplifycfg -reassociate -domtree -loops -loop-simplify -lcssa -loop-rotate -licm -loop-unswitch -instcombine -scalar-evolution -loop-simplify -lcssa -indvars -loop-idiom -loop-deletion -function_tti -loop-unroll -memdep -memcpyopt -sccp -domtree -instcombine -lazy-value-info -jump-threading -correlated-propagation -domtree -memdep -dse -adce -simplifycfg -domtree -instcombine -barrier -domtree -loops -loop-simplify -lcssa -branch-prob -block-freq -scalar-evolution -loop-vectorize -instcombine -simplifycfg -domtree -instcombine -loops -loop-simplify -lcssa -scalar-evolution -function_tti -loop-unroll -alignment-from-assumptions -strip-dead-prototypes -verify -verify-di
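The list above is raw -debug-pass=Arguments output, so passes such as -domtree, -instcombine, -loops and -lcssa appear several times; the lists in the first answer were deduplicated. A minimal sketch of that cleanup, assuming a POSIX shell; `dedupe_passes` is a hypothetical name. Keeping only the first occurrence deliberately hides how often, and where, a pass is rerun in the pipeline.

```shell
#!/bin/sh
# dedupe_passes (hypothetical helper): reads opt -debug-pass=Arguments output
# on stdin, strips any leading "Pass Arguments:" label, and prints every pass
# once, in order of first appearance, on a single space-separated line.
dedupe_passes() {
  sed 's/^Pass Arguments://' |
    tr -s ' \t' '\n\n' |        # one token per line, squeezing blank runs
    awk 'NF && !seen[$0]++' |   # skip empty lines and already-seen passes
    paste -s -d ' ' -           # join the survivors back onto one line
}

# Abbreviated excerpt of the 3.6 -O1 list above.
echo 'Pass Arguments: -targetlibinfo -domtree -instcombine -simplifycfg -domtree -instcombine' | dedupe_passes
# prints: -targetlibinfo -domtree -instcombine -simplifycfg
```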

-O2 is based on -O1

adds: -inline -mldst-motion -domtree -memdep -gvn -memdep -scalar-evolution -slp-vectorizer -globaldce -constmerge

and removes: -always-inline

-O3 is based on -O2

adds: -argpromotion