TLDR: if you only use lambda to convert it to a function pointer (and only invoke it via that function pointer), it is always profitable to omit the closure object. Optimizations which enable this are inlining and dead code elimination. If you do use the lambda itself, it is still possible to optimize the closure away, but requires somewhat more aggressive interprocedural optimization.
I will now try to show how that works under the hood. I will use GCC in my examples, because I'm more familiar with it. Other compilers should do something similar.
Consider this code:
#include <stdio.h>
typedef int (* fnptr_t)(int);
void use_fnptr(fnptr_t fn)
{
printf("fn=%p, fn(1)=%d\n", fn, fn(1));
}
int main()
{
auto lam = [] (int x) { return x + 1; };
use_fnptr((fnptr_t)lam);
}
Now, I compile it and dump intermediate representation (for versions prior to 6, you should add -std=c++11
):
g++ test.cc -fdump-tree-ssa
A little cleaned-up and edited (for brevity) dump looks like this:
// _ZZ4mainENKUliE_clEi
main()::<lambda(int)> (const struct __lambda0 * const __closure, int x)
{
return x_1(D) + 1;
}
// _ZZ4mainENUliE_4_FUNEi
static int main()::<lambda(int)>::_FUN(int) (int D.2780)
{
return main()::<lambda(int)>::operator() (0B, _2(D));
}
// _ZZ4mainENKUliE_cvPFiiEEv
main()::<lambda(int)>::operator int (*)(int)() const (const struct __lambda0 * const this)
{
return _FUN;
}
int main() ()
{
struct __lambda0 lam;
int (*<T5c1>) (int) _3;
_3 = main()::<lambda(int)>::operator int (*)(int) (&lam);
use_fnptr (_3);
}
That is, lambda has 2 member functions: function call operator and a conversion operator and one static member function _FUN, which simply invokes lambda with this
set to zero. main
calls the conversion operator and passes the result to use_fnptr - exactly as it is written in the source code.
I can write:
extern "C" int _ZZ4mainENKUliE_clEi(void *, int);
int main()
{
auto lam = [] (int x) { return x + 1; };
use_fnptr((fnptr_t)lam);
printf("%d %d %d\n", lam(10), _ZZ4mainENKUliE_clEi(&lam, 11), __lambda0::_FUN(12));
printf("%p %p\n", &__lambda0::_FUN, (fnptr_t)lam);
return 0;
}
This program outputs:
fn=0x4005fc, fn(1)=2
11 12 13
0x4005fc 0x4005fc
Now, I think it's pretty obvious, that the compiler should inline lambda (_ZZ4mainENKUliE_clEi) into _FUN
(_ZZ4mainENUliE_4_FUNEi), because _FUN
is the only caller. And inline operator int (*)(int)
into main
(because this operator simply returns a constant). GCC does exactly this, when compiling with optimization (-O). You can check this like:
g++ test.cc -O -fdump-tree-einline
Dump file:
// Considering inline candidate main()::<lambda(int)>.
// Inlining main()::<lambda(int)> into static int main():<lambda(int)>::_FUN(int).
static int main()::<lambda(int)>::_FUN(int) (int D.2822)
{
return _2(D) + 1;
}
The closure object is gone. Now, a more complicated case, when lambda itself is used (not a function pointer). Consider:
#include <stdio.h>
#define PRINT(x) printf("%d", (x))
#define PRINT1(x) PRINT(x); PRINT(x); PRINT(x); PRINT(x);
#define PRINT2(x) do { PRINT1(x) PRINT1(x) PRINT1(x) PRINT1(x) } while(0)
__attribute__((noinline)) void use_lambda(auto t)
{
t(1);
}
int main()
{
auto lam = [] (int x) { PRINT2(x); };
use_lambda(lam);
return 0;
}
GCC will not inline lambda, because it is rather huge (that is what I used printf's for):
g++ test2.cc -O2 -fdump-ipa-inline -fdump-tree-einline -fdump-tree-esra
Early inliner's dump:
Considering inline candidate main()::<lambda(int)>
will not early inline: void use_lambda(auto:1) [with auto:1 = main()::<lambda(int)>]/16->main()::<lambda(int)>/19, growth 46 exceeds --param early-inlining-insns
But "early interprocedural scalar replacement of aggregates" pass will do what we want:
;; Function main()::<lambda(int)> (_ZZ4mainENKUliE_clEi, funcdef_no=14, decl_uid=2815, cgraph_uid=12, symbol_order=12)
IPA param adjustments: 0. base_index: 0 - __closure, base: __closure, remove_param
1. base_index: 1 - x, base: x, copy_param
The first parameter (i.e., closure) is not used, and it gets removed. Unfortunately interprocedural SRA is not able to optimize away indirection, which is introduced for captured values (though there are cases when it would be obviously profitable), so there is still some room for enhancements.