Why does commenting out the first two lines of this for loop and uncommenting the third result in a 42% speedup?
int count = 0;
for (uint i = 0; i < 1000000000; ++i) {
var isMultipleOf16 = i % 16 == 0;
count += isMultipleOf16 ? 1 : 0;
//count += i % 16 == 0 ? 1 : 0;
}
Behind the timing is vastly different assembly code: 13 vs. 7 instructions in the loop. The platform is Windows 7 running .NET 4.0 x64. Code optimization is enabled, and the test app was run outside VS2010. [Update: Repro project, useful for verifying project settings.]
Eliminating the intermediate boolean is a fundamental optimization, one of the simplest in my 1980's era Dragon Book. How did the optimization not get applied when generating the CIL or JITing the x64 machine code?
Is there a "Really compiler, I would like you to optimize this code, please" switch? While I sympathize with the sentiment that premature optimization is akin to the love of money, I could see the frustration in trying to profile a complex algorithm that had problems like this scattered throughout its routines. You'd work through the hotspots but have no hint of the broader warm region that could be vastly improved by hand tweaking what we normally take for granted from the compiler. I sure hope I'm missing something here.
Update: Speed differences also occur for x86, but depend on the order that methods are just-in-time compiled. See Why does JIT order affect performance?
Assembly code (as requested):
var isMultipleOf16 = i % 16 == 0;
00000037 mov eax,edx
00000039 and eax,0Fh
0000003c xor ecx,ecx
0000003e test eax,eax
00000040 sete cl
count += isMultipleOf16 ? 1 : 0;
00000043 movzx eax,cl
00000046 test eax,eax
00000048 jne 0000000000000050
0000004a xor eax,eax
0000004c jmp 0000000000000055
0000004e xchg ax,ax
00000050 mov eax,1
00000055 lea r8d,[rbx+rax]
count += i % 16 == 0 ? 1 : 0;
00000037 mov eax,ecx
00000039 and eax,0Fh
0000003c je 0000000000000042
0000003e xor eax,eax
00000040 jmp 0000000000000047
00000042 mov eax,1
00000047 lea edx,[rbx+rax]
var
is "compiler, please infer the type of this variable, and pretend I wrote that instead". In this case, it will have inferredbool
for itself. – Damien_The_Unbeliever