Why does adding local variables make .NET code slower

Question

Why does commenting out the first two lines of this for loop and uncommenting the third result in a 42% speedup?

int count = 0;
for (uint i = 0; i < 1000000000; ++i) {
    var isMultipleOf16 = i % 16 == 0;
    count += isMultipleOf16 ? 1 : 0;
    //count += i % 16 == 0 ? 1 : 0;
}

Behind the timing is vastly different assembly code: 13 vs. 7 instructions in the loop. The platform is Windows 7 running .NET 4.0 x64. Code optimization is enabled, and the test app was run outside VS2010. [Update: Repro project, useful for verifying project settings.]

Eliminating the intermediate boolean is a fundamental optimization, one of the simplest in my 1980's era Dragon Book. How did the optimization not get applied when generating the CIL or JITing the x64 machine code?

Is there a "Really compiler, I would like you to optimize this code, please" switch? While I sympathize with the sentiment that premature optimization is akin to the love of money, I could see the frustration in trying to profile a complex algorithm that had problems like this scattered throughout its routines. You'd work through the hotspots but have no hint of the broader warm region that could be vastly improved by hand tweaking what we normally take for granted from the compiler. I sure hope I'm missing something here.

Update: Speed differences also occur for x86, but depend on the order that methods are just-in-time compiled. See Why does JIT order affect performance?

Assembly code (as requested):

    var isMultipleOf16 = i % 16 == 0;
00000037  mov         eax,edx 
00000039  and         eax,0Fh 
0000003c  xor         ecx,ecx 
0000003e  test        eax,eax 
00000040  sete        cl 
    count += isMultipleOf16 ? 1 : 0;
00000043  movzx       eax,cl 
00000046  test        eax,eax 
00000048  jne         0000000000000050 
0000004a  xor         eax,eax 
0000004c  jmp         0000000000000055 
0000004e  xchg        ax,ax 
00000050  mov         eax,1 
00000055  lea         r8d,[rbx+rax]

    count += i % 16 == 0 ? 1 : 0;
00000037  mov         eax,ecx 
00000039  and         eax,0Fh 
0000003c  je          0000000000000042 
0000003e  xor         eax,eax 
00000040  jmp         0000000000000047 
00000042  mov         eax,1 
00000047  lea         edx,[rbx+rax]

I'd be curious to see the different assembly code. Could you post it? — phoog
@David.Chu.ca - that wouldn't make a difference - var is "compiler, please infer the type of this variable, and pretend I wrote that instead". In this case, it will have inferred bool for itself. — Damien_The_Unbeliever
@EdwardBrey: Since you did this in Debug mode all bets are off — BrokenGlass
@EdwardBrey: I can't find a source at the moment, but I believe the jitter and/or other optimizer settings are different if you have a debugger attached at all (that is, if you're running from Visual Studio, even if you compiled in "Release" mode). Try running your code from the command line (not from VS) and see what happens. — Daniel Pryden

Maciej Maciej · Accepted Answer · 2012-04-30T09:46:59

Question should be "Why do I see such a difference on my machine?". I cannot reproduce such a huge speed difference and suspect there is something specific to your environment. Very difficult to tell what it can be though. Can be some (compiler) options you have set some time ago and forgot about them.

I have create a console application, rebuild in Release mode (x86) and run outside VS. Results are virtually identical, 1.77 seconds for both methods. Here is the exact code:

static void Main(string[] args)
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    int count = 0;

    for (uint i = 0; i < 1000000000; ++i)
    {
        // 1st method
        var isMultipleOf16 = i % 16 == 0;
        count += isMultipleOf16 ? 1 : 0;

        // 2nd method
        //count += i % 16 == 0 ? 1 : 0;
    }

    sw.Stop();
    Console.WriteLine(string.Format("Ellapsed {0}, count {1}", sw.Elapsed, count));
    Console.ReadKey();
}

Please, anyone who has 5 minutes copy the code, rebuild, run outside VS and post results in comments to this answer. I'd like to avoid saying "it works on my machine".

EDIT

To be sure I have created a 64 bit Winforms application and the results are similar as in the the question - the first method is slower (1.57 sec) than the second one (1.05 sec). The difference I observe is 33% - still a lot. Seems there is a bug in .NET4 64 bit JIT compiler.

Why does adding local variables make .NET code slower

5 Answers