3
votes

I have two situations:

    static void CreateCopyOfString()
    {
        string s = "Hello";
        ProcessString(s);
    }

and

    static void DoNotCreateCopyOfString()
    {
        ProcessString("Hello");
    }
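(ProcessString itself isn't shown; for the purposes of this question, assume it is any trivial method taking a string, for example:)

```csharp
// Hypothetical stand-in for ProcessString (its real body isn't shown
// in the question); any method with a string parameter produces the
// same calling IL pattern at the call site.
static void ProcessString(string s)
{
    System.Console.WriteLine(s);
}
```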

The IL for these two situations looks like this:

    .method private hidebysig static void  CreateCopyOfString() cil managed
    {
        // Code size       15 (0xf)
        .maxstack  1
        .locals init ([0] string s)
        IL_0000:  nop
        IL_0001:  ldstr      "Hello"
        IL_0006:  stloc.0
        IL_0007:  ldloc.0
        IL_0008:  call       void ConsoleApplication1.Program::ProcessString(string)
        IL_000d:  nop
        IL_000e:  ret
    } // end of method Program::CreateCopyOfString

and

    .method private hidebysig static void  DoNotCreateCopyOfString() cil managed
    {
          // Code size       13 (0xd)
          .maxstack  8
          IL_0000:  nop
          IL_0001:  ldstr      "Hello"
          IL_0006:  call       void ConsoleApplication1.Program::ProcessString(string)
          IL_000b:  nop
          IL_000c:  ret
    } // end of method Program::DoNotCreateCopyOfString

In the first case there are extra instructions (stloc.0 and ldloc.0) to store the string in the local variable and load it back. Does this mean the first case would perform worse than the second, where the string is passed directly to the method instead of first being stored in a local variable?

I saw the question Does initialization of local variable with null impacts performance? but it seems to be a little different from what I need to know here. Thanks.

2
How did you compile the code? Was it in Debug or Release mode? I believe that in Release both ILs would look exactly the same. – MarcinJuraszek
It was compiled in Debug. Let me check with Release mode. – ashtee
Same result with a Release build. – ashtee

2 Answers

11
votes

You're looking at the unoptimized IL, for one thing - hence all the "nop"s. You may find it generates different code when building the Release version.

Even with the unoptimized version, if you're running under an optimizing JIT, I'd expect it to end up with the same JITted code.

Even with a non-optimizing JIT that actually generated code doing more work each time this method is called, I'd be staggered to see this have a significant impact in any real application.

As ever:

  • Set performance goals before you start, and measure against them.
  • Work out which decisions will be hard to fix later in terms of performance, and worry about those much more than decisions like this, which can be changed later with no impact elsewhere.
  • Write the simplest, most readable code that will work first.
  • If that doesn't perform well enough, investigate whether making changes which harm readability help performance enough to warrant the pain.
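If you want to see this for yourself, a rough micro-benchmark along these lines (method names taken from the question; a stand-in ProcessString is assumed) should show no measurable difference between the two after JIT warm-up:

```csharp
using System;
using System.Diagnostics;

static class Benchmark
{
    static void ProcessString(string s) { }  // stand-in for the real method

    public static void CreateCopyOfString()
    {
        string s = "Hello";
        ProcessString(s);
    }

    public static void DoNotCreateCopyOfString()
    {
        ProcessString("Hello");
    }

    public static void Main()
    {
        const int N = 100000000;

        // Warm up so both methods are JITted before timing.
        CreateCopyOfString();
        DoNotCreateCopyOfString();

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < N; i++) CreateCopyOfString();
        Console.WriteLine("With local:    " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        for (int i = 0; i < N; i++) DoNotCreateCopyOfString();
        Console.WriteLine("Without local: " + sw.ElapsedMilliseconds + " ms");
    }
}
```

The usual micro-benchmark caveats apply: run a Release build outside the debugger, and expect the optimizer to remove work it can prove is unnecessary.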
1
vote

No, it will not impact performance. You can confirm this by verifying that the machine code produced for both is the same. Note that with an optimizing JIT, ProcessString may be inlined. To prevent this, you can add [MethodImpl(MethodImplOptions.NoInlining)] to it. Compile an optimized (Release) build.
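For example, the attribute goes on the callee (ProcessString's body is assumed here):

```csharp
using System.Runtime.CompilerServices;

class Program
{
    // NoInlining keeps the call visible in the JITted machine code of
    // both callers, so comparing their disassembly stays meaningful.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void ProcessString(string s)
    {
        System.Console.WriteLine(s);
    }
}
```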

  1. Open the executable in WinDbg. Use the matching 32-bit or 64-bit version, depending on your EXE.
  2. Type sxe ld clrjit to break when clrjit.dll is loaded. Type g to continue until the break.
  3. Load SOS with .loadby sos clr. Note that for earlier CLR versions, you need to use mscorwks instead of clr.
  4. Find the address of the method table with !name2ee * <full class name>.
  5. Type !dumpmt -md <address of MethodTable> to dump the method details. Notice that at this point CreateCopyOfString and DoNotCreateCopyOfString are not yet JITted.
  6. Type !bpmd <full class name>.CreateCopyOfString and !bpmd <full class name>.DoNotCreateCopyOfString to break when either method is called. Type g to continue. You could also use !bpmd -md <address of MethodDesc> to set the breakpoints.
  7. When a breakpoint is hit, type !u <address of MethodDesc> to dump the machine code for the method.

Note that when I tried this, only one of the methods was JITted, presumably because the runtime determined that the two methods were identical and JITting the other was unnecessary. As such, I commented out the calls as appropriate and repeated the process to get the machine code for each method.

The actual registers and addresses will vary, but both methods resulted in the following machine code:

    sub     rsp,28h
    mov     rcx,121E3258h
    mov     rcx,qword ptr [rcx]
    call    000007fe`9852c038
    nop
    add     rsp,28h
    ret

Hence, you can conclude that since the same machine code is executed in both cases, the performance of the two methods is identical.