15 votes

I came across a very funny situation where comparing a nullable type to null inside a generic method is 234x slower than comparing a value type or a reference type. The code is as follows:

static bool IsNull<T>(T instance)
{
    return instance == null;
}

The execution code is:

int? a = 0;
string b = "A";
int c = 0;

var watch = Stopwatch.StartNew();

for (int i = 0; i < 1000000; i++)
{
    var r1 = IsNull(a);
}

Console.WriteLine(watch.Elapsed.ToString());

watch.Restart();

for (int i = 0; i < 1000000; i++)
{
    var r2 = IsNull(b);
}

Console.WriteLine(watch.Elapsed.ToString());

watch.Restart();

for (int i = 0; i < 1000000; i++)
{
    var r3 = IsNull(c);
}

watch.Stop();

Console.WriteLine(watch.Elapsed.ToString());
Console.ReadKey();

The output for the code above is:

00:00:00.1879827

00:00:00.0008779

00:00:00.0008532

As you can see, comparing a nullable int to null is 234x slower than comparing an int or a string. If I add a second overload with the right constraints, the results change dramatically:

static bool IsNull<T>(T? instance) where T : struct
{
    return instance == null;
}

Now the results are:

00:00:00.0006040

00:00:00.0006017

00:00:00.0006014

Why is that? I didn't check the generated IL because I'm not fluent in it, but even if the IL were slightly different, I would expect the JIT to optimize it, and it doesn't (I'm running with optimizations enabled).

With such results (under 0.2 seconds for 1M iterations in the worst case), does it matter? – Oded
Yes it does, if you do this 1M times a second. I do. – Diego Frata
Don't forget that you are measuring the sum of the cost of the million iterations and the cost of jitting the code on the first call. If the code is really cheap, as this code is, the jit cost that only happens once can actually dominate the average. It might be interesting to run the test twice in the same program so that the second time, the code is "hot". – Eric Lippert
Thanks Eric, I'll follow these guidelines next time! – Diego Frata

3 Answers

14 votes

Here's what you should do to investigate this.

Start by rewriting the program so that it does everything twice. Put a message box between the two iterations. Compile the program with optimizations on and run it outside the debugger. This ensures that the jitter generates the most optimal code it can. The jitter knows when a debugger is attached and can generate worse code to make debugging easier if it thinks that's what you're doing.
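
Here's a minimal sketch of that setup, assuming the IsNull<T> helper from the question (I've used Console.ReadKey() as the pause point; a message box works just as well):

using System;
using System.Diagnostics;

static class Program
{
    static bool IsNull<T>(T instance)
    {
        return instance == null;
    }

    static void Main()
    {
        RunOnce();                                        // first pass: includes the one-time JIT cost
        Console.WriteLine("Attach the debugger now, then press a key...");
        Console.ReadKey();                                // pause here so a debugger can be attached
        RunOnce();                                        // second pass: the code is now "hot"
    }

    static void RunOnce()
    {
        int? a = 0;
        string b = "A";
        int c = 0;

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < 1000000; i++) { var r1 = IsNull(a); }
        Console.WriteLine(watch.Elapsed);

        watch.Restart();
        for (int i = 0; i < 1000000; i++) { var r2 = IsNull(b); }
        Console.WriteLine(watch.Elapsed);

        watch.Restart();
        for (int i = 0; i < 1000000; i++) { var r3 = IsNull(c); }
        Console.WriteLine(watch.Elapsed);
    }
}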

When the message box pops up, attach the debugger and then trace at the assembly code level into the three different versions of the code, if in fact there even are three different versions. I'd be willing to bet as much as a dollar that no code whatsoever is generated for the first one, because the jitter knows that the whole thing can be optimized away to "return false", and then that return false can be inlined, and perhaps even the loop can be removed.

(In the future, you should probably consider this when writing performance tests. Remember that if you don't use the result then the jitter is free to completely optimize away everything that produces that result, as long as it has no side effect.)
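
For example, one way to keep the result observable (a rough sketch) is to fold it into a value that is actually printed:

bool sink = false;

for (int i = 0; i < 1000000; i++)
{
    sink |= IsNull(a);      // the result now feeds into a value that is used later
}

Console.WriteLine(sink);    // printing the accumulated result keeps the jitter from discarding the loop body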

Once you can look at the assembly code you'll see what's going on.

I have not investigated this myself personally, but odds are good that what is going on is this:

  • in the int codepath, the jitter is realizing that a boxed int is never null and turning the method into "return false"

  • in the string codepath, the jitter is realizing that testing a string for nullity is equivalent to testing whether the managed pointer to the string is zero, so it is generating a single instruction that tests whether a register is zero.

  • in the int? codepath, the jitter is probably realizing that testing an int? for nullity can be accomplished by boxing the int? -- since boxing a null int? produces a null reference, that then reduces to the earlier problem of testing a managed pointer against zero. But you take on the cost of the boxing.

If that's the case then the jitter could be more sophisticated here and realize that testing an int? for null can be accomplished by returning the inverse of the HasValue bool inside the int?.
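
That is essentially what the constrained overload in the question compiles down to; a minimal sketch of the HasValue formulation:

static bool IsNull<T>(T? instance) where T : struct
{
    return !instance.HasValue;   // just reads the HasValue flag; no boxing required
}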

But like I said, that's just a guess. Generate the code yourself and see what it's doing if you're interested.

5 votes

If you compare the IL produced by the two overloads, you can see that boxing is involved in the first one:

The first looks like:

.method private hidebysig static bool IsNull<T>(!!T instance) cil managed
{
    .maxstack 2
    .locals init (
        [0] bool CS$1$0000)
    L_0000: nop 
    L_0001: ldarg.0 
    L_0002: box !!T
    L_0007: ldnull 
    L_0008: ceq 
    L_000a: stloc.0 
    L_000b: br.s L_000d
    L_000d: ldloc.0 
    L_000e: ret 
}

While the second looks like:

.method private hidebysig static bool IsNull<valuetype ([mscorlib]System.ValueType) .ctor T>(valuetype [mscorlib]System.Nullable`1<!!T> instance) cil managed
{
    .maxstack 2
    .locals init (
        [0] bool CS$1$0000)
    L_0000: nop 
    L_0001: ldarga.s instance
    L_0003: call instance bool [mscorlib]System.Nullable`1<!!T>::get_HasValue()
    L_0008: ldc.i4.0 
    L_0009: ceq 
    L_000b: stloc.0 
    L_000c: br.s L_000e
    L_000e: ldloc.0 
    L_000f: ret 
}

In the second case, the compiler knows the type is a Nullable so it can optimize for that. In the first case, it has to handle any type, both reference and value types. So it has to jump through some extra hoops.

As for why int is faster than int?, I'd imagine there are some JIT optimizations involved there.
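
Roughly speaking, the difference is probably something like this (a sketch of what the specialized code is likely equivalent to, not the actual jitted output; the method names are made up):

static bool IsNullOfInt(int instance)
{
    return false;                 // a boxed int can never be a null reference
}

static bool IsNullOfNullableInt(int? instance)
{
    object boxed = instance;      // boxing: a null int? becomes a null reference
    return boxed == null;         // the reference comparison is cheap, but the box is not
}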

3 votes

Boxing and unboxing are happening there without you knowing it, and boxing operations are notoriously slow. Behind the scenes, the nullable value type is being boxed to an object reference before it can be compared to null.
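
As a small illustration of how a nullable value type boxes (a sketch):

int? x = null;
object boxedX = x;                  // boxing a null int? produces a null reference
Console.WriteLine(boxedX == null);  // True

int? y = 5;
object boxedY = y;                  // boxing a non-null int? boxes the underlying int value
Console.WriteLine(boxedY == null);  // False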