Checking for string contents? string Length Vs Empty String

18 votes

Which is more efficient for the compiler, and which is better practice, when checking whether a string is blank?

Checking whether the length of the string == 0
Checking whether the string is empty (strVar == "")

Also, does the answer depend on language?

Tags: string, optimization, language-agnostic, compiler-construction

18 votes

Yes, it depends on language, since string storage differs between languages.

Pascal-type strings: Length = 0.
C-style strings: s[0] == '\0'.
.NET: String.IsNullOrEmpty(s).

Etc.
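A minimal C sketch of the two storage layouts above. The `pstring` struct is a hypothetical stand-in for a Pascal-style, length-prefixed string (it is not a standard type); the point is only where the information lives: one check reads a stored count, the other reads the first character.

```c
#include <stddef.h>

/* Hypothetical length-prefixed ("Pascal-style") string: the length is
   stored next to the characters, so emptiness is one field read. */
struct pstring {
    size_t len;
    const char *data;
};

int pstring_is_empty(const struct pstring *p)
{
    return p->len == 0;
}

/* C-style string: there is no stored length, so check whether the
   terminating '\0' is the very first character. */
int cstring_is_empty(const char *s)
{
    return s[0] == '\0';
}
```

Both checks are O(1); they differ only in which piece of memory they inspect.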

16 votes

In languages that use C-style (null-terminated) strings, comparing to "" will be faster. That's an O(1) operation, while taking the length of a C-style string is O(n).

In languages that store length as part of the string object (C#, Java, ...), checking the length is also O(1). In this case, directly checking the length is faster, because it avoids the overhead of constructing the new empty string.
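To make the O(n) point concrete, here is a hedged C sketch. `naive_strlen` is an illustrative stand-in for how a simple `strlen` has to work (real library versions are heavily optimized, but they still have to find the terminator), and `is_empty_by_length` is a hypothetical helper that builds an emptiness test on top of it.

```c
#include <stddef.h>

/* Illustrative strlen: walks every character up to the terminating '\0',
   so its cost grows with the length of the string (O(n)). */
size_t naive_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Emptiness test built on the length: correct, but without compiler help
   it scans the whole string just to learn whether the first byte is '\0'. */
int is_empty_by_length(const char *s)
{
    return naive_strlen(s) == 0;
}
```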

3 votes

In languages that use C-style (null-terminated) strings, comparing to "" will be faster

Actually, it may be better to check if the first char in the string is '\0':

char *mystring;
/* do something with the string */
if ((mystring != NULL) && (mystring[0] == '\0')) {
    /* the string is empty */
}

In Perl there's a third option, that the string is undefined. This is a bit different from a NULL pointer in C, if only because you don't get a segmentation fault for accessing an undefined string.
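C has no exact equivalent of Perl's undefined value, but a NULL pointer plays a similar role. A hedged sketch that keeps the three cases apart; the `str_state` enum and `classify` function are hypothetical names for illustration. NULL is tested first so the pointer is never dereferenced when it is NULL.

```c
#include <stddef.h>

/* Hypothetical three-way classification: "no string at all" (NULL),
   "empty string", and "has content". */
enum str_state { STR_NULL, STR_EMPTY, STR_NONEMPTY };

enum str_state classify(const char *s)
{
    if (s == NULL)
        return STR_NULL;      /* checked first: never dereference NULL */
    if (s[0] == '\0')
        return STR_EMPTY;
    return STR_NONEMPTY;
}
```

A caller that wants "null or empty" semantics, like .NET's IsNullOrEmpty, can test `classify(s) != STR_NONEMPTY`.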

2 votes

In .NET:

string.IsNullOrEmpty( nystr );

Strings can be null, so calling .Length on a null string throws a NullReferenceException.

2 votes

String.IsNullOrEmpty() is only available in .NET 2.0 and above; for .NET 1.0/1.1, I tend to use:

if (inputString == null || inputString == String.Empty)
{
    // String is null or empty, do something clever here. Or just explode.
}

I use String.Empty as opposed to "" because "" will create an object, whereas String.Empty won't. I know it's something small and trivial, but I'd still rather not create objects when I don't need them! (Source)

1 votes

For C strings,

if (s[0] == 0)

will be faster than either

if (strlen(s) == 0)

or

if (strcmp(s, "") == 0)

because you avoid the overhead of a function call (and, in the case of strlen, a scan of the whole string).
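A hedged sketch showing that the three forms above agree for any non-NULL C string; the wrapper names are hypothetical. Note that optimizing compilers such as GCC and Clang often recognize `strlen(s) == 0` and `strcmp(s, "") == 0` and reduce them to a first-byte test when optimization is enabled, so the difference may disappear in practice; it is worth checking the generated code before relying on either outcome.

```c
#include <string.h>

/* Three equivalent emptiness checks for a non-NULL C string. */
int empty_by_first_char(const char *s) { return s[0] == '\0'; }
int empty_by_strlen(const char *s)     { return strlen(s) == 0; }
int empty_by_strcmp(const char *s)     { return strcmp(s, "") == 0; }
```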

1 votes

Assuming your question is about .NET:

If you also want to check your string for null, use IsNullOrEmpty. If you already know that your string is not null (for example, when checking TextBox.Text), don't use IsNullOrEmpty; that is where your question comes in.
In my opinion, checking String.Length costs less than a string comparison.

I even tested it (I also tested with C#, with the same result):

Module Module1
  Sub Main()
    Dim myString = ""


    Dim a, b, c, d As Long

    Console.WriteLine("Way 1...")

    a = Now.Ticks
    For index = 0 To 10000000
      Dim isEmpty = myString = ""
    Next
    b = Now.Ticks

    Console.WriteLine("Way 2...")

    c = Now.Ticks
    For index = 0 To 10000000
      Dim isEmpty = myString.Length = 0
    Next
    d = Now.Ticks

    Dim way1 = b - a, way2 = d - c

    Console.WriteLine("way 1 took {0} ticks", way1)
    Console.WriteLine("way 2 took {0} ticks", way2)
    Console.WriteLine("way 1 took {0} ticks more than way 2", way1 - way2)
    Console.Read()
  End Sub
End Module

Result:

Way 1...
Way 2...
way 1 took 624001 ticks
way 2 took 468001 ticks
way 1 took 156000 ticks more than way 2

This means the comparison took about a third longer than the length check in this run.

1 votes

After I read this thread, I conducted a little experiment, which yielded two distinct, and interesting, findings.

Consider the following.

strInstallString    "1" string

The above is copied from the locals window of the Visual Studio debugger. The same value is used in all three of the following examples.

if ( strInstallString == "" ) === if ( strInstallString == string.Empty )

Following is the code displayed in the disassembly window of the Visual Studio 2013 debugger for these two fundamentally identical cases.

if ( strInstallString == "" )
003126FB  mov         edx,dword ptr ds:[31B2184h]
00312701  mov         ecx,dword ptr [ebp-50h]
00312704  call        59DEC0B0            ; On return, EAX = 0x00000000.
00312709  mov         dword ptr [ebp-9Ch],eax
0031270F  cmp         dword ptr [ebp-9Ch],0
00312716  sete        al
00312719  movzx       eax,al
0031271C  mov         dword ptr [ebp-64h],eax
0031271F  cmp         dword ptr [ebp-64h],0
00312723  jne         00312750

if ( strInstallString == string.Empty )
00452443  mov         edx,dword ptr ds:[3282184h]
00452449  mov         ecx,dword ptr [ebp-50h]
0045244C  call        59DEC0B0        ; On return, EAX = 0x00000000.
00452451  mov         dword ptr [ebp-9Ch],eax
00452457  cmp         dword ptr [ebp-9Ch],0
0045245E  sete        al
00452461  movzx       eax,al
00452464  mov         dword ptr [ebp-64h],eax
00452467  cmp         dword ptr [ebp-64h],0
0045246B  jne         00452498

if ( strInstallString == string.Empty ) Isn't Significantly Different

if ( strInstallString.Length == 0 )
003E284B  mov         ecx,dword ptr [ebp-50h]
003E284E  cmp         dword ptr [ecx],ecx
003E2850  call        5ACBC87E        ; On return, EAX = 0x00000001.
003E2855  mov         dword ptr [ebp-9Ch],eax
003E285B  cmp         dword ptr [ebp-9Ch],0
003E2862  setne       al
003E2865  movzx       eax,al
003E2868  mov         dword ptr [ebp-64h],eax
003E286B  cmp         dword ptr [ebp-64h],0
003E286F  jne         003E289C

From the above machine code listings, generated by the NGEN module of the .NET Framework, version 4.5, I draw the following conclusions.

Tests for equality against the empty string literal and against the static string.Empty field of the System.String class are, for all practical purposes, identical. The only difference between the two code snippets is the source of the first move instruction, and both sources are offsets relative to ds, implying that both refer to baked-in constants.
Testing for equality against the empty string, as either a literal or the string.Empty field, sets up a two-argument function call, which indicates inequality by returning zero. I base this conclusion on other tests that I performed a couple of months ago, in which I followed some of my own code across the managed/unmanaged divide and back. In all cases, any call that required two or more arguments put the first argument in register ECX and the second in register EDX. I don't recall how subsequent arguments were passed. Nevertheless, the call setup looked more like __fastcall than __stdcall. Likewise, the expected return values always showed up in register EAX, which is almost universal.
Testing the length of the string sets up a one-argument function call, which returns 1 (in register EAX), which happens to be the length of the string being tested.
Given that the immediately visible machine code is almost identical, the only explanation I can imagine for the better performance of the string equality test over the string length test reported by Shinny is that the two-argument function that performs the comparison is significantly better optimized than the one-argument function that reads the length off the string instance.

Conclusion

As a matter of principle, I avoid comparing against the empty string as a literal, because the empty string literal can appear ambiguous in source code. To that end, my .NET helper classes have long defined the empty string as a constant. Though I use string.Empty for direct, inline comparisons, the constant earns its keep for defining other constants whose value is the empty string, because a constant cannot be assigned string.Empty as its value.

This exercise settles, once and for all, any concern I might have about the cost, if any, of comparing against either string.Empty or the constant defined by my helper classes.

However, it also raises a puzzling new question: why is comparing against string.Empty more efficient than testing the length of the string? Or is the test used by Shinny invalidated by the way the loop is implemented? (I find that hard to believe, but, then again, I've been fooled before, as I'm sure you have, too!)

I have long assumed that System.String objects were counted strings, fundamentally similar to the Basic String (BSTR) that we know from COM.

1 votes

In Java 1.6, the String class has a new method, isEmpty().

There is also the Jakarta Commons Lang library, whose StringUtils class has an isBlank() method. Blank there means null, empty, or containing only whitespace.
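For comparison, a hedged C sketch of the same "blank" idea. `is_blank` is a hypothetical helper, not a standard C function; it follows Commons Lang in treating NULL as blank, and it relies on `isspace` from `<ctype.h>` to decide what counts as whitespace. The cast to `unsigned char` avoids undefined behavior when `char` is signed and holds a negative value.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: true for NULL, "", or a string made only of
   whitespace characters as classified by isspace(). */
int is_blank(const char *s)
{
    if (s == NULL)
        return 1;
    for (; *s != '\0'; s++) {
        if (!isspace((unsigned char)*s))
            return 0;   /* found a non-whitespace character */
    }
    return 1;
}
```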

0 votes

Actually, IMO the best way to check is the IsNullOrEmpty() method of the String class.

http://msdn.microsoft.com/en-us/library/system.string.isnullorempty.

Update: I assumed .NET; in other languages, this might be different.

0 votes

In this case, directly checking the length is faster, because it avoids the overhead of constructing the new empty string.

@DerekPark: That's not always true. "" is a string literal, so in Java it is already interned and no new empty string is constructed.

0 votes

@Nathan

Actually, it may be better to check if the first char in the string is '\0':

I almost mentioned that, but ended up leaving it out, since calling strcmp() with the empty string and directly checking the first character in the string are both O(1). You basically just pay for an extra function call, which is pretty cheap. If you really need the absolute best speed, though, definitely go with a direct first-char-to-0 comparison.

Honestly, I always use strlen() == 0, because I have never written a program where this was actually a measurable performance issue, and I think that's the most readable way to express the check.

0 votes

Again, without knowing the language, it's impossible to tell.

However, I recommend that you choose the technique that makes the most sense to the maintenance programmer that follows and will have to maintain your work.

I'd recommend writing a function or macro that explicitly does what you want, such as

#define IS_EMPTY(s) ((s)[0]==0)

or something comparable. Now there's no doubt about what you're checking.
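The same check can also be written as a real function, which lets the compiler type-check the argument. A hedged sketch with the hypothetical name `is_empty`, assuming a C99 compiler for `static inline` and `<stdbool.h>`; like the macro, it assumes `s` is not NULL.

```c
#include <stdbool.h>

/* Function form of IS_EMPTY: same test, but the argument must be a
   pointer to char, and callers see a clear name and return type. */
static inline bool is_empty(const char *s)
{
    return s[0] == '\0';
}
```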
