228
votes

It's quite annoying to test all my strings for null before I can safely apply methods like ToUpper(), StartWith() etc...

If the default value of string were the empty string, I would not have to test, and I would feel it to be more consistent with the other value types like int or double for example. Additionally Nullable<String> would make sense.

So why did the designers of C# choose to use null as the default value of strings?

Note: This relates to this question, but is more focused on the why instead of what to do with it.

15
Do you consider this a problem for other reference types?Jon Skeet
@JonSkeet No, but only because I initially, wrongly, thought that strings are value types.Marcel
@Marcel: That's a pretty good reason for wondering about it.T.J. Crowder
@JonSkeet Yes. Oh yes. (But you’re no stranger to the non-nullable reference type discussion …)Konrad Rudolph
@JohnCastle I dare you to ask database developers who understand the value of trinary state if you can take their nulls from them. The reason it was no good was because people don't think in trinary, it's either left or right, up or down, yes or no. Relational algebra NEEDS a trinary state.jcolebrand

15 Answers

321
votes

Why is the default value of the string type null instead of an empty string?

Because string is a reference type and the default value for all reference types is null.

It's quite annoying to test all my strings for null before I can safely apply methods like ToUpper(), StartWith() etc...

That is consistent with the behaviour of reference types. Before invoking their instance members, one should put a check in place for a null reference.

If the default value of string were the empty string, I would not have to test, and I would feel it to be more consistent with the other value types like int or double for example.

Assigning the default value to a specific reference type other than null would make it inconsistent.

Additionally Nullable<String> would make sense.

Nullable<T> works with the value types. Of note is the fact that Nullable was not introduced on the original .NET platform so there would have been a lot of broken code had they changed that rule.(Courtesy @jcolebrand)

40
votes

Habib is right -- because string is a reference type.

But more importantly, you don't have to check for null each time you use it. You probably should throw a ArgumentNullException if someone passes your function a null reference, though.

Here's the thing -- the framework would throw a NullReferenceException for you anyway if you tried to call .ToUpper() on a string. Remember that this case still can happen even if you test your arguments for null since any property or method on the objects passed to your function as parameters may evaluate to null.

That being said, checking for empty strings or nulls is a common thing to do, so they provide String.IsNullOrEmpty() and String.IsNullOrWhiteSpace() for just this purpose.

25
votes

You could write an extension method (for what it's worth):

public static string EmptyNull(this string str)
{
    return str ?? "";
}

Now this works safely:

string str = null;
string upper = str.EmptyNull().ToUpper();
17
votes

You could also use the following, as of C# 6.0

string myString = null;
string result = myString?.ToUpper();

The string result will be null.

14
votes

Empty strings and nulls are fundamentally different. A null is an absence of a value and an empty string is a value that is empty.

The programming language making assumptions about the "value" of a variable, in this case an empty string, will be as good as initiazing the string with any other value that will not cause a null reference problem.

Also, if you pass the handle to that string variable to other parts of the application, then that code will have no ways of validating whether you have intentionally passed a blank value or you have forgotten to populate the value of that variable.

Another occasion where this would be a problem is when the string is a return value from some function. Since string is a reference type and can technically have a value as null and empty both, therefore the function can also technically return a null or empty (there is nothing to stop it from doing so). Now, since there are 2 notions of the "absence of a value", i.e an empty string and a null, all the code that consumes this function will have to do 2 checks. One for empty and the other for null.

In short, its always good to have only 1 representation for a single state. For a broader discussion on empty and nulls, see the links below.

https://softwareengineering.stackexchange.com/questions/32578/sql-empty-string-vs-null-value

NULL vs Empty when dealing with user input

7
votes

The fundamental reason/problem is that the designers of the CLS specification (which defines how languages interact with .net) did not define a means by which class members could specify that they must be called directly, rather than via callvirt, without the caller performing a null-reference check; nor did it provide a meany of defining structures which would not be subject to "normal" boxing.

Had the CLS specification defined such a means, then it would be possible for .net to consistently follow the lead established by the Common Object Model (COM), under which a null string reference was considered semantically equivalent to an empty string, and for other user-defined immutable class types which are supposed to have value semantics to likewise define default values. Essentially, what would happen would be for each member of String, e.g. Length to be written as something like [InvokableOnNull()] int String Length { get { if (this==null) return 0; else return _Length;} }. This approach would have offered very nice semantics for things which should behave like values, but because of implementation issues need to be stored on the heap. The biggest difficulty with this approach is that the semantics of conversion between such types and Object could get a little murky.

An alternative approach would have been to allow the definition of special structure types which did not inherit from Object but instead had custom boxing and unboxing operations (which would convert to/from some other class type). Under such an approach, there would be a class type NullableString which behaves as string does now, and a custom-boxed struct type String, which would hold a single private field Value of type String. Attempting to convert a String to NullableString or Object would return Value if non-null, or String.Empty if null. Attempting to cast to String, a non-null reference to a NullableString instance would store the reference in Value (perhaps storing null if the length was zero); casting any other reference would throw an exception.

Even though strings have to be stored on the heap, there is conceptually no reason why they shouldn't behave like value types that have a non-null default value. Having them be stored as a "normal" structure which held a reference would have been efficient for code that used them as type "string", but would have added an extra layer of indirection and inefficiency when casting to "object". While I don't foresee .net adding either of the above features at this late date, perhaps designers of future frameworks might consider including them.

6
votes

Why the designers of C# chose to use null as the default value of strings?

Because strings are reference types, reference types are default value is null. Variables of reference types store references to the actual data.

Let's use default keyword for this case;

string str = default(string); 

str is a string, so it is a reference type, so default value is null.

int str = (default)(int);

str is an int, so it is a value type, so default value is zero.

5
votes

Because a string variable is a reference, not an instance.

Initializing it to Empty by default would have been possible but it would have introduced a lot of inconsistencies all over the board.

4
votes

If the default value of string were the empty string, I would not have to test

Wrong! Changing the default value doesn't change the fact that it's a reference type and someone can still explicitly set the reference to be null.

Additionally Nullable<String> would make sense.

True point. It would make more sense to not allow null for any reference types, instead requiring Nullable<TheRefType> for that feature.

So why did the designers of C# choose to use null as the default value of strings?

Consistency with other reference types. Now, why allow null in reference types at all? Probably so that it feels like C, even though this is a questionable design decision in a language that also provides Nullable.

4
votes

Perhaps if you'd use ?? operator when assigning your string variable, it might help you.

string str = SomeMethodThatReturnsaString() ?? "";
// if SomeMethodThatReturnsaString() returns a null value, "" is assigned to str.
2
votes

A String is an immutable object which means when given a value, the old value doesn't get wiped out of memory, but remains in the old location, and the new value is put in a new location. So if the default value of String a was String.Empty, it would waste the String.Empty block in memory when it was given its first value.

Although it seems minuscule, it could turn into a problem when initializing a large array of strings with default values of String.Empty. Of course, you could always use the mutable StringBuilder class if this was going to be a problem.

2
votes

Since string is a reference type and the default value for reference type is null.

1
votes

Since you mentioned ToUpper(), and this usage is how I found this thread, I will share this shortcut (string ?? "").ToUpper():

    private string _city;
    public string City
    {
        get
        {
            return (this._city ?? "").ToUpper();
        }
        set
        {
            this._city = value;
        }
    }

Seems better than:

        if(null != this._city)
        { this._city = this._city.ToUpper(); }
0
votes

Maybe the string keyword confused you, as it looks exactly like any other value type declaration, but it is actually an alias to System.String as explained in this question.
Also the dark blue color in Visual Studio and the lowercase first letter may mislead into thinking it is a struct.

0
votes

Nullable types did not come in until 2.0.

If nullable types had been made in the beginning of the language then string would have been non-nullable and string? would have been nullable. But they could not do this du to backward compatibility.

A lot of people talk about ref-type or not ref type, but string is an out of the ordinary class and solutions would have been found to make it possible.