
Let's consider the following quote from the C++11 standard (the N3376 draft, to be precise):

(2.14.8.5)

If L is a user-defined-string-literal, let str be the literal without its ud-suffix and let len be the number of code units in str (i.e., its length excluding the terminating null character). The literal L is treated as a call of the form

     operator "" X (str , len )

For all the other kinds of user-defined literals (integer, floating-point, character), however, the length is never passed, even when the literal itself is passed as a string. For example:

42_zzz; // calls operator "" _zzz("42") and not operator "" _zzz("42", 2)

Why is there this distinction between string and non-string user-defined literals? Or rather, why does the implementation pass len for user-defined string literals at all? Just as with the other literals, the length could be deduced from the null terminator. What am I missing?

Probably something to do with encodings/character sets. The other paragraphs before that one all have "[ Note: The sequence c1c2...ck can only contain characters from the basic source character set. — end note ]". – Mat
@Mat: But strings in other encodings or character sets are still null-terminated, aren't they? – Armen Tsirunyan
Null termination isn't enough. I guess the "basic source character set" doesn't include \0. – Mat

2 Answers


A string literal can quite plausibly contain an embedded null character, e.g., "a\0b". For the literal operator to receive the entire string literal, including anything after an embedded null, it needs to know the literal's length. The other forms of user-defined literals cannot contain embedded null characters.


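String literals in C/C++ are always null-terminated, but that doesn't mean they can't contain an embedded \0 character. For example, "1234\0" "5678" is null-terminated and also contains an extra '\0' in the middle. (Writing it as "1234\05678" doesn't work: the escape \056 consumes up to three octal digits and denotes '.', so that literal is "1234.78".)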