4
votes

I have seen and used C++ code like the following:

int myFourcc = 'ABCD';

It works in recent versions of GCC, not sure how recent. Is this feature in the standard? What is it called?

I have had trouble searching the web for it...

EDIT:

I found this info as well, for future observers:

From the GCC documentation:

The compiler values a multi-character character constant a character at a time, shifting the previous value left by the number of bits per target character, and then or-ing in the bit-pattern of the new character truncated to the width of a target character. The final bit-pattern is given type int, and is therefore signed, regardless of whether single characters are signed or not (a slight change from versions 3.1 and earlier of GCC). If there are more characters in the constant than would fit in the target int the compiler issues a warning, and the excess leading characters are ignored.

For example, 'ab' for a target with an 8-bit char would be interpreted as (int) ((unsigned char) 'a' * 256 + (unsigned char) 'b'), and '\234a' as (int) ((unsigned char) '\234' * 256 + (unsigned char) 'a').
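The rule quoted from the GCC documentation can be sketched as a small function. This is only an illustration of the documented shift-and-or procedure, not GCC's actual implementation: the name gcc_style_value is made up here, the input is a string rather than a real multi-character constant, and the result is kept unsigned where GCC would produce an int.

```cpp
#include <climits>  // CHAR_BIT

// Hypothetical helper (not part of GCC): folds the characters of s together
// the way the GCC documentation describes, i.e. shift the previous value left
// by the number of bits per char, then OR in the next character as an
// unsigned char. Leading characters that do not fit are shifted out.
constexpr unsigned int gcc_style_value(const char* s, unsigned int acc = 0) {
    return *s == '\0'
        ? acc
        : gcc_style_value(s + 1,
                          (acc << CHAR_BIT) | static_cast<unsigned char>(*s));
}

// The two examples from the documentation, assuming an 8-bit char and an
// ASCII execution character set.
static_assert(gcc_style_value("ab") == 'a' * 256u + 'b', "'ab' example");
static_assert(gcc_style_value("\234a") == 0x9C61u, "'\\234a' example");
```

Under those assumptions, gcc_style_value("ABCD") gives 0x41424344, which is what GCC is documented to produce for 'ABCD' on a target with a 32-bit int.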

5
I've seen this question before on SO... Too busy to look it up. - Bill K
possible duplicate of C++ multicharacter literal - phuclv

5 Answers

7
votes

See section 6.4.4.4, paragraph 10 of the C99 standard:

An integer character constant has type int. The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer. The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined. If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int.

Recall that implementation-defined means that the implementation (in this case, the C compiler) is free to choose the behavior, but it must document its choice.

Most compilers will convert it to an integral constant formed by concatenating the octets of the individual characters, but the standard does not fix the order in which they are packed, so the resulting value can be in either big- or little-endian order depending on the implementation.

Therefore, portable code should not use multi-character constants and should instead use plain integral constants. Instead of 'abcd', which could be of either endianness, use either 0x61626364 or 0x64636261, which have known endiannesses (big and little respectively).
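If you still want the constant to be readable, one option is to build the integer from the individual characters yourself, so the packing order is spelled out in the code instead of being left to the compiler. The following is a minimal sketch, assuming an ASCII execution character set; make_fourcc is a hypothetical helper name, not a standard function.

```cpp
#include <cstdint>

// Hypothetical helper: packs four characters into a 32-bit value with the
// first character in the most significant byte (the 0x61626364 ordering for
// "abcd"). Each char goes through unsigned char so that negative char values
// do not sign-extend into the upper bytes.
constexpr std::uint32_t make_fourcc(char a, char b, char c, char d) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(a)) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 8) |
            static_cast<std::uint32_t>(static_cast<unsigned char>(d));
}

// Holds on any implementation where 'a'..'d' are 0x61..0x64 (ASCII).
static_assert(make_fourcc('a', 'b', 'c', 'd') == 0x61626364u,
              "first character lands in the most significant byte");
```

Swapping the shift amounts gives the 0x64636261 ordering instead; either way the result no longer depends on how a particular compiler evaluates 'abcd'.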

6
votes

"Note that according to the C standard there is no limit on the length of a character constant, but the value of a character constant that contains more than one character is implementation-defined. Recent versions of GCC provide support for multi-byte character constants, and instead of an error, the warnings 'multiple-character character constant' or 'character constant too long for its type' are generated in this case."

5
votes

C++ standard draft says:

A character literal is one or more characters enclosed in single quotes, as in 'x'

and

An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal has type int and implementation-defined value.
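The type part of that wording can be checked at compile time. The sketch below only tests the types, since the value is implementation-defined; multichar_is_int is a name made up for this example. Expect the compiler to warn about the multicharacter literals (GCC does so via -Wmultichar).

```cpp
#include <type_traits>

// A single-character ordinary literal has type char in C++.
static_assert(std::is_same<decltype('A'), char>::value,
              "'A' should have type char");

// A multicharacter literal has type int; only its value is left to the
// implementation.
static_assert(std::is_same<decltype('AB'), int>::value,
              "'AB' should have type int");

// Illustrative constant (hypothetical name) recording the same check for a
// four-character literal like the one in the question.
constexpr bool multichar_is_int = std::is_same<decltype('ABCD'), int>::value;
```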

1
votes

Yes, it is standard, but implementation-defined.

In practice, it typically evaluates to the 32-bit integer you get by concatenating the bytes 'A', 'B', 'C' and 'D'.

0
votes

If anyone is interested, the specific example given is the ID of a data storage format.
It's very useful to be able to get a human-readable value for a constant, e.g. 'XVID' rather than just 1234. It's worth thinking about when you are making up arbitrary integer keys.
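To illustrate the readability point, here is a minimal sketch that turns such an ID back into text, for example for logging. It assumes the first character was packed into the most significant byte and an ASCII execution character set; fourcc_char and fourcc_to_string are hypothetical names.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: extracts character number index (0..3) from a code
// whose first character sits in the most significant byte.
constexpr char fourcc_char(std::uint32_t code, int index) {
    return static_cast<char>((code >> (24 - 8 * index)) & 0xFFu);
}

// Hypothetical helper: rebuilds the four-character string from the code.
inline std::string fourcc_to_string(std::uint32_t code) {
    return std::string{fourcc_char(code, 0), fourcc_char(code, 1),
                       fourcc_char(code, 2), fourcc_char(code, 3)};
}
```

With ASCII, fourcc_to_string(0x58564944) yields the string XVID, which is much easier to recognize in a debugger or a hex dump than an arbitrary number.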