7
votes

When answering a comment to another answer of mine here, I found what I think may be a hole in the C standard (c1x, I haven't checked the earlier ones and yes, I know it's incredibly unlikely that I alone among all the planet's inhabitants have found a bug in the standard). Information follows:

  1. Section 6.5.3.4 ("The sizeof operator") para 2 states "The sizeof operator yields the size (in bytes) of its operand".
  2. Para 3 of that section states: "When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1".
  3. Section 7.20.3.3 describes void *malloc(size_t sz) but all it says is "The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate". It makes no mention at all of what units are used for the argument.
  4. Annex E states that 8 is the minimum value for CHAR_BIT, so chars can be more than one byte in length.

My question is simply this:

In an environment where a char is 16 bits wide, will malloc(10 * sizeof(char)) allocate 10 chars (20 bytes) or 10 bytes? Point 1 above seems to indicate the former; point 2 seems to indicate the latter.

Anyone with more C-standard-fu than me have an answer for this?


3 Answers

16
votes

In a 16-bit char environment, malloc(10 * sizeof(char)) will allocate 10 chars (10 bytes), because if char is 16 bits, then that architecture/implementation defines a byte as 16 bits. A char isn't an octet, it's a byte. On older computers a byte can be larger than the 8-bit de facto standard we have today.

The relevant section from the C standard follows:

3.6 Terms, definitions and symbols

byte - addressable unit of data storage large enough to hold any member of the basic character set of the execution environment...

NOTE 2 - A byte is composed of a contiguous sequence of bits, the number of which is implementation-defined.
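As a rough illustration of that definition, here is a minimal sketch in C. The helper names alloc_chars and bits_in_bytes are made up for this example and do not come from the standard; only sizeof, CHAR_BIT and malloc do.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <stdlib.h>   /* malloc, size_t */
#include <string.h>

/* Hypothetical helper: allocates room for `count` chars.
   Returns NULL if the allocation fails; the caller must free() the result. */
char *alloc_chars(size_t count)
{
    /* sizeof(char) is 1 by definition, whatever CHAR_BIT happens to be,
       so this requests exactly `count` bytes. */
    assert(sizeof(char) == 1);
    return malloc(count * sizeof(char));
}

/* Hypothetical helper: number of bits in a block of `n_bytes` C bytes. */
size_t bits_in_bytes(size_t n_bytes)
{
    return n_bytes * CHAR_BIT;
}
```

With these, alloc_chars(10) asks for 10 bytes on every conforming implementation, and bits_in_bytes(10) would be 80 where CHAR_BIT is 8 but 160 where CHAR_BIT is 16.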

2
votes

In the C99 standard the rigorous correlation between bytes, char, and object size is given in 6.2.6.1/4 "Representations of types - General":

Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes. The value may be copied into an object of type unsigned char [n] (e.g., by memcpy); the resulting set of bytes is called the object representation of the value.
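A small sketch of what that paragraph describes, using double as an arbitrary example type. The function names round_trip_double and double_width_in_bits are invented for illustration.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <string.h>   /* memcpy, memcmp, size_t */

/* Hypothetical helper: copies a double into unsigned char[n] and back,
   where n = sizeof(double), the size of the object in bytes. */
double round_trip_double(double value)
{
    unsigned char repr[sizeof value];    /* the object representation */
    double copy;

    memcpy(repr, &value, sizeof value);  /* object -> unsigned char[n] */
    memcpy(&copy, repr, sizeof copy);    /* unsigned char[n] -> object */

    /* Both objects now have identical object representations. */
    assert(memcmp(&copy, &value, sizeof value) == 0);
    return copy;
}

/* Hypothetical helper: a double occupies n * CHAR_BIT bits. */
size_t double_width_in_bits(void)
{
    return sizeof(double) * CHAR_BIT;
}
```

The array is sized with sizeof rather than a hard-coded number, so the same code should hold whether a byte is 8 bits or something wider.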

In the C++ standard the same relationship is given in 3.9/2 "Types":

For any object (other than a base-class subobject) of POD type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.

In C90 there doesn't appear to be as explicit a correlation, but from the definition of a byte, the definition of a character, and the definition of the sizeof operator, the inference can be made that a char type is equivalent to a byte.

Also note that the number of bits in a byte (and the number of bits in a char) is implementation-defined; strictly speaking it doesn't need to be 8 bits. And onebyone points out in a comment elsewhere that DSPs commonly have bytes with a number of bits that isn't 8.

Note that IETF RFCs and standards generally (always?) use the term 'octet' instead of 'byte' to be unambiguous that the units they're talking about have exactly 8 bits - no more, no less.

1
votes

Aren't the units of "size_t sz" in whatever the addressable unit of your architecture is? I work with a DSP whose addresses correspond to 32-bit values, not 8-bit bytes. malloc(1) gets me a pointer to a 4-octet (32-bit) area.
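The arithmetic behind this kind of observation can be sketched as follows. This is only an illustration: octets_for_bytes and octets_on_this_platform are hypothetical helpers, and the byte width is passed in as a parameter so that a 32-bit-byte machine can be modelled on an ordinary one.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

/* Hypothetical helper: octets (8-bit units) needed to hold `n_bytes`
   C bytes of `bits_per_byte` bits each, rounded up. */
size_t octets_for_bytes(size_t n_bytes, size_t bits_per_byte)
{
    assert(bits_per_byte >= 8);   /* C requires CHAR_BIT >= 8 */
    size_t total_bits = n_bytes * bits_per_byte;
    return (total_bits + 7) / 8;
}

/* Same calculation using the byte width of the current implementation. */
size_t octets_on_this_platform(size_t n_bytes)
{
    return octets_for_bytes(n_bytes, CHAR_BIT);
}
```

Under these assumptions, octets_for_bytes(1, 32) gives 4, matching a malloc(1) that returns a 32-bit area, and octets_for_bytes(10, 16) gives 20 for the 16-bit char case from the question.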