43 votes

This is not a question of recommended practice (nor of undefined behavior), but of what the C++ standard actually guarantees when every byte of an integer object is set to the value (unsigned char)0.


The Question(s)

In the snippet below, is the expression used by the if-statement guaranteed to evaluate to true in C++11?

#include <cstring>

int main () {
  int a;

  std::memset (
    reinterpret_cast<char*> (&a),
    (unsigned char)0,
    sizeof (int)
  );

  if (a == 0) {
    // ...
  }
}

By reading the quotations from the C99 and C++11 standards (further down in this post), we find that C99 explicitly guarantees that an integer type with all bits set to 0 represents the value 0 in that type.

I cannot find this guarantee in the C++11 standard.

  • Is there no such guarantee?
  • Is the result of the previous snippet really implementation-specific?


In C99 (ISO/IEC 9899:1999)

5.2.1.2/1 Multibyte characters

A byte with all bits zero shall be interpreted as a null character independent of shift state. Such a byte shall not occur as part of any other multibyte character.

6.2.6.2/5 Integer types

The values of any padding bits are unspecified. A valid (non-trap) object representation of a signed integer type where the sign bit is zero is a valid object representation of the corresponding unsigned type, and shall represent the same value.

For any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type.



In C++11 (ISO/IEC 14882:2011)

2.3/3 Character sets [lex.charset]

The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character (respectively, null wide character), whose representation has all zero bits.

(Side remark) The cast of std::memset's second argument to unsigned char doesn't do anything. The value will be converted back to int prior to the call: en.cppreference.com/w/cpp/string/byte/memset – Fred Foo
@larsmans it's just a way of being explicit (to the reader) about what is going on; std::memset will convert the int back to unsigned char if you read the spec. – Filip Roséen - refp
I know that. IMHO (but this may be a coding style thing), the cast doesn't add anything, because, as you say, memset will repeat it, and it does not do any additional error check or anything. – Fred Foo
I'm pretty sure your code example doesn't guarantee this; in particular the value of 'i' may be read before the call to memset as it's not using the same kind of pointer (and I think by type-punning rules it's then defined not to point to the same thing). Can somebody confirm this? – dascandy

4 Answers

14 votes

C++11

I think the pertinent parts are

3.9.1/1 In C++11

For character types, all bits of the object representation participate in the value representation. For unsigned character types, all possible bit patterns of the value representation represent numbers. These requirements do not hold for other types.

Along with 3.9.1/7

The representations of integral types shall define values by use of a pure binary numeration system.

C11

6.2.6.2 is very explicit

For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N−1), so that objects of that type shall be capable of representing values from 0 to 2^N − 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.

For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; signed char shall not have any padding bits. There shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M ≤ N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:

  • the corresponding value with sign bit 0 is negated (sign and magnitude);

  • the sign bit has the value −(2^M) (two’s complement);

  • the sign bit has the value −(2^M − 1) (ones’ complement).

Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones’ complement), is a trap representation or a normal value. In the case of sign and magnitude and ones’ complement, if this representation is a normal value it is called a negative zero.

Summary

I think the intent is the same for both standards.

  • char, signed char and unsigned char have all bits participating in the value

  • other integer types may have padding bits, which don't participate in the value. A wrong bit pattern in them may produce an invalid value (a trap representation).

  • the interpretation is a pure binary representation, something whose definition is expanded in the C11 citation above.

Two things that may not be clear:

  • can -0 (for sign and magnitude and ones' complement) be a trap value in C++?

  • can one of the padding bits be a parity bit (i.e. is it unsafe to modify the representation even when we make sure the padding bits themselves aren't modified)?

I'd be conservative and assume yes for both.

2 votes

Nope. For example, there's nothing in the Standard banning a bias-based representation; it only mandates that the representation be binary.

2 votes

Yes, it's guaranteed.

Setting all bytes/bits of an integer object to zero is guaranteed to give it the value zero (0), as stated by the snippet below from the mentioned standard.


3.9.1/7 Fundamental types

A synonym for integral type is integer type. The representations of integral types shall define values by use of a pure binary numeration system.[49]

[49] positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral power of 2, except perhaps for the bit with the highest position. (Adapted from the American National Dictionary for Information Processing Systems.)
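Written out for the value bits b_0, ..., b_{N−1} (leaving aside the highest-position bit the footnote exempts), the positional rule gives the following. It settles the value bits only; it says nothing about bits outside the sum, such as padding bits, which is where the last answer locates the gap.

```latex
v = \sum_{i=0}^{N-1} b_i \, 2^{i}, \quad b_i \in \{0, 1\};
\qquad
b_0 = b_1 = \cdots = b_{N-1} = 0 \;\Longrightarrow\; v = \sum_{i=0}^{N-1} 0 \cdot 2^{i} = 0
```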

0 votes

No. I don't believe it's actually guaranteed, but the standard is rather vague on this point.

I'd be very surprised if there has ever been a C++ implementation in which all-bits-zero is not a representation of 0, but I believe such an implementation could be conforming (though perverse).

Let's start by considering the C99 standard. (Yes, I know, the question is about C++; bear with me.) It says that the bits of the object representation of an unsigned integer type are divided into two groups: value bits and padding bits (there needn't be any padding bits, and most implementations don't have them). The value bits make up a pure binary representation; the padding bits do not contribute to the value. Some combinations of padding bits might generate a trap representation.

Signed types are similar, with the addition of a single sign bit. Signed types can be represented using sign and magnitude, two's complement, or ones' complement -- but again, any padding bits do not contribute to the value, and some combinations of padding bits can generate trap representations.

This description does not exclude the possibility that, for example, an integer type wider than char might have a single padding bit that must always be 1; if it's 0, you have a trap representation. Or, perhaps more plausibly, it might have an odd parity bit.

After the C99 standard was published, the second Technical Corrigendum added the following sentence, which also appears in C11.

For any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type.

I'll emphasize that this was added as normative text, not as a footnote, which suggests (but doesn't prove) that the committee members felt that the guarantee wasn't already implicit in the C99 standard.

(C90 was far less specific about how integer types are represented. It didn't mention padding bits, trap representations or two's-complement et al. I would argue that it gave implementations at least as much flexibility as C99.)

So starting with C99 TC2, the C language guarantees that all-bits-zero is a representation of zero for any integer type. In C90 and in C99 before TC2, that guarantee is not stated.

That's C. What about C++?

The 2011 C++ standard seems to provide only slightly more specificity about integer type representations than the old 1990 C standard. It does require signed types to be represented using either 2's complement, 1's complement, or signed magnitude. It also requires a "pure binary numeration system". It doesn't mention "trap representations", nor does it discuss padding bits except in the context of bit fields.

So, in both C90 and pre-TC2 C99 it was at least theoretically possible for all-bits-zero to be a trap representation for an integer type. The C++ standard's requirements for integer types are very similar to those of C90 and C99. It does require a "pure binary representation", but I would argue that that applies, as it does in C99, only to the value bits; though C++ doesn't mention padding bits, it doesn't forbid them.

Again, this is mainly of theoretical interest (thus the "language-lawyer" tag). The C committee felt free to impose the requirement that all-bits-zero must be a representation of zero because all implementations already satisfied it. The same almost certainly applies to C++.