16
votes

Sorry for opening this topic again, but thinking about this topic itself has started giving me an Undefined Behavior. Want to move into the zone of well-defined behavior.

Given

int i = 0;
int v[10];
i = ++i;     //Expr1
i = i++;     //Expr2
++ ++i;      //Expr3
i = v[i++];  //Expr4

I think of the above expressions (in that order) as

operator=(i, operator++(i))    ; //Expr1 equivalent
operator=(i, operator++(i, 0)) ; //Expr2 equivalent
operator++(operator++(i))      ; //Expr3 equivalent
operator=(i, operator[](v, operator++(i, 0))); //Expr4 equivalent

Now, coming to the behaviors, here are the important quotes from C++0x.

$1.9/12- "Evaluation of an expression (or a sub-expression) in general includes both value computations (including determining the identity of an object for lvalue evaluation and fetching a value previously assigned to an object for rvalue evaluation) and initiation of side effects."

$1.9/15- "If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined."

[ Note: Value computations and side effects associated with different argument expressions are unsequenced. —end note ]

$3.9/9- "Arithmetic types (3.9.1), enumeration types, pointer types, pointer to member types (3.9.2), std::nullptr_t, and cv-qualified versions of these types (3.9.3) are collectively called scalar types."

  • In Expr1, the evaluation of the expression i (first argument) is unsequenced with respect to the evaluation of the expression operator++(i) (which has a side effect).

    Hence Expr1 has undefined behavior.

  • In Expr2, the evaluation of the expression i (first argument) is unsequenced with respect to the evaluation of the expression operator++(i, 0) (which has a side effect).

    Hence Expr2 has undefined behavior.

  • In Expr3, the evaluation of the lone argument operator++(i) is required to be complete before the outer operator++ is called.

    Hence Expr3 has well defined behavior.

  • In Expr4, the evaluation of the expression i (first argument) is unsequenced with respect to the evaluation of operator[](v, operator++(i, 0)) (which has a side effect).

    Hence Expr4 has undefined behavior.

Is this understanding correct?


P.S. The method of analyzing the expressions used in the OP is not correct. This is because, as @Potatoswatter notes, "clause 13.6 does not apply. See the disclaimer in 13.6/1, "These candidate functions participate in the operator overload resolution process as described in 13.3.1.2 and are used for no other purpose." They are just dummy declarations; no function-call semantics exist with respect to built-in operators."

2
+1: Good question. I will keep an eye out for the answers. - Arun
@Chubsdad: I agree with what @James McNellis said in his answer (which he deleted afterwards). All 4 expressions invoke UB in C++0x [IMHO]. I think you should ask this question at csc++ (comp.std.c++). :) - Prasoon Saurav
@Prasoon Saurav: Why does Expr3 have undefined behavior? I thought this should be fine. gcc/comeau/llvm (demo) also all compile it without any warning. - Chubsdad
That's because the side effects associated with ++ [inner] and ++ [outer] are not sequenced relative to each other (although the value computations are sequenced). :) - Prasoon Saurav
Check this out. It mentions that "some more complicated cases are not diagnosed by the -Wsequence-point option, and it may give an occasional false positive result". - Prasoon Saurav

2 Answers

15
votes

Native operator expressions are not equivalent to overloaded operator expressions. There is a sequence point at the binding of values to function arguments, which makes the operator++() versions well-defined. But that doesn't exist for the native-type case.
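To make the contrast concrete, here is a minimal sketch with a user-defined type. Counter and the three helper functions are hypothetical names introduced only for this illustration. Because every operator here is a real function call, each increment completes inside its function before the surrounding expression continues, so the class-type counterparts of Expr1 to Expr3 have a defined result.

```cpp
#include <cassert>

// Hypothetical wrapper around int, used only for illustration.
struct Counter {
    int value;
    Counter() : value(0) {}

    // Prefix ++: the increment completes before the function returns.
    Counter& operator++() { ++value; return *this; }

    // Postfix ++: increments, then returns a copy of the old state.
    Counter operator++(int) { Counter old(*this); ++value; return old; }
};

// Class-type counterpart of Expr1: c is incremented, then assigned to itself.
int assign_prefix() { Counter c; c = ++c; return c.value; }

// Class-type counterpart of Expr2: the copy of the old state is assigned back.
int assign_postfix() { Counter c; c = c++; return c.value; }

// Class-type counterpart of Expr3: two nested function calls.
int prefix_twice() { Counter c; ++ ++c; return c.value; }
```

Here assign_prefix() returns 1, assign_postfix() returns 0 (the increment is overwritten by the old copy), and prefix_twice() returns 2. None of this carries over to int, which is the whole point of the paragraph above.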

In all four cases, i changes twice within the full-expression. Since no comma operator, ||, or && appears in the expressions, that's instant UB under the C++03 rule below.

§5/4:

Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression.

Edit for C++0x (updated)

§1.9/15:

The value computations of the operands of an operator are sequenced before the value computation of the result of the operator. If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

Note however that a value computation and a side effect are two distinct things. If ++i is equivalent to i = i+1, then + is the value computation and = is the side effect. From 1.9/12:

Evaluation of an expression (or a sub-expression) in general includes both value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and initiation of side effects.

So although the value computations are more strongly sequenced in C++0x than C++03, the side effects are not. Two side effects in the same expression, unless otherwise sequenced, produce UB.

Value computations are ordered by their data dependencies anyway and, side effects absent, their order of evaluation is unobservable, so I'm not sure why C++0x goes to the trouble of saying anything, but that just means I need to read more of the papers that Boehm and friends wrote.

Edit #3:

Thanks Johannes for coping with my laziness to type "sequenced" into my PDF reader search bar. I was going to bed and getting up on the last two edits anyway… right ;v) .

§5.17/1 defining the assignment operators says

In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.

Also §5.3.2/1 on the preincrement operator says

If x is not of type bool, the expression ++x is equivalent to x+=1 [Note: see … addition (5.7) and assignment operators (5.17) …].

By this identity, ++ ++ x is shorthand for (x +=1) +=1. So, let's interpret that.

  • Evaluate the 1 on the far RHS and descend into the parens.
  • Evaluate the inner 1 and the value (prvalue) and address (glvalue) of x.
  • Now we need the value of the += subexpression.
    • We're done with the value computations for that subexpression.
    • The assignment side effect must be sequenced before the value of assignment is available!
  • Assign the new value to x, which is identical to the glvalue and prvalue result of the subexpression.
  • We're out of the woods now. The whole expression has now been reduced to x +=1.

So, then 1 and 3 are well-defined and 2 and 4 are undefined behavior, which you would expect.
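The reduction walked through above can be sketched as follows. This is only an illustration, assuming a compiler that implements the C++0x/C++11 sequencing rules quoted here; under the C++03 wording both functions would still be undefined. The function names are hypothetical.

```cpp
#include <cassert>

// Spelled-out form of the double pre-increment, using the x += 1 identity
// from 5.3.2/1. The inner assignment is sequenced before its result is used.
int double_increment_expanded(int x) {
    (x += 1) += 1;
    return x;
}

// The shorthand form, Expr3 in the question.
int double_increment(int x) {
    ++ ++x;
    return x;
}
```

Both functions return their argument plus 2, e.g. double_increment(0) yields 2.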

The only other surprise I found by searching for "sequenced" in N3126 was 5.3.4/16, where the implementation is allowed to call operator new before evaluating constructor arguments. That's cool.

Edit #4: (Oh, what a tangled web we weave)

Johannes notes again that in i = ++i; the glvalue (a.k.a. the address) of i is ambiguously dependent on ++i. The glvalue is certainly a value of i, but I don't think 1.9/15 is intended to include it for the simple reason that the glvalue of a named object is constant, and cannot actually have dependencies.

For an informative strawman, consider

( i % 2? i : j ) = ++ i; // certainly undefined

Here, the glvalue of the LHS of = is dependent on a side-effect on the prvalue of i. The address of i is not in question; the outcome of the ?: is.

Perhaps a good counterexample is

int i = 3, &j = i;
j = ++ i;

Here j has a glvalue distinct from (but identical to) i. Is this well-defined, yet i = ++i is not? This represents a trivial transformation that a compiler could apply to any case.

1.9/15 should say

If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the prvalue of the same scalar object, the behavior is undefined.
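For what it's worth, here is a small sketch of the alias case from above. It assumes the reading argued for in this edit (only the prvalue of i counts as "the value" in 1.9/15) together with the 5.17/1 sequencing; under C++03 the assignment would still be undefined. The function names are hypothetical.

```cpp
#include <cassert>

// i and its reference j designate the same object, so taking the address
// of either one gives the same pointer.
bool same_object() {
    int i = 3;
    int& j = i;
    return &i == &j;
}

// Assignment through the alias: the increment inside ++i is sequenced
// before the value of ++i is available, and the store to j comes after that.
int assign_through_alias() {
    int i = 3;
    int& j = i;
    j = ++i;
    return i;
}
```

Under those assumptions same_object() is true and assign_through_alias() returns 4, which is exactly what i = ++i would give if it is treated the same way.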

0
votes

In thinking about expressions like those mentioned, I find it useful to imagine a machine where memory has interlocks, so that reading a memory location as part of a read-modify-write sequence causes any other attempted read or write (apart from the concluding write of the sequence) to stall until the sequence completes. Such a machine would hardly be an absurd concept; indeed, such a design could simplify many multi-threaded code scenarios.

On the other hand, an expression like "x=y++;" could fail on such a machine if 'x' and 'y' were references to the same variable and the compiler's generated code did something like: read-and-lock reg1=y; reg2=reg1+1; write x=reg1; write-and-unlock y=reg2. That would be a very reasonable code sequence on processors where writing a newly-computed value would impose a pipeline delay, but the write to x would lock up the processor if y were aliased to the same variable.
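If the aliasing case has to work, one way to sidestep the problem is to write the steps as separate statements, so that the read-modify-write on y has finished before x is touched. A minimal sketch follows; assign_post_increment is a hypothetical helper, and the choice to store to x last is an assumption about the intended result.

```cpp
#include <cassert>

// Hypothetical helper: the effect of "x = y++" written as separate
// statements, so every read and write is sequenced even when x and y
// refer to the same variable.
void assign_post_increment(int& x, int& y) {
    const int old = y;  // read y once
    y = old + 1;        // complete the increment first
    x = old;            // then store the old value into x
}
```

With distinct variables this leaves x holding the old value of y and y incremented by one. With x and y aliased, the variable ends up holding its old value, because the store to x comes last.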