3
votes

Consider the following example:

#include <iostream>

class Base {
public:
    int data_;
};

class Derived : public Base {
public:
    void fun() { ::std::cout << "Hi, I'm " << this << ::std::endl; }
};

int main() {
    Base base;
    Derived *derived = static_cast<Derived*>(&base); // Undefined behavior!

    derived->fun(); 

    return 0;
}

The function call is obviously undefined behavior according to the C++ standard. But on all machines and compilers available to me (VC2005/2008, gcc on RH Linux and SunOS) it works as expected (prints "Hi!"). Does anyone know of a configuration on which this code can work incorrectly? Or maybe a more complicated example built on the same idea (note that Derived shouldn't carry any additional data anyway)?

Update:

From standard 5.2.9/8:

An rvalue of type “pointer to cv1 B”, where B is a class type, can be converted to an rvalue of type “pointer to cv2 D”, where D is a class derived (clause 10) from B, if a valid standard conversion from “pointer to D” to “pointer to B” exists (4.10), cv2 is the same cv-qualification as, or greater cv-qualification than, cv1, and B is not a virtual base class of D. The null pointer value (4.10) is converted to the null pointer value of the destination type. If the rvalue of type “pointer to cv1 B” points to a B that is actually a subobject of an object of type D, the resulting pointer points to the enclosing object of type D. Otherwise, the result of the cast is undefined.

And one more 9.3.1 (thanks @Agent_L):

If a nonstatic member function of a class X is called for an object that is not of type X, or of a type derived from X, the behavior is undefined.

Thanks, Mike.

6
What do you mean by "real undefined behaviour"? Isn't the undefined behaviour as defined in the standard real enough? – PlasmaHH
@PlasmaHH, obviously, anyone who wrote such code expects it to have some real behavior (print "Hi!" in this example). But "undefined behavior" in the standard means that the compiler can generate code with different behavior. That situation is what I call "real undefined behavior", and I think it makes sense. – anxieux
How do you define 'incorrectly'? If the standard states that it's undefined, then surely what it's doing is by definition neither correct nor incorrect. I'm not sure where you're going with this, but generally undefined behaviour scenarios are something I'd think you'd want to avoid. – Component 10
@Component10, since this is UB, anything the compiler does is "correct" insofar as compliance with the standard is concerned. UB gives the compiler / run-time environment free rein to do anything whatsoever and still be compliant. (One of the many reasons why a programmer should want to avoid UB.) – David Hammen
@anxieux: what if your compiler can detect this and starts nethack; is this real undefined behaviour too? Why would you even care whether undefined behaviour in a particular compiler, for some particular code, coincidentally produces something that appears to have reliable behaviour? It is still UB. – PlasmaHH

6 Answers

9
votes

The function fun() doesn't do anything that depends on what the this pointer actually points to, and since it isn't a virtual function, nothing special is needed to look it up. Basically, it's called like any normal (non-member) function, with a bad this pointer. It just doesn't crash, which is perfectly valid undefined behavior (if that's not a contradiction).

5
votes

The comments to the code are incorrect.

Derived *derived = static_cast<Derived*>(&base);
derived->fun(); // Undefined behavior!

Corrected version:

Derived *derived = static_cast<Derived*>(&base);  // Undefined behavior!
derived->fun(); // Uses result of undefined behavior

The undefined behavior starts with the static_cast. Any subsequent use of this ill-begotten pointer is also undefined behavior. Undefined behavior is a get out of jail free card for compiler vendors. Almost any response by the compiler is compliant with the standard.

There's nothing to stop the compiler from rejecting your cast. A nice compiler might well issue a fatal compilation error for that static_cast. The violation is easy to see in this case. In general it is not easy to see, so most compilers don't bother checking.

Most compilers instead take the easiest way out. In this case, the easy way out is to simply pretend that that pointer to an instance of class Base is a pointer to an instance of class Derived. Since your function Derived::fun() is rather benign, the easy way out in this case yields a rather benign result.
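A minimal sketch of why the "easy way out" tends to look harmless on mainstream implementations. It deliberately uses a real Derived object, so nothing in it is undefined. The helper names (self, base_shares_address, round_trip_preserves_this) are made up for illustration and are not what any compiler literally generates.

```cpp
#include <cassert>
#include <type_traits>

struct Base {
    int data_;
};

struct Derived : Base {
    // Returns the address this member function received as `this`.
    const void* self() const { return this; }
};

// Derived adds no data members and nothing virtual, so it stays standard-layout.
static_assert(std::is_standard_layout<Derived>::value,
              "Derived is expected to be standard-layout");

// True when the Base subobject of a real Derived sits at the Derived's own address.
bool base_shares_address(Derived& d) {
    Base* as_base = &d;  // implicit, always-valid upcast
    return static_cast<const void*>(as_base) == static_cast<const void*>(&d);
}

// True when a valid Base* -> Derived* downcast hands fun-like members the same `this`.
bool round_trip_preserves_this(Derived& d) {
    Base* as_base = &d;
    Derived* back = static_cast<Derived*>(as_base);  // valid: as_base points into a Derived
    return back->self() == static_cast<const void*>(&d);
}
```

Because the cast does not need to adjust the pointer value for a layout like this, applying it to a plain Base typically yields the same numeric address, which is why the invalid version appears to work even though the standard gives no such guarantee.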

Just because you are getting a nice benign result does not mean everything is cool. It is still undefined behavior. The best bet is to never rely on undefined behavior.
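If the downcast has to be validated at run time, one well-defined option is dynamic_cast, which requires a polymorphic base. The sketch below is a variant of the question's classes under stated assumptions: the names PolyBase, PolyDerived and checked_downcast, the added virtual destructor, and the return value of fun() are all hypothetical and not part of the original code.

```cpp
#include <cassert>

struct PolyBase {
    virtual ~PolyBase() {}  // assumption: makes the class polymorphic so dynamic_cast is allowed
    int data_ = 0;
};

struct PolyDerived : PolyBase {
    int fun() const { return 42; }  // hypothetical stand-in for Derived::fun()
};

// Returns a PolyDerived* only if `b` really points into a PolyDerived object,
// and nullptr otherwise. Unlike the unchecked static_cast, this is well defined
// for both cases.
PolyDerived* checked_downcast(PolyBase* b) {
    return dynamic_cast<PolyDerived*>(b);
}
```

The cost is a vtable pointer in every object and a run-time type check, which may or may not be acceptable for a class like the one in the question.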

3
votes

Run the same code an infinite number of times on the same machine and, if you're lucky, you may see it work incorrectly and unexpectedly.

The thing to understand is that undefined behavior (UB) does not mean that the code will definitely not run as expected; it might run as expected once, twice, 10 times, even an infinite number of times. UB simply means it is not guaranteed to run as expected.

1
votes

You have to understand what your code is doing, then you can see it's doing nothing wrong. "this" is a hidden pointer, generated for you by the compiler.

class Base
{
public:
    int data_;
};

class Derived : public Base
{

};


void fun(Derived* pThis) 
{
    ::std::cout << "Hi, I'm " << pThis << ::std::endl; 
}

//because you're JUST getting the numerical value of a pointer, it can be the same as:
void fun(void* pThis) 
{
    ::std::cout << "Hi, I'm " << pThis << ::std::endl; 
}

//but hey, even this is still the same (on platforms where a pointer fits in an unsigned int):
void fun(unsigned int pThis) 
{
    ::std::cout << "Hi, I'm " << pThis << ::std::endl; 
}

Now it's obvious: this function cannot fail. You can even pass NULL, or a pointer to some other, completely unrelated class. The behaviour is undefined, but there is nothing that can go wrong here.

//Edit: ok, according to the Standard, the situations are not equal. ((Derived*)NULL)->fun(); is explicitly declared UB. However, this behaviour is usually defined in compiler docs about calling conventions. I should have written "For all compilers that I know, nothing can go wrong."

1
votes

For example, the compiler may optimize the code out. Consider a slightly different program:

if(some_very_complex_condition)
{
  // here is your original snippet:

  Base base;
  Derived *derived = static_cast<Derived*>(&base); // Undefined behavior!

  derived->fun(); 
}

The compiler can

(1) detect the undefined behaviour

(2) assume that the program shouldn't exhibit undefined behavior

Therefore (the compiler decides that) some_very_complex_condition must always be false. Assuming this, the compiler may eliminate the whole block as unreachable.

[edit] A real-world example of how the compiler may eliminate code which "serves" a UB case:

Why does integer overflow on x86 with GCC cause an infinite loop?
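The linked question is about signed integer overflow. As a hedged illustration of the same idea: a check written as `x + 1 < x` only makes sense if the addition overflows, and since signed overflow is undefined, an optimizer is entitled to treat that condition as always false and drop the branch. The helper names below (increment_would_overflow, checked_increment) are invented for this sketch; they perform the test without ever overflowing.

```cpp
#include <cassert>
#include <limits>

// Well defined: compares against the limit instead of performing
// the addition that might overflow.
bool increment_would_overflow(int x) {
    return x == std::numeric_limits<int>::max();
}

// Stores x + 1 in `out` and returns true when that is safe;
// returns false and leaves `out` untouched otherwise.
bool checked_increment(int x, int& out) {
    if (increment_would_overflow(x)) {
        return false;
    }
    out = x + 1;  // cannot overflow here, so no UB for the optimizer to exploit
    return true;
}
```

Written this way, the guard cannot be removed on the grounds that "UB never happens", because no path through the function depends on undefined behavior.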

1
votes

The practical reason why this code often works is that anything which breaks this tends to be optimized out in release/optimized-for-performance builds. However, any compiler setting that focuses on finding errors (such as debug builds) is more likely to trip on this.

In those cases, your assumption ("note, that Derived shouldn't carry any additional data anyway") doesn't hold. It definitely should, to facilitate debugging.

A slightly more complicated example is even trickier:

#include <iostream>

class Base {
public:
    int data_;
    virtual void bar() { std::cout << "Base\n"; }
};

class Derived : public Base {
public:
    void fun() { ::std::cout << "Hi, I'm " << this << ::std::endl; }
    virtual void bar() { std::cout << "Derived\n"; }
};

int main() {
    Base base;
    Derived *derived = static_cast<Derived*>(&base); // Undefined behavior!

    derived->fun(); 
    derived->bar();
}

Now a reasonable compiler may decide to skip the vtable and statically call Base::bar() since that's the object you're calling bar() on. Or it may decide that derived must point to a real Derived since you called fun on it, skip the vtable, and call Derived::bar(). As you see, both optimizations are quite reasonable given the circumstances.

And in this we see why Undefined Behavior can be so surprising: compilers can make incorrect assumptions following code with UB, even if the statement itself is compiled right.