2
votes

This question might make no sense, but I'll ask anyway with an example. Does this code exhibit undefined behaviour?

#include <cstring> // std::memcpy

int main() {
    int a, b; // uninitialised

    std::memcpy(&a, &b, sizeof(int));
}

I would usually say yes, because causing an lvalue-to-rvalue conversion of an uninitialised object is UB, something which must be done to copy the bytes of b to a.

However, memcpy may or may not be implemented in C++. If memcpy is written in assembly for example, then there are no such rules. Do programs that do things that would normally cause undefined behaviour still cause it if they outsource the offending operations to other languages with dissimilar rules?

5
P.S: If someone knows whether and why &b is implied to be UB by 4.1/1, then he may want to answer this question too. - Andy Prowl
@AndyProwl it's not (and that's a different question; this isn't a dangling reference). - Seth Carnegie
I thought you were implying that by saying "causing an lvalue-to-rvalue conversion of an uninitialised object is UB" - Andy Prowl
@AndyProwl taking the address of an object doesn't convert it to an rvalue, right(?) - Seth Carnegie
@aschepler I'm talking strictly about undefined behaviour, not "desired behaviour". And if using memcpy this way is undefined just because of a requirement of the standard, then I'll just pick another function that I wrote or something. - Seth Carnegie

5 Answers

2
votes

This is sort of like asking, "In C++, does excessively charring a steak cause Undefined Behavior?"

All Undefined Behavior means is that the C++ (or C, etc.) Standard does not guarantee what will happen when translating and/or executing a program. Not surprisingly, the C++ Standard doesn't say much at all about functions from other languages.

The only somewhat relevant quotes are from 7.4:

The asm declaration is conditionally-supported; its meaning is implementation-defined.

and from 7.5:

[In a linkage-specification syntax...] This International Standard specifies the semantics for the string-literals "C" and "C++". Use of a string-literal other than "C" or "C++" is conditionally-supported, with implementation-defined semantics.

So basically, it might be possible to use other languages along with C++, but this document isn't going to talk about that other than the syntax necessary to glue the pieces together.

From the point of view of the C++ Standard, functions from other languages have implementation-defined effects on C++ programs. Implementation-defined behavior is usually considered better than undefined behavior, though it is not portable. But it shouldn't be a surprise that using something other than C++ and C is not necessarily portable to every C++ implementation.
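As a rough illustration of the "glue" syntax from 7.5, here is a minimal sketch of a function declared with "C" linkage and called from C++. The name foreign_add is hypothetical, and its body is written in the same file only so the example links on its own; in practice the definition could come from a C or assembly object file.

```cpp
#include <cassert>

// Hypothetical function with "C" linkage. The body is defined here only to
// keep the sketch self-contained; it could equally live in another object file.
extern "C" int foreign_add(int lhs, int rhs) {
    return lhs + rhs;
}

// The call site sees only the declared signature; nothing in it reveals
// which language the body was written in.
void foreign_add_usage() {
    assert(foreign_add(2, 3) == 5);
}
```

The linkage-specification fixes how the name and call are glued together; it says nothing about what the foreign body is allowed to do.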

2
votes

In the case you describe, a is allocated by the C++ program and passed to memcpy(). That means that the behavior is still undefined.

However, this does not mean that the behavior will be random or change between runs. Undefined means that the behavior is not defined, so you cannot rely on any particular behavior, including the program breaking. Some of the hardest problems to debug in C or C++ arise when the compiler translates an undefined construct into something which works as expected. Then suddenly things stop working when you change a compiler flag.
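For contrast, a minimal sketch of a variant of the question's snippet that avoids the issue: the source object is given a value before its bytes are copied. The helper name copy_initialised is hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: copies the bytes of an initialised int into another int.
int copy_initialised(int source) {
    int destination;  // may start uninitialised: memcpy only writes to it
    std::memcpy(&destination, &source, sizeof destination);
    return destination;  // reads a value that memcpy has fully written
}

void copy_initialised_usage() {
    assert(copy_initialised(42) == 42);
}
```

Only the object being read needs a value; the destination can stay uninitialised until the copy fills it.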

1
votes

An implementation is free to extend the language and give a defined behavior to something that C leaves undefined. In the example of reading an uninitialized object with automatic storage duration (UB in C), it can say that the value is unspecified but that evaluating the object does not invoke undefined behavior.

In the C89 and C99 Rationales, the C committee says:

Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior.

Such a program will be a valid program for this implementation but will still be an invalid C program.

1
votes

UB is not a language-specific "thing". It means "what you are doing doesn't have a defined behaviour". So the reason that "using uninitialized memory is undefined" is that the C language cannot stipulate what should happen if you read memory that hasn't been written to. As discussed in another answer, if the memory is not initialized, it may have parity or ECC errors, because the parity bit is only set correctly on the first write. When memory contains "whatever it came out of power-up as", it may well have the wrong parity/ECC value.

ECC or parity errors often lead to the system stopping, because it's not expected to have bad memory!

So, it's not what language the code you execute is written in that matters, it is "the behaviour if you read from memory that hasn't been initialized could go wrong". What matters is the act of reading the memory, whether the code is written in C, C++, assembler, Pascal, Fortran or Lisp.

And bear in mind that undefined isn't necessarily that "Bad things happen", just that "the specification does not explain what the result is, and bad things are allowed to happen in UB". Dividing by zero is not guaranteed to crash your program - it most likely will, but it may also just give you back the same value as you fed in on the other side of the / - that would be perfectly valid UB. Reading uninitialized memory can result in "you get zero", or "you get all ones", or "you get some mixture of ones and zeros, nobody knows which ones", or "could lead to the system rebooting due to suspected memory error". And of course, it may not be the same every time either - sometimes the parity bits are "right", sometimes not, for example.
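If a program should not depend on whatever the platform happens to do, one option is to rule out the undefined cases before the operation. A minimal sketch for int division follows; the helper name checked_divide and its bool-plus-out-parameter interface are assumptions made for this example.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: returns false instead of dividing when the built-in
// operator / would have undefined behaviour for int operands.
bool checked_divide(int numerator, int denominator, int& quotient) {
    if (denominator == 0) {
        return false;  // division by zero
    }
    if (numerator == INT_MIN && denominator == -1) {
        return false;  // result is not representable in int
    }
    quotient = numerator / denominator;
    return true;
}

// Usage: the caller decides what a rejected division means.
void checked_divide_usage() {
    int quotient = 0;
    assert(checked_divide(10, 2, quotient) && quotient == 5);
    assert(!checked_divide(10, 0, quotient));
}
```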

To clarify: I'm not one of those people who know every paragraph of every section of the C or C++ standards. I write code for a living, and I know enough about how processors and connected hardware work to understand WHY the specifications say "it is undefined behaviour when you ..." [it probably doesn't use those words at all, since standards don't use second person]. In the case of using variables that haven't been initialized, the C language doesn't try to enforce any particular behaviour, because doing so MAY prevent the language from being used on a particular platform that can't guarantee that behaviour [and if you specify a behaviour, someone will rely on that behaviour sooner or later, making it a necessary part to implement on every platform].

0
votes

This refers to http://blog.regehr.org/archives/213, which I recommend reading. If you want to argue about the following definitions, please take it up over there. IMO, this is the heart of what you are asking and what most answers are trying to convey. You are free to disagree.

Let's consider Type 1 and Type 2 of these:

Type 1: Behavior is defined for all inputs 
Type 2: Behavior is defined for some inputs and undefined for others 
Type 3: Behavior is undefined for all inputs 

Type 1 means the function raises errors and exceptions to prevent going into UB. Type 2 means the code does not raise all possible errors and exceptions and may go into UB.

The point is: you are using a documented library call. Let's use the example of strcpy or gets instead. It seems to be common knowledge that you can feed these library calls long strings and overflow memory, causing UB. There are implementation-defined constraints that may stop this, but you know better. So, your question really has little to do with the standard but everything to do with what the documentation for the library call says, and whether it is a Type 1 or Type 2. strcpy, gets, and other library functions are clearly Type 2. So it is a caveat to the users of the library: "I am Type 2, I may explode when provoked."
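To make the distinction concrete, here is a minimal sketch of a copy routine that moves towards Type 1 by checking the destination size itself, where strcpy leaves that to the caller. The name bounded_copy is hypothetical. It still assumes that src is null-terminated and that dest_size is truthful, so it only narrows the set of undefined inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies src (including its terminator) into dest only
// when it fits; otherwise reports failure and leaves dest untouched.
// Assumes src is null-terminated and dest really has dest_size bytes.
bool bounded_copy(char* dest, std::size_t dest_size, const char* src) {
    if (dest == nullptr || src == nullptr) {
        return false;
    }
    const std::size_t needed = std::strlen(src) + 1;  // +1 for '\0'
    if (needed > dest_size) {
        return false;  // strcpy would overflow here
    }
    std::memcpy(dest, src, needed);
    return true;
}

void bounded_copy_usage() {
    char buffer[4];
    assert(bounded_copy(buffer, sizeof buffer, "abc"));
    assert(!bounded_copy(buffer, sizeof buffer, "too long"));
}
```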

IMO, there is no way a standard can deal with library add-ons, because they are implementation-defined. So the answer is: it has nothing to do with C++ standards.