75
votes

In C++, the concept of returning reference from the copy assignment operator is unclear to me. Why can't the copy assignment operator return a copy of the new object? In addition, if I have class A, and the following:

A a1(param);
A a2 = a1;
A a3;

a3 = a2; //<--- this is the problematic line

The operator= is defined as follows:

A A::operator=(const A& a)
{
    if (this == &a)
    {
        return *this;
    }
    param = a.param;
    return *this;
}
7
There's no such requirement. But if you want to stick to the principle of least surprize you'll return A& just like a=b is an lvalue expression referring to a in case a and b are ints.sellibitze
@MattMcNabb Thank you for letting me know! Will do thatbks
Why cant we return A* from the copy assignment operator I guess the chaining assignment would still work properly. Can anyone help understand the perils of returning A* if there are any.Krishna Oza
Note: Since C++11 there is also the move-assignment operator, all the same logic in this Q & A also applies to the move-assignment operator. In fact they could both be the same function if declared as A & operator=(A a);, i.e. taking the argument by value.M.M
@Krishna_Oza The real question is why you want to return a pointer. Think about how ugly and ambiguous code for operator overloading and returning would be if we only had pointers - in key cases, fatally ambiguous (also: fatally ugly). And then just read the language's creator's own words about all this: stackoverflow.com/questions/8007832/…underscore_d

7 Answers

73
votes

Strictly speaking, the result of a copy assignment operator doesn't need to return a reference, though to mimic the default behavior the C++ compiler uses, it should return a non-const reference to the object that is assigned to (an implicitly generated copy assignment operator will return a non-const reference - C++03: 12.8/10). I've seen a fair bit of code that returns void from copy assignment overloads, and I can't recall when that caused a serious problem. Returning void will prevent users from 'assignment chaining' (a = b = c;), and will prevent using the result of an assignment in a test expression, for example. While that kind of code is by no means unheard of, I also don't think it's particularly common - especially for non-primitive types (unless the interface for a class intends for these kinds of tests, such as for iostreams).

I'm not recommending that you do this, just pointing out that it's permitted and that it doesn't seem to cause a whole lot of problems.

These other SO questions are related (probably not quite dupes) that have information/opinions that might be of interest to you.

64
votes

A bit of clarification as to why it's preferable to return by reference for operator= versus return by value --- as the chain a = b = c will work fine if a value is returned.

If you return a reference, minimal work is done. The values from one object are copied to another object.

However, if you return by value for operator=, you will call a constructor AND destructor EACH time that the assignment operator is called!!

So, given:

A& operator=(const A& rhs) { /* ... */ };

Then,

a = b = c; // calls assignment operator above twice. Nice and simple.

But,

A operator=(const A& rhs) { /* ... */ };

a = b = c; // calls assignment operator twice, calls copy constructor twice, calls destructor type to delete the temporary values! Very wasteful and nothing gained!

In sum, there is nothing gained by returning by value, but a lot to lose.

(Note: This isn't meant to address the advantages of having the assignment operator return an lvalue. Read the other posts for why that might be preferable)

10
votes

When you overload operator=, you can write it to return whatever type you want. If you want to badly enough, you can overload X::operator= to return (for example) an instance of some completely different class Y or Z. This is generally highly inadvisable though.

In particular, you usually want to support chaining of operator= just like C does. For example:

int x, y, z;

x = y = z = 0;

That being the case, you usually want to return an lvalue or rvalue of the type being assigned to. That only leaves the question of whether to return a reference to X, a const reference to X, or an X (by value).

Returning a const reference to X is generally a poor idea. In particular, a const reference is allowed to bind to a temporary object. The lifetime of the temporary is extended to the lifetime of the reference to which it's bound--but not recursively to the lifetime of whatever that might be assigned to. This makes it easy to return a dangling reference--the const reference binds to a temporary object. That object's lifetime is extended to the lifetime of the reference (which ends at the end of the function). By the time the function returns, the lifetime of the reference and temporary have ended, so what's assigned is a dangling reference.

Of course, returning a non-const reference doesn't provide complete protection against this, but at least makes you work a little harder at it. You can still (for example) define some local, and return a reference to it (but most compilers can and will warn about this too).

Returning a value instead of a reference has both theoretical and practical problems. On the theoretical side, you have a basic disconnect between = normally means and what it means in this case. In particular, where assignment normally means "take this existing source and assign its value to this existing destination", it starts to mean something more like "take this existing source, create a copy of it, and assign that value to this existing destination."

From a practical viewpoint, especially before rvalue references were invented, that could have a significant impact on performance--creating an entire new object in the course of copying A to B was unexpected and often quite slow. If, for example, I had a small vector, and assigned it to a larger vector, I'd expect that to take, at most, time to copy elements of the small vector plus a (little) fixed overhead to adjust the size of the destination vector. If that instead involved two copies, one from source to temp, another from temp to destination, and (worse) a dynamic allocation for the temporary vector, my expectation about the complexity of the operation would be entirely destroyed. For a small vector, the time for the dynamic allocation could easily be many times higher than the time to copy the elements.

The only other option (added in C++11) would be to return an rvalue reference. This could easily lead to unexpected results--a chained assignment like a=b=c; could destroy the contents of b and/or c, which would be quite unexpected.

That leaves returning a normal reference (not a reference to const, nor an rvalue reference) as the only option that (reasonably) dependably produces what most people normally want.

5
votes

It's partly because returning a reference to self is faster than returning by value, but in addition, it's to allow the original semantics that exist in primitive types.

4
votes

operator= can be defined to return whatever you want. You need to be more specific as to what the problem actually is; I suspect that you have the copy constructor use operator= internally and that causes a stack overflow, as the copy constructor calls operator= which must use the copy constructor to return A by value ad infinitum.

3
votes

There is no core language requirement on the result type of a user-defined operator=, but the standard library does have such a requirement:

C++98 §23.1/3:

The type of objects stored in these components must meet the requirements of CopyConstructible types (20.1.3), and the additional requirements of Assignable types.

C++98 §23.1/4:

In Table 64, T is the type used to instantiate the container, t is a value of T, and u is a value of (possibly const) T.

enter image description here


Returning a copy by value would still support assignment chaining like a = b = c = 42;, because the assignment operator is right-associative, i.e. this is parsed as a = (b = (c = 42));. But returning a copy would prohibit meaningless constructions like (a = b) = 666;. For a small class returning a copy could conceivably be most efficient, while for a larger class returning by reference will generally be most efficient (and a copy, prohibitively inefficient).

Until I learned about the standard library requirement I used to let operator= return void, for efficiency and to avoid the absurdity of supporting side-effect based bad code.


With C++11 there is additionally the requirement of T& result type for default-ing the assignment operator, because

C++11 §8.4.2/1:

A function that is explicitly defaulted shall […] have the same declared function type (except for possibly differing ref-qualifiers and except that in the case of a copy constructor or copy assignment operator, the parameter type may be “reference to non-const T”, where T is the name of the member function’s class) as if it had been implicitly declared

0
votes

I guess, because user defined object should behave like builtin types. For example:

char c;
while ((c = getchar()) != -1 ) {/* do the stuff */}