Why does the Standard define end()
as one past the end, instead of at the actual end?
7 Answers
The best argument easily is the one made by Dijkstra himself:
You want the size of the range to be a simple difference end − begin;
including the lower bound is more "natural" when sequences degenerate to empty ones, and also because the alternative (excluding the lower bound) would require the existence of a "one-before-the-beginning" sentinel value.
You still need to justify why you start counting at zero rather than one, but that wasn't part of your question.
The wisdom behind the [begin, end) convention pays off time and again when you have any sort of algorithm that deals with multiple nested or iterated calls to range-based constructions, which chain naturally. By contrast, using a doubly-closed range would incur off-by-ones and extremely unpleasant and noisy code. For example, consider a partition [n0, n1)[n1, n2)[n2,n3). Another example is the standard iteration loop for (it = begin; it != end; ++it)
, which runs end - begin
times. The corresponding code would be much less readable if both ends were inclusive – and imagine how you'd handle empty ranges.
Finally, we can also make a nice argument why counting should start at zero: With the half-open convention for ranges that we just established, if you are given a range of N elements (say to enumerate the members of an array), then 0 is the natural "beginning" so that you can write the range as [0, N), without any awkward offsets or corrections.
In a nutshell: the fact that we don't see the number 1
everywhere in range-based algorithms is a direct consequence of, and motivation for, the [begin, end) convention.
Actually, a lot of iterator related stuff suddenly makes much more sense if you consider the iterators not pointing at the elements of the sequence but in between, with dereferencing accessing the next element right to it. Then the "one past end" iterator suddenly makes immediate sense:
+---+---+---+---+
| A | B | C | D |
+---+---+---+---+
^ ^
| |
begin end
Obviously begin
points to the beginning of the sequence, and end
points to the end of the same sequence. Dereferencing begin
accesses the element A
, and dereferencing end
makes no sense because there's no element right to it. Also, adding an iterator i
in the middle gives
+---+---+---+---+
| A | B | C | D |
+---+---+---+---+
^ ^ ^
| | |
begin i end
and you immediately see that the range of elements from begin
to i
contains the elements A
and B
while the range of elements from i
to end
contains the elements C
and D
. Dereferencing i
gives the element right of it, that is the first element of the second sequence.
Even the "off-by-one" for reverse iterators suddenly becomes obvious that way: Reversing that sequence gives:
+---+---+---+---+
| D | C | B | A |
+---+---+---+---+
^ ^ ^
| | |
rbegin ri rend
(end) (i) (begin)
I've written the corresponding non-reverse (base) iterators in parentheses below. You see, the reverse iterator belonging to i
(which I've named ri
) still points in between elements B
and C
. However due to reversing the sequence, now element B
is on the right to it.
Why does the Standard define end()
as one past the end, instead of at the actual end?
Because:
- It avoids special handling for empty ranges. For empty ranges,
begin()
is equal toend()
& - It makes the end criterion simple for loops that iterate over the elements: The loops simply
continue as long as
end()
is not reached.
Because then
size() == end() - begin() // For iterators for whom subtraction is valid
and you won't have to do awkward things like
// Never mind that this is INVALID for input iterators...
bool empty() { return begin() == end() + 1; }
and you won't accidentally write erroneous code like
bool empty() { return begin() == end() - 1; } // a typo from the first version
// of this post
// (see, it really is confusing)
bool empty() { return end() - begin() == -1; } // Signed/unsigned mismatch
// Plus the fact that subtracting is also invalid for many iterators
Also: What would find()
return if end()
pointed to a valid element?
Do you really want another member called invalid()
which returns an invalid iterator?!
Two iterators is already painful enough...
Oh, and see this related post.
Also:
If the end
was before the last element, how would you insert()
at the true end?!
The iterator idiom of half-closed ranges [begin(), end())
is originally based on pointer arithmetic for plain arrays. In that mode of operation, you would have functions that were passed an array and a size.
void func(int* array, size_t size)
Converting to half-closed ranges [begin, end)
is very simple when you have that information:
int* begin;
int* end = array + size;
for (int* it = begin; it < end; ++it) { ... }
To work with fully-closed ranges, it's harder:
int* begin;
int* end = array + size - 1;
for (int* it = begin; it <= end; ++it) { ... }
Since pointers to arrays are iterators in C++ (and the syntax was designed to allow this), it's much easier to call std::find(array, array + size, some_value)
than it is to call std::find(array, array + size - 1, some_value)
.
Plus, if you work with half-closed ranges, you can use the !=
operator to check for the end condition, becuase (if your operators are defined correctly) <
implies !=
.
for (int* it = begin; it != end; ++ it) { ... }
However there's no easy way to do this with fully-closed ranges. You're stuck with <=
.
The only kind of iterator that supports <
and >
operations in C++ are random-access iterators. If you had to write a <=
operator for every iterator class in C++, you'd have to make all of your iterators fully comparable, and you'd fewer choices for creating less capable iterators (such as the bidirectional iterators on std::list
, or the input iterators that operate on iostreams
) if C++ used fully-closed ranges.
With the end()
pointing one past the end, it is easy to iterate a collection with a for loop:
for (iterator it = collection.begin(); it != collection.end(); it++)
{
DoStuff(*it);
}
With end()
pointing to the last element, a loop would be more complex:
iterator it = collection.begin();
while (!collection.empty())
{
DoStuff(*it);
if (it == collection.end())
break;
it++;
}