To start with, there are really a number of different new and delete operators (an arbitrary number, in fact). First, there are ::operator new, ::operator new[], ::operator delete and ::operator delete[]. Second, for any class X, there are X::operator new, X::operator new[], X::operator delete and X::operator delete[].
Between these, it's much more common to overload the class-specific operators than the global operators -- it's fairly common for the memory usage of a particular class to follow a specific enough pattern that you can write operators that provide substantial improvements over the defaults. It's generally much more difficult to predict memory usage nearly that accurately or specifically on a global basis.
It's probably also worth mentioning that although operator new and operator new[] are separate from each other (likewise for any X::operator new and X::operator new[]), there is no difference between the requirements for the two. One will be invoked to allocate a single object, and the other to allocate an array of objects, but each still just receives the amount of memory that's needed, and needs to return the address of a block of memory (at least) that large.
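Just to make that concrete, here's a minimal sketch of the class-scope forms (the class name is a placeholder, and forwarding to malloc is only to keep the example short). Note that the scalar and array forms have exactly the same shape: each takes a byte count and returns a pointer to at least that much storage.

```
#include <cstdlib>
#include <new>

struct Widget {
    // Both receive the total number of bytes needed and must return a block
    // at least that large, or throw std::bad_alloc on failure.
    void* operator new(std::size_t bytes) {
        if (void* p = std::malloc(bytes))
            return p;
        throw std::bad_alloc();
    }
    void* operator new[](std::size_t bytes) {   // same requirements as the scalar form
        if (void* p = std::malloc(bytes))
            return p;
        throw std::bad_alloc();
    }
    void operator delete(void* p) noexcept   { std::free(p); }
    void operator delete[](void* p) noexcept { std::free(p); }
};
```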
Speaking of requirements, it's probably worthwhile to review the other requirements¹: the global operators must be truly global -- you may not put one inside a namespace or make one static in a particular translation unit. In other words, there are only two levels at which overloads can take place: a class-specific overload or a global overload. In-between points such as "all the classes in namespace X" or "all allocations in translation unit Y" are not allowed. The class-specific operators are required to be static, but you're not required to declare them as such -- they will be static whether you write the static keyword or not. Officially, the global operators must return memory aligned so that it can be used for an object of any type. Unofficially, there's a little wiggle-room in one regard: if you get a request for a small block (e.g., 2 bytes) you only really need to provide memory aligned for an object up to that size, since attempting to store anything larger there would lead to undefined behavior anyway.
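As a quick illustration of that "implicitly static" point, here's a sketch (class name made up) where the operators are declared without the static keyword, yet can still be called with no object involved, exactly as any other static member function could:

```
#include <cstdlib>
#include <new>

struct Gadget {
    // No "static" written here, but these are static member functions anyway.
    void* operator new(std::size_t bytes) {
        if (void* p = std::malloc(bytes))
            return p;
        throw std::bad_alloc();
    }
    void operator delete(void* p) noexcept { std::free(p); }
};

int main() {
    // Invoked explicitly, like any other static member -- no Gadget object exists yet.
    void* raw = Gadget::operator new(sizeof(Gadget));
    Gadget::operator delete(raw);
}
```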
Having covered those preliminaries, let's get back to the original question about why you'd want to overload these operators. First, I should point out that the reasons for overloading the global operators tend to be substantially different from the reasons for overloading the class-specific operators.
Since it's more common, I'll talk about the class-specific operators first. The primary reason for class-specific memory management is performance. This commonly comes in either (or both) of two forms: improving speed or reducing fragmentation. Speed is improved by the fact that the memory manager only deals with blocks of a particular size, so it can return the address of any free block rather than spending time checking whether a block is large enough, splitting a block in two if it's too large, etc. Fragmentation is reduced in (mostly) the same way -- for example, pre-allocating a block large enough for N objects gives exactly the space necessary for N objects; allocating one object's worth of memory takes exactly the space for one object, and not a single byte more.
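Here's a rough sketch of that kind of fixed-size allocator (the class, the pool size, and the single static pool are all just illustrative; it assumes C++11 and isn't thread-safe). Allocation and deallocation become a constant-time pop and push on a free list threaded through pre-allocated slots:

```
#include <cstddef>
#include <new>

class Node {
public:
    void* operator new(std::size_t bytes);
    void  operator delete(void* p, std::size_t bytes) noexcept;

    int value = 0;   // example payload
};

namespace {
    // One slot per Node; the free list is threaded through the unused slots.
    union Slot {
        Slot* next_free;
        alignas(Node) unsigned char storage[sizeof(Node)];
    };

    const std::size_t kPoolSize = 1024;   // arbitrary capacity for the sketch
    Slot  pool[kPoolSize];
    Slot* free_list = nullptr;
    bool  pool_initialized = false;

    void init_pool() {
        for (std::size_t i = 0; i < kPoolSize; ++i) {
            pool[i].next_free = free_list;
            free_list = &pool[i];
        }
        pool_initialized = true;
    }
}

void* Node::operator new(std::size_t bytes) {
    // Requests of an unexpected size (e.g., from a larger derived class)
    // just go to the global allocator.
    if (bytes != sizeof(Node))
        return ::operator new(bytes);
    if (!pool_initialized)
        init_pool();
    if (!free_list)
        throw std::bad_alloc();
    Slot* slot = free_list;
    free_list = slot->next_free;   // pop: no size checks, no splitting
    return slot;
}

void Node::operator delete(void* p, std::size_t bytes) noexcept {
    if (!p)
        return;
    if (bytes != sizeof(Node)) {
        ::operator delete(p);
        return;
    }
    Slot* slot = static_cast<Slot*>(p);
    slot->next_free = free_list;   // push the block back on the free list
    free_list = slot;
}
```

With this in place, ordinary new Node and delete expressions for this class go through the pool instead of the general-purpose heap.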
There's a much greater variety of reasons for overloading the global memory management operators. Many of these are oriented toward debugging or instrumentation, such as tracking the total memory needed by an application (e.g., in preparation for porting to an embedded system), or debugging memory problems by showing mismatches between allocating and freeing memory. Another common strategy is to allocate extra memory before and after the boundaries of each requested block, and write unique patterns into those areas. At the end of execution (and possibly at other times as well), those areas are examined to see whether code has written outside the allocated boundaries. Yet another is to attempt to improve ease of use by automating at least some aspects of memory allocation or deletion, such as with an automated garbage collector.
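As a rough sketch of the instrumentation flavor (the counter names are invented, it ignores thread safety, and a real tool would record far more detail, such as per-allocation call sites), the replacement operators can just wrap malloc and free and keep running totals. The default array forms forward to these, so they're covered as well:

```
#include <cstdio>
#include <cstdlib>
#include <new>

namespace {
    std::size_t total_bytes_requested = 0;  // sum of every allocation request
    std::size_t live_blocks = 0;            // allocations that haven't been freed yet
}

void* operator new(std::size_t bytes) {
    void* p = std::malloc(bytes ? bytes : 1);
    if (!p)
        throw std::bad_alloc();
    total_bytes_requested += bytes;
    ++live_blocks;
    return p;
}

void operator delete(void* p) noexcept {
    if (!p)
        return;
    --live_blocks;
    std::free(p);
}

// Report when the program shuts down; a nonzero live_blocks count hints at leaks.
struct AllocationReport {
    ~AllocationReport() {
        std::fprintf(stderr, "bytes requested: %zu, blocks never freed: %zu\n",
                     total_bytes_requested, live_blocks);
    }
} report;
```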
A non-default global allocator can be used to improve performance as well. A typical case would be replacing a default allocator that was just slow in general (e.g., at least some versions of MS VC++ around 4.x would call the system HeapAlloc and HeapFree functions for every allocation/deletion operation). Another possibility I've seen in practice occurred on Intel processors when using the SSE operations. These operate on 128-bit data. While the operations will work regardless of alignment, speed is improved when the data is aligned to 128-bit boundaries. Some compilers (e.g., MS VC++ again²) haven't necessarily enforced alignment to that larger boundary, so even though code using the default allocator would work, replacing the allocator could provide a substantial speed improvement for those operations.
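A bare-bones sketch of that sort of replacement might look like the following (this assumes a POSIX-style platform providing posix_memalign; on Windows you'd use something like _aligned_malloc/_aligned_free instead, and the 16 bytes here matches the 128-bit SSE registers):

```
#include <cstddef>
#include <new>
#include <stdlib.h>   // posix_memalign, free (POSIX)

// Hand out 16-byte (128-bit) aligned blocks so SSE code operating on
// heap-allocated data can rely on aligned loads and stores.
void* operator new(std::size_t bytes) {
    void* p = nullptr;
    if (posix_memalign(&p, 16, bytes ? bytes : 1) != 0)
        throw std::bad_alloc();
    return p;
}

void operator delete(void* p) noexcept {
    free(p);
}
```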
1. Most of the requirements are covered in §3.7.3 and §18.4 of the C++ standard (or §3.7.4 and §18.6 in C++0x, at least as of N3291).
2. I feel obliged to point out that I don't intend to pick on Microsoft's compiler -- I doubt it has an unusual number of such problems, but I happen to use it a lot, so I tend to be quite aware of its problems.