3
votes

I want to reserve a 1 GB block of memory and load data into it for analysis. Each record is about 10K bytes and there are at least 100k records. Originally I was going to use malloc in my C++ code, but I was advised against it.

Now, will using char * block = new char[1000000000] require additional memory to store the pointers to each of the 1,000,000,000 elements in the array? Will using char * block = malloc(1000000000 * sizeof(char)) require less additional memory to create than new[]?

My goal is to use the least amount of memory possible, and I don't want to be swapping records in and out of memory.

Thanks :)

5
And what do you imagine new actually does? Would you be surprised to read that it combines a call to malloc to allocate the memory and a call to a constructor to build the object (if the last part is relevant)? - Serge Ballesta
malloc and new almost certainly use the same underlying allocator. Why were you advised against it? - Jonathan Potter
malloc() just gives you a chunk of memory. new() will actually run a bunch of code to give you an object, which will internally do malloc() anyways to give you the same space you'd get with calling malloc yourself. - Marc B
@Olaf: what's the constructor for char? - Steve Summit
Is there any reason you cannot use std::vector and std::vector::reserve or std::vector::resize? - Daniel

5 Answers

3
votes

On my Linux machine:

Malloc

//malloc.cc
#include <cstdlib>
int main() { char* block = (char*) malloc(1000000000); }

Runtime:

$ make malloc
$ valgrind ./malloc 2>&1|grep total
==23855==   total heap usage: 1 allocs, 0 frees, 1,000,000,000 bytes allocated

New

//new.cc
int main() { char* block = new char[1000000000]; }

Runtime:

$ make new
$ valgrind ./new 2>&1|grep total
==24460==   total heap usage: 2 allocs, 0 frees, 1,000,072,704 bytes allocated

The 72,704-byte overhead stays the same for different allocation sizes, so it is a fixed cost rather than one that grows with the array.

2
votes

In order for operator delete[] to work correctly with non-PODs, the size of the array (a single size_t) is usually placed at the beginning of the whole block, and the first object at the first appropriately-aligned address.

For PODs, operator new[] (without an initializer) is generally the same as a malloc.

With an initializer (again, with a POD type), the results depend on the compiler: It could translate to a loop over the elements, or reduce to a memset.

Given the large amount of memory you intend to allocate, the results of malloc depend on the runtime - some implementations have a hard upper limit on the block size.

If you are targeting Windows, you can use VirtualAlloc for something this size. Likewise, use mmap on *nix.

1
votes

You asked:

Now, will using char * block = new char[1000000000] require additional memory to store the pointers to each of the 1,000,000,000 elements in the array?

Definitely not.

From the C++11 Standard (Section 5.3.4 New)

5 When the allocated object is an array (that is, the noptr-new-declarator syntax is used or the new-type-id or type-id denotes an array type), the new-expression yields a pointer to the initial element (if any) of the array.

The key point is that you get back a single pointer to the initial element (if any) of the array, not one pointer per element.

You also asked:

Will using char * block = malloc(1000000000 * sizeof(char)) require less additional memory to create than new[]?

The standard does not specify anything about the overhead associated with either allocation method. In most implementations, the memory overhead of the two methods should be about the same, if not exactly the same. I would be surprised if that were not true.

0
votes

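new[N] may reserve a little more than requested. For element types with non-trivial destructors, it typically stores the count N at the beginning of the block (so that delete[] knows how many destructors to call) and returns the memory just after it. For char there are no destructors to run, so implementations generally do not need this counter.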

0
votes

If you use new to allocate an array of characters you will get an array of characters. There will not be additional pointers for each element. You just get a large contiguous area of memory similar to what you would get with malloc().

What new will do is allocate the memory and then call the constructor, which in your case will do nothing of any consequence, since this is just an array of Plain Old Data.

I ran a quick check using Visual Studio 2013 with a debug build, watching the memory allocation in Windows Task Manager as I stepped over first the new and then the malloc(). The numbers looked about the same at each step.

With such a large memory area you may run into page faults as the operating system pages parts of it in and out when they are accessed. I am not sure that you can really do anything about that, nor am I sure that it is a big worry. Any swapping behavior will depend partly on the amount of physical memory you have and on the mix of other services and applications and their memory usage.