In my project I have to copy a lot of numerical data from a CUDA (GPU) device into a std::valarray (or std::vector), that is, from the memory of the video card to host memory.
So I need to resize these data structures as fast as possible, but when I call the member function vector::resize it initializes all elements of the array to the default value, with a loop.
// In a heavily simplified description, resize behaves like this pseudocode:
vector<T>::resize(N) {
    // Set up the new size
    // Allocate the new array
    this->_internal_vector = new T[N];
    // Initialize every element to the default value
    // This loop is slow!
    for (i = 0; i < N; ++i) {
        this->_internal_vector[i] = T();
    }
}
Clearly I don't need this initialization, because I have to copy data from the GPU and all the old data is overwritten anyway. The initialization takes some time, so I lose performance.
To copy the data I need allocated memory, which is what resize() gives me.
A very dirty and wrong solution is to use vector::reserve(), but then I lose all the features of the vector, and if I call resize afterwards the data is replaced with the default value.
So, is there a strategy for avoiding this pre-initialization to the default value (in valarray or vector)?
I want a resize method that behaves like this:
vector<T>::resize(N) {
    // Allocate the memory.
    this->_internal_vector = new T[N];
    // Update the size of the vector or valarray.
    // !! DO NOT initialize the new values.
}
An example of the performance difference:
#include <chrono>
#include <ctime>  // std::clock, CLOCKS_PER_SEC
#include <iostream>
#include <valarray>
#include <vector>

int main() {
    std::vector<double> vec;
    std::valarray<double> vec2;
    double *vec_raw;
    unsigned int N = 100000000;
    std::clock_t start;
    double duration;

    // Dirty solution: reserve() allocates but leaves size() at 0.
    start = std::clock();
    vec.reserve(N);
    duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
    std::cout << "duration reserve: " << duration << std::endl;

    // Raw allocation without initialization.
    start = std::clock();
    vec_raw = new double[N];
    duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
    std::cout << "duration new: " << duration << std::endl;

    // Explicit initialization of the raw array.
    start = std::clock();
    for (unsigned int i = 0; i < N; ++i) {
        vec_raw[i] = 0;
    }
    duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
    std::cout << "duration raw init: " << duration << std::endl;

    // Dirty solution: this indexes past size(), which is undefined behavior.
    start = std::clock();
    for (unsigned int i = 0; i < vec.capacity(); ++i) {
        vec[i] = 0;
    }
    duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
    std::cout << "duration vec init dirty: " << duration << std::endl;

    // valarray::resize allocates and initializes every element.
    start = std::clock();
    vec2.resize(N);
    duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
    std::cout << "duration valarray resize: " << duration << std::endl;

    delete[] vec_raw;
    return 0;
}
Output:
duration reserve: 1.1e-05
duration new: 1e-05
duration raw init: 0.222263
duration vec init dirty: 0.214459
duration valarray resize: 0.215735
Note: replacing the std::allocator does not work, because the initialization loop is run by resize() itself.
The dirty solution with vec is wrong! The reserve function only allocates memory, but the actual size is still unchanged. That means you index out of bounds and have undefined behavior. – Some programmer dude
Use std::fill or std::fill_n instead of explicit loops. You could also use std::memset in both cases. – Some programmer dude