I have test code that needs to update keys inside a device_vector of class objects. How do I divide portions of the work among specific threads?
Example of the code without the division:
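__global__ void UpdateKeys(Request* vector, int size, int seed, int qt_threads){
    // Per-thread random number generator state.
    curandState_t state;
    curand_init(seed, threadIdx.x, 0, &state);
    // Global thread index across all blocks.
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if(id < size){
        // curand() returns an unsigned int, so this integer division always
        // yields 0; divide by 100.0f instead if key_ is a float.
        vector[id].key_ = (curand(&state) % 100) / 100;
    }
}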
That vector is passed as a thrust::device_vector.
Examples of what I want:
1000 keys and 2000 threads: use only 1000 threads and give one key to each.
1000 keys and 1000 threads: use them all.
1 key and 100 threads: use 1 thread.
500 keys and 250 threads: each thread takes care of 2 keys.
240 keys and 80 threads: each thread takes care of 3 keys.
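The division in these examples amounts to a ceiling division of the key count by the thread count, with each thread taking one contiguous chunk. Below is a minimal host-side C++ sketch of that arithmetic only. The names `KeyRange`, `keys_per_thread`, and `range_for_thread` are hypothetical and not part of the original code, and the sketch assumes `num_threads` is greater than zero.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper type: half-open range [begin, end) of key indices.
struct KeyRange {
    std::size_t begin;
    std::size_t end;
};

// Ceiling division: how many keys each thread handles at most.
// Assumes num_threads > 0.
inline std::size_t keys_per_thread(std::size_t num_keys, std::size_t num_threads) {
    return (num_keys + num_threads - 1) / num_threads;
}

// Range of keys assigned to thread `tid`. Threads beyond the last
// needed one get an empty range (begin == end) and do no work.
inline KeyRange range_for_thread(std::size_t tid, std::size_t num_keys,
                                 std::size_t num_threads) {
    const std::size_t chunk = keys_per_thread(num_keys, num_threads);
    const std::size_t begin = std::min(tid * chunk, num_keys);
    const std::size_t end = std::min(begin + chunk, num_keys);
    return {begin, end};
}
```

Inside a kernel, the same arithmetic could presumably be applied with `id` as `tid`, `size` as `num_keys`, and `qt_threads` as `num_threads`, looping from `begin` to `end` instead of touching a single element. A grid-stride loop is a commonly used alternative that gives each thread interleaved rather than contiguous elements.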
"how do I divide ...?" Do you ask because of performance concerns? - m.s.