I want to convert my existing C++ code to CUDA:
for (int x = 0; x < 100; x++)
{
    for (int y = 0; y < 100; y++)
    {
        for (int w = 0; w < 100; w++)
        {
            for (int z = 0; z < 100; z++)
            {
                ........
            }
        }
    }
}
Together, these loops compute a new int value.
To use CUDA, I have to design the thread hierarchy before writing the kernel code.
So how should I design the hierarchy?
Counting one thread per iteration of every loop, I think it would need:
100*100*100*100 = 100000000 threads.
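To make the one-thread-per-iteration idea concrete, here is a rough sketch of how a single flat thread index could be mapped back to the four loop counters. It is plain C++ so it can be checked on the host. The names `LoopIndex` and `decodeIndex` are placeholders I made up, and I am assuming z should vary fastest, the same order in which the nested loops visit their iterations.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: the four loop counters recovered from one flat index.
struct LoopIndex {
    int x, y, w, z;
};

const int N = 100;  // loop bound taken from the loops above

// Maps a flat index in [0, N*N*N*N) back to (x, y, w, z).
// z varies fastest, so flat == ((x*N + y)*N + w)*N + z.
LoopIndex decodeIndex(std::int64_t flat) {
    assert(flat >= 0 && flat < std::int64_t(N) * N * N * N);
    LoopIndex idx;
    idx.z = int(flat % N); flat /= N;
    idx.w = int(flat % N); flat /= N;
    idx.y = int(flat % N); flat /= N;
    idx.x = int(flat);  // whatever remains is the outermost counter
    return idx;
}
```

In a kernel, the flat index would presumably come from the block and thread indices, and each thread would run the loop body once for its own (x, y, w, z).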
Could you help me?
Thanks.
My CUDA device specs:
CUDA Device #0
Major revision number: 1
Minor revision number: 1
Name: GeForce G 105M
Total global memory: 536870912
Total shared memory per block: 16384
Total registers per block: 8192
Warp size: 32
Maximum memory pitch: 2147483647
Maximum threads per block: 512
Maximum dimension 1 of block: 512
Maximum dimension 2 of block: 512
Maximum dimension 3 of block: 64
Maximum dimension 1 of grid: 65535
Maximum dimension 2 of grid: 65535
Maximum dimension 3 of grid: 1
Clock rate: 1600000
Total constant memory: 65536
Texture alignment: 256
Concurrent copy and execution: No
Number of multiprocessors: 1
Kernel execution timeout: Yes
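With these limits, 100000000 threads cannot fit in a one-dimensional grid: at 512 threads per block that is 195313 blocks, which is more than the 65535 allowed in one grid dimension. Below is a rough host-side sketch of the launch arithmetic, assuming a 1D block of 512 threads and a 2D grid. The names `LaunchConfig`, `planLaunch`, and `flatThreadIndex` are hypothetical, and whether 512 threads per block is actually usable depends on how many registers the kernel needs, which I have not checked.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for a launch with a 1D block and a 2D grid.
struct LaunchConfig {
    int threadsPerBlock;
    int gridX;
    int gridY;
};

// Limits copied from the device listing above.
const int MAX_THREADS_PER_BLOCK = 512;
const int MAX_GRID_DIM = 65535;

// Picks a grid large enough to cover totalThreads (it may overshoot).
LaunchConfig planLaunch(std::int64_t totalThreads) {
    assert(totalThreads > 0);
    LaunchConfig cfg;
    cfg.threadsPerBlock = MAX_THREADS_PER_BLOCK;
    // Round up so that the last partial block is included.
    std::int64_t blocks =
        (totalThreads + cfg.threadsPerBlock - 1) / cfg.threadsPerBlock;
    cfg.gridX = int(blocks < MAX_GRID_DIM ? blocks : MAX_GRID_DIM);
    std::int64_t rows = (blocks + cfg.gridX - 1) / cfg.gridX;
    assert(rows <= MAX_GRID_DIM);  // otherwise the work must be split up
    cfg.gridY = int(rows);
    return cfg;
}

// Flat index a thread would compute from its block and thread coordinates
// (blockIdx.x, blockIdx.y, threadIdx.x inside a kernel).
std::int64_t flatThreadIndex(const LaunchConfig& cfg,
                             int blockX, int blockY, int threadX) {
    std::int64_t block = std::int64_t(blockY) * cfg.gridX + blockX;
    return block * cfg.threadsPerBlock + threadX;
}
```

For 100000000 threads this gives a 65535 x 3 grid of 512-thread blocks, which launches slightly more threads than needed, so each thread would have to compare its flat index against 100000000 and return early if it is out of range.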