Running texture objects with streams

Question

I am trying to use texture objects with streams. I allocate one buffer per stream and set every element to 1:

float* aa[nStreams];
for (int i = 0; i < nStreams; ++i) {
        checkCuda(cudaMallocHost((void**)&aa[i], streamBytes)); // pinned host memory
         
        float* ar = aa[i]; 
        for (int k = 0; k < streamSize; k++) {
            ar[k]=1; 
        }
}

I then declare and create an array of textures in the loop over streams:

cudaTextureObject_t tex_ar[nStreams];

for (int i = 0; i < nStreams; ++i) {
    int offset = i * streamSize; 
      
    // create texture object
    cudaResourceDesc resDesc;
    memset(&resDesc, 0, sizeof(resDesc));
    resDesc.resType = cudaResourceTypeLinear;
    resDesc.res.linear.devPtr = aa[i];
    resDesc.res.linear.desc.f = cudaChannelFormatKindFloat;
    resDesc.res.linear.desc.x = 32; // bits per channel
    resDesc.res.linear.sizeInBytes = streamBytes;
        
    cudaTextureDesc texDesc;
    memset(&texDesc, 0, sizeof(texDesc));

    texDesc.readMode = cudaReadModeElementType;

    // create texture object: we only have to do this once!
    cudaCreateTextureObject(&tex_ar[i], &resDesc, &texDesc, NULL);
     
    checkCuda(cudaMemcpyAsync(&d_b[offset], &b[offset],
        streamBytes, cudaMemcpyHostToDevice,
        stream[i]));

    kernelTex<<<streamSize / blockSize, blockSize, 0, stream[i]>>>(tex_ar[i], d_b, offset);

    checkCuda(cudaMemcpyAsync(&b[offset], &d_b[offset],
        streamBytes, cudaMemcpyDeviceToHost,
        stream[i]));
}

The kernel simply assigns the texture value back:

__global__ void kernelTex(cudaTextureObject_t tex, float* b, int offset)
{
   int i = offset + threadIdx.x + blockIdx.x*blockDim.x; 
   b[i] = tex1Dfetch<float>(tex, i);
}

So after the kernels finish, I expect every element of array b to equal 1. However, only the elements written by the first stream are 1; all the others are zero.

Robert Crovella · Accepted Answer · 2020-07-03T14:58:50

This:

b[i] = tex1Dfetch<float>(tex, i);

should be this:

b[i] = tex1Dfetch<float>(tex, i-offset);

Each of the textures you are passing is of size streamSize. So you are texturing beyond the end of the texture, for iterations after the first. Yes, it is legal to texture beyond the end of the texture. It appears that the return value is zero in this case.
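To make the indexing concrete, here is a minimal, self-contained sketch of the corrected pattern. It rests on several assumptions that are not in the question: the sizes are arbitrary, `CHECK` is a hypothetical stand-in for `checkCuda`, and each texture is backed by its own `cudaMalloc` buffer (`d_a[i]`) rather than the pinned host buffers used above. The kernel computes a local index for the texture and adds `offset` only when writing to `b`, which is equivalent to fetching at `i-offset`.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical error-check macro standing in for the question's checkCuda.
// It returns from the enclosing function, so it is only used inside main.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "CUDA error: %s (line %d)\n",            \
                    cudaGetErrorString(err_), __LINE__);             \
            return 1;                                                \
        }                                                            \
    } while (0)

// The texture holds streamSize elements, so it is indexed locally;
// the output array b spans all streams, so it is indexed globally.
__global__ void kernelTex(cudaTextureObject_t tex, float* b, int offset)
{
    int local = threadIdx.x + blockIdx.x * blockDim.x;
    b[offset + local] = tex1Dfetch<float>(tex, local);
}

int main()
{
    const int nStreams = 4, blockSize = 256;       // assumed sizes
    const int streamSize = 4 * blockSize;          // elements per stream
    const int n = nStreams * streamSize;
    const size_t streamBytes = streamSize * sizeof(float);

    float *b = NULL, *ones = NULL, *d_b = NULL;
    CHECK(cudaMallocHost((void**)&b, n * sizeof(float)));    // pinned host
    CHECK(cudaMallocHost((void**)&ones, streamBytes));       // pinned host
    CHECK(cudaMalloc((void**)&d_b, n * sizeof(float)));      // device
    for (int k = 0; k < n; ++k) b[k] = 0.0f;
    for (int k = 0; k < streamSize; ++k) ones[k] = 1.0f;

    float* d_a[nStreams];
    cudaStream_t stream[nStreams];
    cudaTextureObject_t tex_ar[nStreams];

    for (int i = 0; i < nStreams; ++i) {
        CHECK(cudaStreamCreate(&stream[i]));
        CHECK(cudaMalloc((void**)&d_a[i], streamBytes));
        CHECK(cudaMemcpy(d_a[i], ones, streamBytes, cudaMemcpyHostToDevice));

        cudaResourceDesc resDesc;
        memset(&resDesc, 0, sizeof(resDesc));
        resDesc.resType = cudaResourceTypeLinear;
        resDesc.res.linear.devPtr = d_a[i];
        resDesc.res.linear.desc = cudaCreateChannelDesc<float>();
        resDesc.res.linear.sizeInBytes = streamBytes;

        cudaTextureDesc texDesc;
        memset(&texDesc, 0, sizeof(texDesc));
        texDesc.readMode = cudaReadModeElementType;
        CHECK(cudaCreateTextureObject(&tex_ar[i], &resDesc, &texDesc, NULL));
    }

    for (int i = 0; i < nStreams; ++i) {
        int offset = i * streamSize;
        kernelTex<<<streamSize / blockSize, blockSize, 0, stream[i]>>>(
            tex_ar[i], d_b, offset);
        CHECK(cudaMemcpyAsync(&b[offset], &d_b[offset], streamBytes,
                              cudaMemcpyDeviceToHost, stream[i]));
    }
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    // Count elements that did not receive the texture value 1.
    int bad = 0;
    for (int k = 0; k < n; ++k)
        if (b[k] != 1.0f) ++bad;
    printf("mismatches: %d\n", bad);

    for (int i = 0; i < nStreams; ++i) {
        cudaDestroyTextureObject(tex_ar[i]);
        cudaFree(d_a[i]);
        cudaStreamDestroy(stream[i]);
    }
    cudaFree(d_b);
    cudaFreeHost(ones);
    cudaFreeHost(b);
    return bad != 0;
}
```

Changing the fetch index back to `offset + local` should reproduce the symptom from the question: only the first stream's segment of `b` would be filled with ones.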

Note that when the resource type is linear, the address mode is ignored. Therefore you don't have any control over the out-of-range behavior. Again, I assume in this case the out-of-range behavior is to return 0.
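Since the address mode cannot be used to control out-of-range fetches here, one option is to guard the index in the kernel itself. The sketch below is only an illustration under assumptions of my own: `guardedFetch`, the extra `texSize` parameter, the sizes, and the sentinel value `-1.0f` are all hypothetical and not part of the question. For brevity it checks only the final `cudaMemcpy` for errors.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// texSize (hypothetical parameter) is the number of elements in tex.
// In-range threads fetch from the texture; the rest write a sentinel
// instead of relying on whatever an out-of-range fetch returns.
__global__ void guardedFetch(cudaTextureObject_t tex, int texSize,
                             float* out, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i >= n) return;
    out[i] = (i < texSize) ? tex1Dfetch<float>(tex, i) : -1.0f;
}

int main()
{
    const int texSize = 256, n = 512;   // assumed sizes: n exceeds texSize
    float h_in[texSize], h_out[n];
    for (int k = 0; k < texSize; ++k) h_in[k] = 1.0f;

    float *d_in = NULL, *d_out = NULL;
    cudaMalloc((void**)&d_in, texSize * sizeof(float));
    cudaMalloc((void**)&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, texSize * sizeof(float), cudaMemcpyHostToDevice);

    cudaResourceDesc resDesc;
    memset(&resDesc, 0, sizeof(resDesc));
    resDesc.resType = cudaResourceTypeLinear;
    resDesc.res.linear.devPtr = d_in;
    resDesc.res.linear.desc = cudaCreateChannelDesc<float>();
    resDesc.res.linear.sizeInBytes = texSize * sizeof(float);

    cudaTextureDesc texDesc;
    memset(&texDesc, 0, sizeof(texDesc));
    texDesc.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL);

    guardedFetch<<<n / 128, 128>>>(tex, texSize, d_out, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // In-range elements should hold the texture value, the rest the sentinel.
    int bad = 0;
    for (int k = 0; k < n; ++k) {
        float expected = (k < texSize) ? 1.0f : -1.0f;
        if (h_out[k] != expected) ++bad;
    }
    printf("mismatches: %d\n", bad);

    cudaDestroyTextureObject(tex);
    cudaFree(d_in);
    cudaFree(d_out);
    return bad != 0;
}
```

An explicit guard like this makes an indexing mistake visible in the output instead of leaving it hidden behind zeros.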
