CUDA/OpenGL Interop: Writing to surface object does not erase previous contents

Question

I am attempting to use a CUDA kernel to modify an OpenGL texture, but am having a strange issue where my calls to surf2Dwrite() seem to blend with the previous contents of the texture, as you can see in the image below. The wooden texture in the back is what's in the texture before modifying it with my CUDA kernel. The expected output would include ONLY the color gradients, not the wood texture behind it. I don't understand why this blending is happening.

Possible Problems / Misunderstandings

I'm new to both CUDA and OpenGL. Here I'll try to explain the thought process that led me to this code:

I'm using a cudaArray to access the texture (rather than e.g. an array of floats) because I read that it's better for cache locality when reading/writing a texture.
I'm using surfaces because I read somewhere that it's the only way to modify a cudaArray
I wanted to use surface objects, which I understand to be the newer way of doing things. The old way is to use surface references.

Some possible problems with my code that I don't know how to check/test:

Am I being inconsistent with image formats? Maybe I didn't specify the correct number of bits/channel somewhere? Maybe I should use floats instead of unsigned chars?

Code Summary

You can find a full minimum working example in this GitHub Gist. It's quite long because of all the moving parts, but I'll try to summarize. I welcome suggestions on how to shorten the MWE. The overall structure is as follows:

create an OpenGL texture from a file stored locally
register the texture with CUDA using cudaGraphicsGLRegisterImage()
call cudaGraphicsSubResourceGetMappedArray() to get a cudaArray that represents the texture
create a cudaSurfaceObject_t that I can use to write to the cudaArray
pass the surface object to a kernel that writes to the texture with surf2Dwrite()
use the texture to draw a rectangle on-screen

OpenGL Texture Creation

I am new to OpenGL, so I'm using the "Textures" section of the LearnOpenGL tutorials as a starting point. Here's how I set up the texture (using the image library stb_image.h)

GLuint initTexturesGL(){
    // load texture from file
    int numChannels;
    unsigned char *data = stbi_load("img/container.jpg", &g_imageWidth, &g_imageHeight, &numChannels, 4);
    if(!data){
        std::cerr << "Error:  Failed to load texture image!" << std::endl;
        exit(1);
    }

    // opengl texture
    GLuint textureId;
    glGenTextures(1, &textureId);
    glBindTexture(GL_TEXTURE_2D, textureId);

    // wrapping
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_MIRRORED_REPEAT);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_MIRRORED_REPEAT);

    // filtering
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

    // set texture image
    glTexImage2D(
        GL_TEXTURE_2D,    // target
        0,                // mipmap level
        GL_RGBA8,         // internal format (#channels, #bits/channel, ...)
        g_imageWidth,     // width
        g_imageHeight,    // height
        0,                // border (must be zero)
        GL_RGBA,          // format of input image
        GL_UNSIGNED_BYTE, // type
        data              // data
    );
    glGenerateMipmap(GL_TEXTURE_2D);

    // unbind and free image
    glBindTexture(GL_TEXTURE_2D, 0);
    stbi_image_free(data);

    return textureId;
}

CUDA Graphics Interop

After calling the function above, I register the texture with CUDA:

void initTexturesCuda(GLuint textureId){
    // register texture
    HANDLE(cudaGraphicsGLRegisterImage(
        &g_textureResource,                       // resource
        textureId,                                // image
        GL_TEXTURE_2D,                            // target
        cudaGraphicsRegisterFlagsSurfaceLoadStore // flags
    ));

    // resource description for surface
    memset(&g_resourceDesc, 0, sizeof(g_resourceDesc));
    g_resourceDesc.resType = cudaResourceTypeArray;
}

Render Loop

Every frame, I run the following to modify the texture and render the image:

while(!glfwWindowShouldClose(window)){
        // -- CUDA --

        // map
        HANDLE(cudaGraphicsMapResources(1, &g_textureResource));


        HANDLE(cudaGraphicsSubResourceGetMappedArray(
            &g_textureArray,   // array through which to access subresource
            g_textureResource, // mapped resource to access
            0,                 // array index
            0                  // mipLevel
        ));

        // create surface object (compute >= 3.0)
        g_resourceDesc.res.array.array = g_textureArray;
        HANDLE(cudaCreateSurfaceObject(&g_surfaceObj, &g_resourceDesc));

        // run kernel
        kernel<<<gridDim, blockDim>>>(g_surfaceObj, g_imageWidth, g_imageHeight);

        // unmap
        HANDLE(cudaGraphicsUnmapResources(1, &g_textureResource));

        // --- OpenGL ---

        // clear
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

        // use program
        shader.use();

        // triangle
        glBindVertexArray(vao);
        glBindTexture(GL_TEXTURE_2D, textureId);
        glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0);
        glBindVertexArray(0);

        // glfw:  swap buffers and poll i/o events
        glfwSwapBuffers(window);
        glfwPollEvents();
    }

CUDA Kernel

The actual CUDA kernel is as follows:

__global__ void kernel(cudaSurfaceObject_t surface, int nx, int ny){
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if(x < nx && y < ny){
        uchar4 data = make_uchar4(x % 255, 
                                  y % 255, 
                                  0, 255);
        surf2Dwrite(data, surface, x * sizeof(uchar4), y);
    }
}

Sidenote: glTexParameteri applies parameters to the currently bound image, so glGenTextures and glBindTexture has to be done before. — Rabbid76
Thanks for pointing this out, I updated my post/Gist accordingly. — Benjamin Bray

Michael Kenzel Michael Kenzel · Accepted Answer · 2018-12-23T14:01:55

If I understand correctly, you initially register the texture, map it once, create a surface object for the array representing the mapped texture, and then unmap the texture. Every frame, you then map the resource again, ask for the array representing the mapped texture, and then completely ignore that one and use the surface object created for the array you got back when you first mapped the resource. From the documentation:

[…] The value set in array may change every time that resource is mapped.

You have to create a new surface object every time you map the resource because you might get a different array every time. And, in my experience, you will actually get a different one every so often. It may be a valid thing to do to only create a new surface object whenever the array actually changes. The documentation seems to allow for that, but I never tried, so I can't tell whether that works for sure…

Apart from that: You generate mipmaps for your texture. You only overwrite mip level 0. You then render the texture using mipmapping with trilinear interpolation. So my guess would be that you just happen to render the texture at a resolution that does not match the resolution of mip level 0 exactly and, thus, you will end up interpolating between level 0 (in which you wrote) and level 1 (which was generated from the original texture)…