I want to render several images to a Texture2DArray and access the resulting images in a compute shader. However, the Load method in the compute shader returns [0.0, 0.0, 0.0, 0.0] rather than the image data.
Currently, I am doing the following:
1. Setup
I first prepare a set of "render slots" by creating a Texture2DArray using CreateTexture2D with texArrayDesc.ArraySize = numRenderSlots:
texArrayDesc.Width = width;
texArrayDesc.Height = height;
texArrayDesc.MipLevels = 1;
texArrayDesc.ArraySize = numRenderSlots;
texArrayDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
texArrayDesc.SampleDesc.Count = 1; // required for a non-multisampled texture; creation fails if left at 0
texArrayDesc.SampleDesc.Quality = 0;
texArrayDesc.Usage = D3D11_USAGE_DEFAULT;
texArrayDesc.BindFlags = D3D11_BIND_RENDER_TARGET | D3D11_BIND_SHADER_RESOURCE;
texArrayDesc.CPUAccessFlags = 0;
texArrayDesc.MiscFlags = 0;
m_device->CreateTexture2D(&texArrayDesc, NULL, &m_renderSlotsTexArray);
I then create a ShaderResourceView for this texture array so I can access it from a compute shader:
srvDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
srvDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE2DARRAY;
srvDesc.Texture2DArray.FirstArraySlice = 0;
srvDesc.Texture2DArray.MostDetailedMip = 0;
srvDesc.Texture2DArray.MipLevels = 1;
srvDesc.Texture2DArray.ArraySize = numRenderSlots;
m_device->CreateShaderResourceView(m_renderSlotsTexArray, &srvDesc, &m_renderSlotsSrv);
For each slice in the Texture2DArray, I create a RenderTargetView so I can render to it.
rtvDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
rtvDesc.ViewDimension = D3D11_RTV_DIMENSION_TEXTURE2DARRAY;
rtvDesc.Texture2DArray.MipSlice = 0;
rtvDesc.Texture2DArray.ArraySize = 1;
for (int i = 0; i < numRenderSlots; i++) {
// Select slice i; D3D11CalcSubresource(0, i, 1) = 0 + i * 1 = i since MipLevels is 1
rtvDesc.Texture2DArray.FirstArraySlice = D3D11CalcSubresource(0, i, 1);
// Create the RTV for the slot in m_renderSlotsTexArray
ID3D11RenderTargetView* slotRtv;
m_device->CreateRenderTargetView(m_renderSlotsTexArray, &rtvDesc, &slotRtv);
// Add the RenderTargetView to a list
m_slotRtvs.push_back(slotRtv);
}
2. Usage
I then render a set of different images by setting the associated render slot as a render target. Note that the background color is not [0,0,0,0]. Code simplified:
for (int i = 0; i < numRenderSlots; i++) {
setupScene(i);
deviceContext->OMSetRenderTargets(1, &m_slotRtvs[i], depthStencilView);
const FLOAT clearColor[4] = { 0.2f, 0.2f, 0.2f, 0.0f };
deviceContext->ClearRenderTargetView(m_slotRtvs[i], clearColor);
deviceContext->ClearDepthStencilView(depthStencilView, D3D11_CLEAR_DEPTH, 1.0f, 0);
render();
}
I then set up a compute shader with a StructuredBuffer for shader output and an associated UnorderedAccessView. I pass this into the shader, along with the SRV of the render slots Texture2DArray. Finally, I dispatch the shader to operate over the image in 32x32-sized chunks.
deviceContext->CSSetShader(m_computeShader, NULL, 0);
deviceContext->CSSetShaderResources(0, 1, &renderSlotsSrv);
deviceContext->CSSetUnorderedAccessViews(0, 1, &outputUav, nullptr);
deviceContext->Dispatch(img_width / 32, img_height / 32, numParams);
I then try to access the rendered image data in the compute shader. However, imgs.Load always seems to return [0.0, 0.0, 0.0, 0.0].
Texture2DArray<float4> imgs : register(t0);
RWStructuredBuffer<float3> output : register(u0);
[numthreads(32,32,1)]
void ComputeShaderMain(
uint3 gID : SV_GroupID,
uint3 dtID : SV_DispatchThreadID,
uint3 tID : SV_GroupThreadID,
uint gi : SV_GroupIndex )
{
int idx_x = (gID.x * 32 + tID.x);
int idx_y = (gID.y * 32 + tID.y);
float4 px = imgs.Load(int4(idx_x,idx_y,gID.z,0));
...
output[outIdx] = float3(px.x, px.y, px.z);
}
I know the render slots are working, as I can access each render slot by doing a CopySubresourceRegion into a staging texture to view the byte data. However, this GPU->CPU transfer is what I'm trying to avoid.
Also, I know the compute shader output is working as I can map the output buffer and inspect some basic test data.
What am I doing wrong? Thanks in advance.