DX11 Compute Shader writes only to one index

Question

I really can't figure out what's going on here.

I have a compute shader that takes in an FFT result (from real input) and computes the powers of each bin, storing them in a different buffer (UAV). The FFT implementation is that of the D3DCSX library.

The shader in question:

struct Complex {
    float real;
    float imag;
};

RWStructuredBuffer<Complex> g_result : register(u0);
RWStructuredBuffer<float> g_powers : register(u1);

[numthreads(1, 1, 1)] void main(uint3 id : SV_DispatchThreadID) {
    const uint  bin  = id.x;
    const float real = g_result[bin + 1].real;
    const float imag = g_result[bin + 1].imag;

    const float power = real * real + imag * imag;
    const float mag = sqrt(power);
    const float db = 10.0f * log10(1.0f + power);

    g_powers[bin] = power;
}

The buffer creation code:

//The buffer in which the resulting powers are stored (m_result_buffer1)
buffer_desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE;
buffer_desc.ByteWidth = sizeof(float) * NumBins();
buffer_desc.CPUAccessFlags = 0;
buffer_desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS;
buffer_desc.StructureByteStride = sizeof(float);
buffer_desc.Usage = D3D11_USAGE_DEFAULT;

hr = m_device->CreateBuffer (
    &buffer_desc,
    nullptr,
    &m_result_buffer1
); HR_THROW();

//UAV for m_result_buffer1
view_desc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
view_desc.Buffer.FirstElement = 0;
view_desc.Format = DXGI_FORMAT_R32_TYPELESS;
view_desc.Buffer.Flags = D3D11_BUFFER_UAV_FLAG_RAW;
view_desc.Buffer.NumElements = NumBins();

hr = m_device->CreateUnorderedAccessView (
    m_result_buffer1,
    &view_desc,
    &m_result_view
); HR_THROW();

//Buffer for reading powers to the CPU
buffer_desc.BindFlags = 0;
buffer_desc.ByteWidth = sizeof(float) * NumBins();
buffer_desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
buffer_desc.MiscFlags = 0;
buffer_desc.StructureByteStride = sizeof(float);
buffer_desc.Usage = D3D11_USAGE_STAGING;

hr = m_device->CreateBuffer (
    &buffer_desc,
    nullptr,
    &m_result_buffer2
); HR_THROW();

The dispatch code:

CComPtr<ID3D11UnorderedAccessView> result_view;

hr = m_fft->ForwardTransform (
    m_sample_view,
    &result_view
); HR_THROW();

ID3D11UnorderedAccessView* views[] = {
    result_view,  //FFT UAV   (u0)
    m_result_view //Power UAV (u1)
};

m_context->CSSetShader(m_power_cs, nullptr, 0);
m_context->CSSetUnorderedAccessViews(0, 2, views, nullptr);
m_context->Dispatch(NumBins(), 1, 1);

And finally the CPU mapping code:

m_context->CopyResource(m_result_buffer2, m_result_buffer1);

D3D11_MAPPED_SUBRESOURCE sub = { 0 };

m_context->Map(m_result_buffer2, 0, D3D11_MAP_READ, 0, &sub);
memcpy(result, sub.pData, sizeof(float) * NumBins());
m_context->Unmap(m_result_buffer2, 0);

What happens is this shader appears to have every thread write to the same index in the output buffer. The mapped buffer always reads a correct value for the first bin, then 0.0f for every other bin. The equivalent code on the CPU runs just fine. What's weird is I've placed conditionals and know that bin is not just 0 all the time, and that the power of every bin outside bin 0 is also not always 0.0f. I've also tried writing to multiple bins on the same thread using a for loop, and the same thing happens. What am I doing wrong?

I have a hunch that it's the buffer creation code or mapping code that's at the root of the problem. I know I'm running the correct number of threads on the GPU and that the dispatch ID's are correct, it's the CPU-side result that's wrong.

NmdMystery NmdMystery · Accepted Answer · 2015-02-28T22:31:58

Problem Solved!

I was using a RWStructuredBuffer to represent a RWByteOrderBuffer. Not entirely sure how that led to this result, but it did. So, the FFT result is now a RWByteOrderBuffer. What was strange about this buffer, though, was the fact that the D3DCSX implementation spaced the float values so far apart - possibly for cache reasons, but I'm honestly not too sure why. This is my compute shader now (computing decibels instead of powers this time - an unrelated change):

RWByteAddressBuffer       g_result   : register(u0);
RWStructuredBuffer<float> g_decibels : register(u1);

[numthreads(256, 1, 1)] void main(uint3 id : SV_DispatchThreadID) {
    const float real = asfloat(g_result.Load(id.x * 8 + 0));
    const float imag = asfloat(g_result.Load(id.x * 8 + 4));

    const float power = real * real + imag * imag;
    const float db = 10.0f * log10(1.0f + power);

    g_decibels[id.x] = db;
}

I changed my decibel buffer's description to that of a structured buffer, though, just to make things easier for me:

buffer_desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE;
buffer_desc.ByteWidth = sizeof(float) * NumBins();
buffer_desc.CPUAccessFlags = 0;
buffer_desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
buffer_desc.StructureByteStride = sizeof(float);
buffer_desc.Usage = D3D11_USAGE_DEFAULT;

hr = m_device->CreateBuffer (
    &buffer_desc,
    nullptr,
    &m_result_buffer1
); HR_THROW();

view_desc.Buffer.FirstElement = 0;
view_desc.Buffer.Flags = 0;
view_desc.Buffer.NumElements = NumBins();
view_desc.Format = DXGI_FORMAT_UNKNOWN;
view_desc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;

hr = m_device->CreateUnorderedAccessView (
    m_result_buffer1,
    &view_desc,
    &m_result_view
); HR_THROW();

This is why g_decibels is still a RWStructuredBuffer.

Still unknown to me is whether or not it matters that the result buffer is read/write when only accesses are necessary - if I change g_result to a regular ByteOrderBuffer I get no output. But at least it's working now.

DX11 Compute Shader writes only to one index

1 Answers