I really can't figure out what's going on here.
I have a compute shader that takes an FFT result (computed from real input) and calculates the power of each bin, storing the powers in a separate buffer (a UAV). The FFT implementation is the one from the D3DCSX library.
The shader in question:
struct Complex {
    float real;
    float imag;
};

RWStructuredBuffer<Complex> g_result : register(u0);
RWStructuredBuffer<float> g_powers : register(u1);

[numthreads(1, 1, 1)]
void main(uint3 id : SV_DispatchThreadID) {
    const uint bin = id.x;

    const float real = g_result[bin + 1].real;
    const float imag = g_result[bin + 1].imag;

    const float power = real * real + imag * imag;
    const float mag = sqrt(power);
    const float db = 10.0f * log10(1.0f + power);

    g_powers[bin] = power;
}
The buffer creation code:
// The buffer in which the resulting powers are stored (m_result_buffer1)
buffer_desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE;
buffer_desc.ByteWidth = sizeof(float) * NumBins();
buffer_desc.CPUAccessFlags = 0;
buffer_desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS;
buffer_desc.StructureByteStride = sizeof(float);
buffer_desc.Usage = D3D11_USAGE_DEFAULT;
hr = m_device->CreateBuffer(
    &buffer_desc,
    nullptr,
    &m_result_buffer1
); HR_THROW();

// UAV for m_result_buffer1
view_desc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
view_desc.Buffer.FirstElement = 0;
view_desc.Format = DXGI_FORMAT_R32_TYPELESS;
view_desc.Buffer.Flags = D3D11_BUFFER_UAV_FLAG_RAW;
view_desc.Buffer.NumElements = NumBins();
hr = m_device->CreateUnorderedAccessView(
    m_result_buffer1,
    &view_desc,
    &m_result_view
); HR_THROW();

// Staging buffer for reading the powers back on the CPU (m_result_buffer2)
buffer_desc.BindFlags = 0;
buffer_desc.ByteWidth = sizeof(float) * NumBins();
buffer_desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
buffer_desc.MiscFlags = 0;
buffer_desc.StructureByteStride = sizeof(float);
buffer_desc.Usage = D3D11_USAGE_STAGING;
hr = m_device->CreateBuffer(
    &buffer_desc,
    nullptr,
    &m_result_buffer2
); HR_THROW();
The dispatch code:
CComPtr<ID3D11UnorderedAccessView> result_view;
hr = m_fft->ForwardTransform(
    m_sample_view,
    &result_view
); HR_THROW();

ID3D11UnorderedAccessView* views[] = {
    result_view,  // FFT UAV (u0)
    m_result_view // Power UAV (u1)
};

m_context->CSSetShader(m_power_cs, nullptr, 0);
m_context->CSSetUnorderedAccessViews(0, 2, views, nullptr);
m_context->Dispatch(NumBins(), 1, 1);
And finally the CPU mapping code:
m_context->CopyResource(m_result_buffer2, m_result_buffer1);
D3D11_MAPPED_SUBRESOURCE sub = { 0 };
m_context->Map(m_result_buffer2, 0, D3D11_MAP_READ, 0, &sub);
memcpy(result, sub.pData, sizeof(float) * NumBins());
m_context->Unmap(m_result_buffer2, 0);
What happens is that the shader appears to have every thread write to the same index in the output buffer: the mapped buffer always reads a correct value for the first bin, then 0.0f for every other bin. The equivalent code on the CPU runs just fine. What's weird is that I've placed conditionals in the shader and know that bin is not always 0, and that the power of bins other than bin 0 is not always 0.0f either. I've also tried writing to multiple bins from the same thread using a for loop, and the same thing happens.
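For what it's worth, the debugging variants looked roughly like this (the bin index and loop count here are placeholders, not the exact values I tested):

// Verified that a bin other than 0 really has non-zero power by routing it to element 0,
// which I know reads back correctly:
if (bin == 3 && power > 0.0f) {
    g_powers[0] = power; // this value does show up in the mapped buffer
}

// Tried writing several bins from a single thread:
for (uint i = 0; i < 4; ++i) {
    g_powers[bin + i] = power; // still only element 0 of the readback is non-zero
}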
What am I doing wrong? I have a hunch that it's the buffer creation code or the mapping code that's at the root of the problem. I know I'm dispatching the correct number of threads on the GPU and that the dispatch IDs are correct; it's the CPU-side result that's wrong.