I'm really new to DirectCompute, and have been trying to learn from the documentation on MSDN, which is... dense, to say the least.
I'd like to write a basic HLSL compute shader that takes in a 4x4 matrix and a 4xN matrix and returns the product. But after spending some time playing with the code, I've found some behavior I don't understand, mainly in how the threads I dispatch read the buffers and write the output data.
In all of these examples, I pass in two 16-float buffers, get out one 16-float buffer, and call Dispatch with a 4x1x1 grouping. I can show you more code, but I honestly don't yet know what would help you help me. Let me know if there's a section of my C++ code you want to see.
With the following code:
StructuredBuffer<float4x4> base_matrix : register(t0); // byteWidth = 64
StructuredBuffer<float4> extended_matrix : register(t1); // byteWidth = 64
RWStructuredBuffer<float4> BufferOut : register(u0); // byteWidth = 64, zeroed out before reading from the GPU
[numthreads(1, 1, 1)]
void CSMain( uint3 DTid : SV_DispatchThreadID )
{
    BufferOut[DTid.x].x = 1;
}
I get the following values out:
1.000 0.000 0.000 0.000
1.000 0.000 0.000 0.000
1.000 0.000 0.000 0.000
1.000 0.000 0.000 0.000
This makes sense to me: the dispatch launches 4 threads, and each one writes to its own float4 element.
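To check that mental model, here is a minimal CPU-side sketch in C++ of how I understand Dispatch(4, 1, 1) with [numthreads(1, 1, 1)] to map onto SV_DispatchThreadID: the group ID is multiplied by the group size and added to the thread's ID within the group. The names Float4 and simulateDispatchX are hypothetical, only the x dimension is modeled, and this says nothing about how the GPU actually schedules the work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an HLSL float4.
struct Float4 {
    float x = 0.0f, y = 0.0f, z = 0.0f, w = 0.0f;
};

// CPU model of Dispatch(groupsX, 1, 1) with [numthreads(threadsPerGroupX, 1, 1)].
// Runs the body of the first shader, BufferOut[DTid.x].x = 1, once per thread.
std::vector<Float4> simulateDispatchX(unsigned groupsX, unsigned threadsPerGroupX,
                                      std::size_t elementCount) {
    assert(threadsPerGroupX > 0);
    std::vector<Float4> bufferOut(elementCount);  // starts zeroed, like my output buffer
    for (unsigned groupId = 0; groupId < groupsX; ++groupId) {
        for (unsigned groupThreadId = 0; groupThreadId < threadsPerGroupX; ++groupThreadId) {
            // SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID
            const unsigned dtid = groupId * threadsPerGroupX + groupThreadId;
            if (dtid < bufferOut.size()) {  // guard so the sketch never writes past the buffer
                bufferOut[dtid].x = 1.0f;
            }
        }
    }
    return bufferOut;
}
```

Calling simulateDispatchX(4, 1, 4) gives four elements with x set to 1 and the other components left at 0, which matches the output above.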
With the following code:
StructuredBuffer<float4x4> base_matrix : register(t0); // byteWidth = 64
StructuredBuffer<float4> extended_matrix : register(t1); // byteWidth = 64
RWStructuredBuffer<float4> BufferOut : register(u0); // byteWidth = 64, zeroed out before reading from the GPU
[numthreads(1, 1, 1)]
void CSMain( uint3 DTid : SV_DispatchThreadID )
{
    BufferOut[DTid.x].x = 1;
    BufferOut[DTid.x].y = 2;
    BufferOut[DTid.x].z = 3;
    BufferOut[DTid.x].w = 4;
}
I get the following values out:
1.000 1.000 1.000 1.000
1.000 1.000 1.000 1.000
1.000 1.000 1.000 1.000
1.000 1.000 1.000 1.000
And with the actual code I want to run:
StructuredBuffer<float4x4> base_matrix : register(t0);
StructuredBuffer<float4> extended_matrix : register(t1);
RWStructuredBuffer<float4> BufferOut : register(u0);
[numthreads(1, 1, 1)]
void CSMain( uint3 DTid : SV_DispatchThreadID )
{
    BufferOut[DTid.x] = mul(base_matrix[0], extended_matrix[DTid.x]);
}
I get the following values out:
0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000
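For comparison, this is roughly the result I expect instead of zeros, written as a CPU reference in C++. It is only a sketch under two assumptions: that mul(M, v) treats v as a column vector, so each output component is the dot product of one matrix row with v, and that the 16 matrix floats are laid out row by row, which may not match how HLSL unpacks a float4x4 from a structured buffer. The name mulReference is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for BufferOut[i] = mul(base_matrix[0], extended_matrix[i]).
// matrix16: 16 floats, assumed row-major (element [row][col] at row * 4 + col).
// columns:  N float4 values stored back to back (4 * N floats).
std::vector<float> mulReference(const std::vector<float>& matrix16,
                                const std::vector<float>& columns) {
    assert(matrix16.size() == 16);
    assert(columns.size() % 4 == 0);
    std::vector<float> out(columns.size(), 0.0f);
    for (std::size_t i = 0; i < columns.size() / 4; ++i) {  // one float4 per "thread"
        for (std::size_t row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (std::size_t col = 0; col < 4; ++col) {
                sum += matrix16[row * 4 + col] * columns[i * 4 + col];
            }
            out[i * 4 + row] = sum;
        }
    }
    return out;
}
```

With an identity matrix this returns the input unchanged, so any non-zero input should give a non-zero result.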
I can tell I'm missing something critical here, but for the life of me I can't find the documentation that explains how this works. Could someone help me understand what's going on in this code?
Thanks for your time,
Zach
As another note, this code was cobbled together from the BasicCompute11 sample in the Microsoft DirectX SDK (June 2010), under Samples\C++\Direct3D11\BasicCompute11. If I'm doing something terribly wrong, feel free to let me know. I'm REALLY new at HLSL.
Edit: My buffer creation code.
CreateStructuredBuffer( g_pDevice, sizeof(float)*16, 1, g_matrix, &g_pBuf0 );
CreateStructuredBuffer( g_pDevice, sizeof(float)*4, NUM_ELEMENTS, g_extended_matrix, &g_pBuf1 );
CreateStructuredBuffer( g_pDevice, sizeof(float)*4, NUM_ELEMENTS, NULL, &g_pBufResult );
//--------------------------------------------------------------------------------------
// Create Structured Buffer
//--------------------------------------------------------------------------------------
HRESULT CreateStructuredBuffer( ID3D11Device* pDevice, UINT uElementSize, UINT uCount, VOID* pInitData, ID3D11Buffer** ppBufOut )
{
    *ppBufOut = NULL;

    D3D11_BUFFER_DESC desc;
    ZeroMemory( &desc, sizeof(desc) );
    // Bindable both as a shader resource (StructuredBuffer) and as a UAV (RWStructuredBuffer).
    desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE;
    desc.ByteWidth = uElementSize * uCount;   // total buffer size in bytes
    desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
    desc.StructureByteStride = uElementSize;  // size of one element in bytes

    if ( pInitData )
    {
        D3D11_SUBRESOURCE_DATA InitData;
        InitData.pSysMem = pInitData;
        return pDevice->CreateBuffer( &desc, &InitData, ppBufOut );
    }
    else
    {
        return pDevice->CreateBuffer( &desc, NULL, ppBufOut );
    }
}
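The sizes in the three calls above work out as follows. This is a small C++ sketch of the arithmetic CreateStructuredBuffer does for ByteWidth and StructureByteStride. kNumElements = 4 is an assumption, inferred from the 64-byte buffers noted in the shader comments, and BufferLayout and describeBuffer are hypothetical names that do not appear in the sample.

```cpp
#include <cassert>
#include <cstddef>

// Assumed value of NUM_ELEMENTS; 4 float4 elements give the 64 bytes noted in the shader comments.
constexpr std::size_t kNumElements = 4;

// Hypothetical summary of the two size fields set in D3D11_BUFFER_DESC.
struct BufferLayout {
    std::size_t stride;     // StructureByteStride
    std::size_t byteWidth;  // ByteWidth
};

BufferLayout describeBuffer(std::size_t elementSize, std::size_t count) {
    // As far as I understand, a structured buffer stride must be a multiple of 4 bytes.
    assert(elementSize % 4 == 0);
    return BufferLayout{ elementSize, elementSize * count };
}
```

Under that assumption, describeBuffer(sizeof(float) * 16, 1) describes one 64-byte float4x4 element, and describeBuffer(sizeof(float) * 4, kNumElements) describes four 16-byte float4 elements, also 64 bytes in total.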
Trying .1, .2, .3, .4 instead:
StructuredBuffer<float4x4> base_matrix : register(t0);
StructuredBuffer<float4> extended_matrix : register(t1);
StructuredBuffer<uint> loop_multiplier : register(t2);
RWStructuredBuffer<float4> BufferOut : register(u0);
[numthreads(1, 1, 1)]
void CSMain( uint3 DTid : SV_DispatchThreadID )
{
    BufferOut[DTid.x].x = .1;
    BufferOut[DTid.x].y = .2;
    BufferOut[DTid.x].z = .3;
    BufferOut[DTid.x].w = .4;
}
I got this out:
0.100 0.100 0.100 0.100
0.100 0.100 0.100 0.100
0.100 0.100 0.100 0.100
0.100 0.100 0.100 0.100