I try to implement a spin lock in a compute shader. But my implementation it doesn't seems to lock anything.
Here is how I implement the spin lock:
void LockAcquire()
{
uint Value = 1;
[allow_uav_condition]
while (Value) {
InterlockedCompareExchange(DataOutBuffer[0].Lock, 0, 1, Value);
};
}
void LockRelease()
{
uint Value;
InterlockedExchange(DataOutBuffer[0].Lock, 0, Value);
}
Background: I need a spin lock because I have to compute a sum of data in a large 2 dimensions array. The sum is a double. Computing the sum with a single thread and a double loop produce the correct result. Computing the sum with multithreading produce the wrong result, even when introducing the spin lock to avoid conflict when computing the sum.
I can't use InterLockedAdd because the sum doesn't fit into a 32 bits integer and I'm using a shader model 5 (Compiler 47).
Here is the single thread version, producing the correct result:
[numthreads(1, 1, 1)]
void CSGrayAutoComputeSumSqr(
uint3 Gid : SV_GroupID,
uint3 DTid : SV_DispatchThreadID, // Coordinates in RawImage window
uint3 GTid : SV_GroupThreadID,
uint GI : SV_GroupIndex)
{
if ((DTid.x == 0) && (DTid.y == 0)) {
uint2 XY;
int Mean = (int)round(DataOutBuffer[0].GrayAutoResultMean);
for (XY.x = 0; XY.x < (uint)RawImageSize.x; XY.x++) {
for (XY.y = 0; XY.y < (uint)RawImageSize.y; XY.y++) {
int Value = GetPixel16BitGrayFromRawImage(RawImage, rawImageSize, XY);
uint UValue = (Mean - Value) * (Mean - Value);
DataOutBuffer[0].GrayAutoResultSumSqr += UValue;
}
}
}
}
and below is the multithreaded version. This version produce similar but different result on each execution which IMO is caused by the not functioning lock.
[numthreads(1, 1, 1)]
void CSGrayAutoComputeSumSqr(
uint3 Gid : SV_GroupID,
uint3 DTid : SV_DispatchThreadID, // Coordinates in RawImage window
uint3 GTid : SV_GroupThreadID,
uint GI : SV_GroupIndex)
{
int Value = GetPixel16BitGrayFromRawImage(RawImage, RawImageSize, DTid.xy);
int Mean = (int)round(DataOutBuffer[0].GrayAutoResultMean);
uint UValue = (Mean - Value) * (Mean - Value);
LockAcquire();
DataOutBuffer[0].GrayAutoResultSumSqr += UValue;
LockRelease();
}
Data used:
cbuffer TImageParams : register(b0)
{
int2 RawImageSize; // Actual image size in RawImage
}
struct TDataOutBuffer
{
uint Lock; // Use for SpinLock
double GrayAutoResultMean;
double GrayAutoResultSumSqr;
};
ByteAddressBuffer RawImage : register(t0);
RWStructuredBuffer<TDataOutBuffer> DataOutBuffer : register(u4);
Dispatch code:
FImmediateContext->CSSetShader(FComputeShaderGrayAutoComputeSumSqr, NULL, 0);
FImmediateContext->Dispatch(FImageParams.RawImageSize.X, FImageParams.RawImageSize.Y, 1);
The function GetPixel16BitGrayFromRawImage access RawImage byte address buffer to fetch 16 bit pixel value from a gray scale image. It produce the expected result.
Any help appreciated.