I'm implementing an A-Buffer in Metal on macOS, and it is almost working -- except that I see shimmering glitches wherever triangles overlap. It looks as if the buffers involved are not being updated at the right times, but I don't know what could cause that. Here's a picture -- the 'corrupted' area changes every frame but is always where the two colors overlap.
I won't explain the whole A-Buffer operation, but it involves binding three resources to the shader: a very large buffer of fragment links (172MB, although only a small part of it is written to in this example), a "texture" of integers holding each pixel's list start index, and a single integer atomic counter.
The rendering is done in two passes -- the first pass creates a linked-list of pixel fragments for every visible rendered pixel location:
// the uint return goes into the start index buffer, our 'image'. The FragLinkBuffer stores the data
fragment uint stroke_abuffer_fragment(VertexIn interpolated [[stage_in]],
const device uint& color [[ buffer(0) ]],
device FragLink* LinkBuffer [[ buffer(1) ]],
device atomic_uint &counter[[buffer(2)]],
texture2d<uint> StartTexture [[ texture(0) ]]) {
constexpr sampler Sampler(coord::pixel,filter::nearest);
// reserve the next free slot in the link buffer by incrementing the atomic counter
uint value = atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
// read the old start index for this pixel from the start texture
int oldStart = StartTexture.sample(Sampler, interpolated.position.xy).x;
// store fragment information in link buffer
FragLink F;
F.color = color;
F.depth = interpolated.position.z;
F.next = oldStart;
LinkBuffer[value] = F;
// return pointer to new start for this fragment, which will be stored back to the StartTexture
return value;
}
The second pass sorts and blends the fragments at each pixel.
#define MAX_PIXELS 16
fragment float4 stroke_abuffer_fragment_composite(CompositeVertexOut interpolated [[stage_in]],
device FragLink* LinkBuffer [[ buffer(0) ]],
texture2d<uint> StartTexture [[ texture(0) ]]) {
pixel SortedPixels[MAX_PIXELS];
int numPixels = 0;
constexpr sampler Sampler(coord::pixel,filter::nearest);
FragLink F;
pixel P;
uint index = StartTexture.sample(Sampler, interpolated.position.xy).x;
if (index == 0)
discard_fragment();
float4 finalColor = float4(0.0);
// grab all the linked fragments for this pixel
while (index != 0) {
F = LinkBuffer[index];
P.color = F.color;
P.depth = F.depth;
SortedPixels[numPixels++] = P;
index = (numPixels >= MAX_PIXELS) ? 0 : F.next;
}
// now sort them by depth
for (int j = 1; j < numPixels; ++j) {
pixel key = SortedPixels[j];
int i = j - 1;
while (i >= 0 && SortedPixels[i].depth <= key.depth)
{
SortedPixels[i+1] = SortedPixels[i];
--i;
}
SortedPixels[i+1] = key;
}
// blend them in order
for (int k = 0; k < numPixels; k++) {
uint color = SortedPixels[k].color;
float red = ((color>>24)&255)/255.0;
float green = ((color>>16)&255)/255.0;
float blue = ((color>>8)&255)/255.0;
float alpha = ((color)&255)/255.0;
//red = 1.0; green = 0.0; blue = 0.0; alpha = 0.25;
finalColor.xyz = mix(finalColor.xyz, float3(red,green,blue), alpha);
finalColor.w = alpha;
}
return finalColor;
}
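To make the second pass easier to reason about outside the GPU, here is a minimal CPU-side sketch in C++ of what the composite shader does for one pixel: walk the list, insertion-sort far-to-near, then unpack and blend. It is only a model, not Metal code. The field types of FragLink, the Color struct, and the names unpack, compositePixel and kMaxFragments are assumptions made for illustration; mix() is written out as a + (b - a) * t.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU mirror of the shader's FragLink; the field types are assumed.
struct FragLink {
    uint32_t color;  // packed 0xRRGGBBAA
    float    depth;
    uint32_t next;   // index of the next node; 0 ends the list
};

struct Color { float r, g, b, a; };

constexpr int kMaxFragments = 16;  // plays the role of MAX_PIXELS

// Unpack 0xRRGGBBAA into floats in [0, 1], the same shifts the shader uses.
Color unpack(uint32_t c) {
    return { ((c >> 24) & 255) / 255.0f,
             ((c >> 16) & 255) / 255.0f,
             ((c >>  8) & 255) / 255.0f,
             ( c        & 255) / 255.0f };
}

// Gather one pixel's fragments, sort them by descending depth, blend in order.
Color compositePixel(const std::vector<FragLink>& links, uint32_t start) {
    FragLink sorted[kMaxFragments];
    int n = 0;
    for (uint32_t i = start; i != 0 && n < kMaxFragments; i = links[i].next) {
        assert(i < links.size());  // guard added for this sketch only
        sorted[n++] = links[i];
    }
    // Insertion sort: larger depth (farther away) ends up first.
    for (int j = 1; j < n; ++j) {
        FragLink key = sorted[j];
        int i = j - 1;
        while (i >= 0 && sorted[i].depth <= key.depth) {
            sorted[i + 1] = sorted[i];
            --i;
        }
        sorted[i + 1] = key;
    }
    // Blend far-to-near; alpha ends up as that of the nearest fragment.
    Color out{0.0f, 0.0f, 0.0f, 0.0f};
    for (int k = 0; k < n; ++k) {
        Color c = unpack(sorted[k].color);
        out.r += (c.r - out.r) * c.a;
        out.g += (c.g - out.g) * c.a;
        out.b += (c.b - out.b) * c.a;
        out.a = c.a;
    }
    return out;
}
```

Feeding this a hand-built list (slot 0 left unused as the terminator) gives a deterministic reference value to compare against the pixels read back from the GPU.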
I'm just wondering what might be the cause of this behavior. If I check the values of the buffers at each frame, by blitting their contents back to CPU memory and printing values, they are changing every frame, when they should be the same.
The results are the same whether or not I call commandBuffer.waitUntilCompleted() after each frame's call to commandBuffer.commit(). By calling waitUntilCompleted, shouldn't I eliminate any issues relating to one frame's use of the buffer while the next frame is also trying to access it? (Because I thought perhaps I would need to triple buffer that 172MB buffer which would be horrible.)
I'm encoding the entire render -- the initial blit to reset the counter, the first rendering pass, and then the second rendering pass -- into a single command buffer. Would that be a problem? In other words, do I need to actually commit the first rendering pass, wait for it to complete, and then initiate the second? (EDIT: I tried this and it did not change anything)
The original technique I am porting (https://www.slideshare.net/hgruen/oit-and-indirect-illumination-using-dx11-linked-lists) does not use fixed-function blending in the second stage -- they bind the background as a texture buffer and blend it in manually along with the pixel fragments, then return the complete result. I just decided to skip this and blend my final combined fragment color with the background using normal 'over' blending. But I don't see why this would cause the problem I'm having. I will try it their way just in case...
I greatly appreciate any ideas about what would cause this! Thanks
. . .
UPDATE: Following the conversation in the comments I've updated the shaders to use an atomic buffer instead of a texture, but am now getting "Execution of the command buffer was aborted due to an error during execution. Internal Error (IOAF code 1)":
fragment void stroke_abuffer_fragment(VertexIn interpolated [[stage_in]],
const device uint& color [[ buffer(0) ]],
constant Uniforms& uniforms [[ buffer(1) ]],
device FragLink* LinkBuffer [[ buffer(3) ]],
device atomic_uint &counter[[buffer(2)]],
device atomic_uint *StartBuffer[[buffer(4)]]
) {
uint pos = int(interpolated.position.x)+int(interpolated.position.y)*uniforms.displaySize[0];
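// get counter value -- the index of the next free spot in the link buffer
uint value = atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
// shift by one so that index 0 stays free as the end-of-list marker used by the composite pass
value += 1;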
// store fragment information in link buffer
FragLink F;
F.color = color;
F.depth = interpolated.position.z;
F.next = atomic_exchange_explicit(&StartBuffer[pos], value, memory_order_relaxed);
LinkBuffer[value] = F;
}
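For comparison, here is a rough CPU-side model in C++ of the insertion logic in the updated shader, using std::atomic in place of Metal's atomic_uint. It is a sketch under assumptions, not the Metal implementation: the ABuffer struct, its member names, and the capacity check are hypothetical and are not in the shader above. The calls run sequentially here; the atomic fetch-add and exchange are what would keep concurrent insertions at the same pixel from losing nodes.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU mirror of the shader's FragLink; the field types are assumed.
struct FragLink {
    uint32_t color;
    float    depth;
    uint32_t next;  // 0 ends the list
};

struct ABuffer {
    uint32_t width;
    std::atomic<uint32_t> counter{0};          // models the atomic counter
    std::vector<std::atomic<uint32_t>> heads;  // models StartBuffer, 0 = empty
    std::vector<FragLink> links;               // models LinkBuffer, slot 0 unused

    ABuffer(uint32_t w, uint32_t h, uint32_t capacity)
        : width(w), heads(w * h), links(capacity + 1) {
        for (auto& head : heads) head.store(0, std::memory_order_relaxed);
    }

    // Mirrors the fragment shader body for one fragment at pixel (x, y).
    bool insert(uint32_t x, uint32_t y, uint32_t color, float depth) {
        uint32_t pos = x + y * width;
        // Reserve a slot; +1 keeps index 0 free as the end-of-list marker.
        uint32_t value = counter.fetch_add(1, std::memory_order_relaxed) + 1;
        // Capacity guard: an assumption of this sketch, absent from the shader.
        if (value >= links.size()) return false;
        // Swap in the new head and receive the previous one in a single step.
        uint32_t old = heads[pos].exchange(value, std::memory_order_relaxed);
        links[value] = FragLink{color, depth, old};
        return true;
    }

    // Count the fragments reachable from a pixel's head.
    uint32_t count(uint32_t x, uint32_t y) const {
        uint32_t n = 0;
        uint32_t i = heads[x + y * width].load(std::memory_order_relaxed);
        for (; i != 0; i = links[i].next) ++n;
        return n;
    }
};
```

Running the same fragments through this model and comparing the per-pixel counts with the values blitted back from the GPU is one way to check whether the lists themselves are intact.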