8
votes

I'm building an app rendering 2D geometry in Metal.

Right now, the positions of the vertices are solved from within the vertex function. What I'd like is to write the solved positions back to a buffer from inside that same vertex function.

I'm under the impression that this is possible. However, my first attempt looked like this:

vertex VertexOut basic_vertex(device VertexIn *vertices [[ buffer(0) ]],
                              device VertexOut *solvedVertices [[ buffer(1) ]],
                              uint vid [[ vertex_id ]])
{
    VertexIn in = vertices[vid];
    VertexOut out;
    out.position = ... // Solve the position of the vertex

    solvedVertices[vid] = out; // Write to the buffer, later to be read by the CPU

    return out;
}

I was graced with the presence of this compile time error:

[Screenshot of the compiler error; image not included]

Okay, so a few solutions come to mind. I could solve for the vertex positions in a first, non-rasterizing pass through a vertex function declared like:

vertex void solve_vertex(device VertexIn *unsolved [[ buffer(0) ]],
                         device VertexOut *solved [[ buffer(1) ]],
                         uint vid [[ vertex_id ]])
{
    solved[vid] = ... 
}

And then pipe those solved vertices into a now much simpler - rasterizing - vertex function.

Another solution that could work but seems less appealing could be to solve them in a compute function.

So, what is the best way forward in a situation like this? From my limited research, I gather that this sort of procedure is what Transform Feedback does, but I've had no luck (other than the link at the beginning of the question) finding examples in Apple's documentation or sample code, or elsewhere on the web, of best practices for this sort of problem.

2
instead of device you could write to constant buffers. - gpu3d
@Marius constant address space is read-only. See here - Functions, Variables, and Qualifiers - jameslintaylor
Metal doesn't exactly have transform feedback as it is normally understood; what you've discovered (using a non-rasterizing pipeline with a vertex function that writes to a buffer) is the closest thing. The compute version would do essentially the same work and would be pretty easy to write; I'd try both and see which gives you better performance. I'm not posting this as an answer because (a) I don't have empirical evidence in favor of either approach and (b) I'm hoping you self-answer once you've investigated this :) - warrenm
Okay, thanks for the feedback again @warrenm. As you advised, I'll self answer this once I've investigated different solutions. - jameslintaylor
@jameslintaylor did you find out if using a compute pipeline was faster/slower than the render pipeline? - TJez

2 Answers

7
votes

Alright, it turns out using a non-rasterizing vertex function is the way to go. There are some things to note, however, for others' future reference:

A non-rasterizing vertex function is simply a vertex function returning void i.e.:

vertex void non_rasterizing_vertex(...) { }

When executing a non-rasterizing "render" pass, the MTLRenderPassDescriptor still needs to have a texture set - for instance in MTLRenderPassDescriptor's colorAttachments[0].texture - for reasons I don't know (I assume it's just due to the fixed nature of GPU programming).

The MTLRenderPipelineDescriptor used to create the MTLRenderPipelineState needs to have its rasterizationEnabled property (isRasterizationEnabled in Swift) set to false; then you can assign the non-rasterizing vertex function to its vertexFunction property. The fragmentFunction property can remain nil as expected.

When actually executing the pass, one of the drawPrimitives: methods (the naming of which may be misleading) still needs to be invoked on the configured MTLRenderCommandEncoder. I ended up with a call to render point primitives (MTLPrimitiveType.point in current Swift), since that seems the most sensible.

Doing all of this sets up "rendering" logic ready to write back to vertex buffers from the vertex function - so long as they're in device address space:

vertex void non_rasterizing_vertex(device float *writeableBuffer [[ buffer(0) ]],
                                   uint vid [[ vertex_id ]])
{
    writeableBuffer[vid] = 42; // Write away!
}
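
For reference, the host-side steps listed above could be put together roughly as follows. This is only a sketch under a few assumptions: the shader is compiled from a source string so the example is self-contained, the float2 buffers and the "double every coordinate" solve are placeholders rather than the real VertexIn/VertexOut types, and the 1x1 dummy texture with a matching pixel format is just one way to satisfy the render pass. Whether the attachment is strictly required may differ between OS versions.

```swift
import Metal

// Shader compiled from source at runtime so the sketch is self-contained.
// `solve_vertex` and the doubling "solve" are placeholders (assumptions).
let shaderSource = """
#include <metal_stdlib>
using namespace metal;

vertex void solve_vertex(const device float2 *unsolved [[ buffer(0) ]],
                         device float2 *solved [[ buffer(1) ]],
                         uint vid [[ vertex_id ]])
{
    solved[vid] = unsolved[vid] * 2.0; // stand-in for the real solve
}
"""

guard let device = MTLCreateSystemDefaultDevice(),
      let queue = device.makeCommandQueue() else {
    fatalError("Metal is not available on this machine")
}

// Pipeline: void vertex function, no fragment function, rasterization off.
let library = try device.makeLibrary(source: shaderSource, options: nil)
let pipelineDescriptor = MTLRenderPipelineDescriptor()
pipelineDescriptor.vertexFunction = library.makeFunction(name: "solve_vertex")
pipelineDescriptor.isRasterizationEnabled = false
pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
let pipeline = try device.makeRenderPipelineState(descriptor: pipelineDescriptor)

// Dummy 1x1 render target, never actually drawn into.
let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: .bgra8Unorm, width: 1, height: 1, mipmapped: false)
textureDescriptor.usage = .renderTarget
textureDescriptor.storageMode = .private
let passDescriptor = MTLRenderPassDescriptor()
passDescriptor.colorAttachments[0].texture = device.makeTexture(descriptor: textureDescriptor)
passDescriptor.colorAttachments[0].loadAction = .dontCare
passDescriptor.colorAttachments[0].storeAction = .dontCare

// Input vertices and a CPU-readable output buffer of the same size.
let unsolved: [SIMD2<Float>] = [SIMD2(0, 0), SIMD2(1, 2), SIMD2(3, 4)]
let byteCount = unsolved.count * MemoryLayout<SIMD2<Float>>.stride
guard let inputBuffer = device.makeBuffer(bytes: unsolved, length: byteCount,
                                          options: .storageModeShared),
      let outputBuffer = device.makeBuffer(length: byteCount, options: .storageModeShared),
      let commandBuffer = queue.makeCommandBuffer(),
      let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: passDescriptor) else {
    fatalError("Could not create Metal resources")
}

// One "draw" of points: the vertex function runs once per vertex id.
encoder.setRenderPipelineState(pipeline)
encoder.setVertexBuffer(inputBuffer, offset: 0, index: 0)
encoder.setVertexBuffer(outputBuffer, offset: 0, index: 1)
encoder.drawPrimitives(type: .point, vertexStart: 0, vertexCount: unsolved.count)
encoder.endEncoding()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()

// Read the solved positions back on the CPU.
let pointer = outputBuffer.contents().bindMemory(to: SIMD2<Float>.self,
                                                 capacity: unsolved.count)
let solved = Array(UnsafeBufferPointer(start: pointer, count: unsolved.count))
print(solved)
```

Waiting on the command buffer here is only for the sake of the example; in an app you would more likely use a completion handler, or feed the output buffer straight into the rasterizing pass.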

This "answer" ended up more like a blog post but I hope it remains useful for future reference.

TODO

I'd still like to investigate performance tradeoffs between doing compute-y work like this in a compute pipeline versus in the rendering pipeline like above. Once I have some more time to do that, I'll update this answer.

0
votes

The correct solution is to move any code writing to buffers to a compute kernel.

You will lose a great deal of performance writing to buffers in a vertex function. The vertex stage is optimized for rasterizing, not for computation.

You just need to use a compute command encoder.

guard let computeBuffer = commandQueue.makeCommandBuffer() else { return }
guard let computeEncoder = computeBuffer.makeComputeCommandEncoder() else { return }
computeEncoder.setComputePipelineState(solveVertexPipelineState)


kernel void solve_vertex(device VertexIn *unsolved [[ buffer(0) ]],
                         device VertexOut *solved [[ buffer(1) ]],
                         uint vid [[ thread_position_in_grid ]])
{
    solved[vid] = ...
}
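
The snippets above stop before the buffers are bound and the work is dispatched. A fuller sketch of the compute route might look like the following. The float2 buffers, the "add one" solve, the extra count argument at buffer(2), and the runtime-compiled source string are all assumptions made to keep the example self-contained; they are not part of the original code.

```swift
import Metal

// Kernel compiled from source at runtime; the body is a placeholder solve.
let kernelSource = """
#include <metal_stdlib>
using namespace metal;

kernel void solve_vertex(const device float2 *unsolved [[ buffer(0) ]],
                         device float2 *solved [[ buffer(1) ]],
                         constant uint &count [[ buffer(2) ]],
                         uint vid [[ thread_position_in_grid ]])
{
    if (vid >= count) { return; } // the last threadgroup may overshoot
    solved[vid] = unsolved[vid] + float2(1.0, 1.0); // stand-in for the real solve
}
"""

guard let device = MTLCreateSystemDefaultDevice(),
      let commandQueue = device.makeCommandQueue() else {
    fatalError("Metal is not available on this machine")
}

let library = try device.makeLibrary(source: kernelSource, options: nil)
guard let function = library.makeFunction(name: "solve_vertex") else {
    fatalError("Kernel function not found")
}
let solveVertexPipelineState = try device.makeComputePipelineState(function: function)

// Input vertices and a CPU-readable output buffer of the same size.
let unsolved: [SIMD2<Float>] = [SIMD2(0, 0), SIMD2(1, 2), SIMD2(3, 4)]
var count = UInt32(unsolved.count)
let byteCount = unsolved.count * MemoryLayout<SIMD2<Float>>.stride
guard let inputBuffer = device.makeBuffer(bytes: unsolved, length: byteCount,
                                          options: .storageModeShared),
      let outputBuffer = device.makeBuffer(length: byteCount, options: .storageModeShared),
      let computeBuffer = commandQueue.makeCommandBuffer(),
      let computeEncoder = computeBuffer.makeComputeCommandEncoder() else {
    fatalError("Could not create Metal resources")
}

computeEncoder.setComputePipelineState(solveVertexPipelineState)
computeEncoder.setBuffer(inputBuffer, offset: 0, index: 0)
computeEncoder.setBuffer(outputBuffer, offset: 0, index: 1)
computeEncoder.setBytes(&count, length: MemoryLayout<UInt32>.size, index: 2)

// Round the grid up to whole threadgroups; the kernel guards the excess threads.
let width = solveVertexPipelineState.threadExecutionWidth
let groups = (unsolved.count + width - 1) / width
computeEncoder.dispatchThreadgroups(
    MTLSize(width: groups, height: 1, depth: 1),
    threadsPerThreadgroup: MTLSize(width: width, height: 1, depth: 1))
computeEncoder.endEncoding()
computeBuffer.commit()
computeBuffer.waitUntilCompleted()

// Read the solved positions back on the CPU.
let pointer = outputBuffer.contents().bindMemory(to: SIMD2<Float>.self,
                                                 capacity: unsolved.count)
let solved = Array(UnsafeBufferPointer(start: pointer, count: unsolved.count))
print(solved)
```

The bounds check in the kernel is there because dispatchThreadgroups launches whole threadgroups, so the grid is usually larger than the vertex count.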