When it comes to graphics on iOS, where a shared memory model governs how the CPU and the GPU access memory, buffering is an important concept.
The idea is that you multi-buffer any data that updates every frame, so that the CPU is always writing to a different section of the buffer than the one the GPU is reading from. You then wait for a frame to finish rendering before the CPU starts writing over that frame's section again.
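Here is roughly the per-frame pattern I mean, as I currently understand it (a minimal sketch; `Uniforms`, the counts, and all the names are just placeholders of mine):

```swift
import Metal
import simd
import Dispatch

let device = MTLCreateSystemDefaultDevice()!
let maxFramesInFlight = 3

struct Uniforms { var modelViewProjection: simd_float4x4 }
let uniformStride = MemoryLayout<Uniforms>.stride

// One shared buffer holding a copy of the per-frame data for each in-flight frame.
let uniformsBuffer = device.makeBuffer(
    length: uniformStride * maxFramesInFlight,
    options: .storageModeShared)!

let frameSemaphore = DispatchSemaphore(value: maxFramesInFlight)
var frameIndex = 0

func drawFrame(commandQueue: MTLCommandQueue, uniforms: Uniforms) {
    // Block until the GPU has finished the frame that last used this slot.
    frameSemaphore.wait()
    let slot = frameIndex % maxFramesInFlight

    // The CPU writes into its slot while the GPU reads the other slots.
    (uniformsBuffer.contents() + slot * uniformStride)
        .storeBytes(of: uniforms, as: Uniforms.self)

    let commandBuffer = commandQueue.makeCommandBuffer()!
    // ... encode draws that bind uniformsBuffer at offset slot * uniformStride ...
    commandBuffer.addCompletedHandler { _ in frameSemaphore.signal() }
    commandBuffer.commit()
    frameIndex += 1
}
```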
How to implement this is fairly clear for data that is completely rewritten each frame. The question I have is how to do this for historical data. Imagine that I want to store the vertices of a trail left by some object as it travels through the scene.
I would then have a sort of circular buffer keeping track of the last 120 frames of data, so that the size stays constant and the CPU just writes to a different part of the circular buffer each frame:
```
----- ----- ----- ----- ----- ----- -----
 n-3   n-2   n-1    n    n+1   n+2   n-4
----- ----- ----- ----- ----- ----- -----
                                ^CPU Write
```
In the above example for a given frame, where n represents the most recently rendered part of the trail, the CPU would write to the spot in the buffer labeled n+2, and the GPU would, in two draw calls, render n-3 -> n and n-4. While this would technically avoid a situation where the CPU and the GPU touch the same chunk of data at the same time, I am worried about how that fact gets communicated.
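Concretely, the single-trail version of what I am describing would look something like this (again just a sketch; `TrailVertex` and the slot bookkeeping are my own names, and I keep one extra safety slot exactly as in the diagram):

```swift
import Metal
import simd

let device = MTLCreateSystemDefaultDevice()!

struct TrailVertex { var position: simd_float3 }

let trailLength = 120  // frames of history kept in the circular buffer
let vertexStride = MemoryLayout<TrailVertex>.stride
let trailBuffer = device.makeBuffer(
    length: vertexStride * trailLength,
    options: .storageModeShared)!

var writeSlot = 0  // the "n+2" slot from the diagram above

func updateAndDraw(encoder: MTLRenderCommandEncoder, newVertex: TrailVertex) {
    // The CPU touches only this frame's slot.
    (trailBuffer.contents() + writeSlot * vertexStride)
        .storeBytes(of: newVertex, as: TrailVertex.self)

    encoder.setVertexBuffer(trailBuffer, offset: 0, index: 0)

    // Render every slot except the write slot and one safety slot, splitting
    // into two draws where the circular buffer wraps: in the diagram,
    // "n-4" first and then "n-3 -> n".
    let oldest = (writeSlot + 1) % trailLength
    let renderable = trailLength - 2
    let firstRun = min(renderable, trailLength - oldest)
    encoder.drawPrimitives(type: .lineStrip, vertexStart: oldest, vertexCount: firstRun)
    if firstRun < renderable {
        encoder.drawPrimitives(type: .lineStrip,
                               vertexStart: 0,
                               vertexCount: renderable - firstRun)
    }

    writeSlot = (writeSlot + 1) % trailLength
}
```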
My question, essentially, is this: **how can I communicate to Metal that I have ensured a chunk of data will not be written by the CPU while the GPU attempts to read it?** Is there something I need to be doing with alignment? Does Metal lock access to certain sizes of memory or something?
To make this question a bit more convoluted, imagine I were storing the trails for 120 frames of movement for 1,000 different objects inside one buffer. There are a few ways I could accomplish this by laying the data out differently within the buffer. For example, I could have it like this:
```
----- ----- -----
 p1    p2    p3
----- ----- -----
```
with each p block representing the 120-frame history for one particle. I could then apply the same concept as above, issuing up to two draw calls per particle so as to avoid drawing the data currently being written.
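With that layout, addressing a particular vertex would look roughly like this (reusing the hypothetical `TrailVertex` from the sketch above):

```swift
// Layout A: all 120 history slots for particle 0, then all of particle 1's, etc.
// [ p0: slot0 ... slot119 ][ p1: slot0 ... slot119 ] ...
func byteOffsetA(particle: Int, slot: Int,
                 historyLength: Int = 120,
                 stride: Int = MemoryLayout<TrailVertex>.stride) -> Int {
    (particle * historyLength + slot) * stride
}
```

Note that under this layout the slots being written in a given frame are scattered through the buffer, one per particle.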
Or I could lay it out like this:
```
----- ----- ----- ----- ----- ----- -----
 n-3   n-2   n-1    n    n+1   n+2   n-4
----- ----- ----- ----- ----- ----- -----
                                ^CPU Write
```
where, inside each of the n blocks, the data for all of the particles sits side by side.
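That interleaved layout would be addressed like this instead (same caveats as above):

```swift
// Layout B: every particle's vertex for slot n-3, then every particle's for n-2, etc.
// [ slot n-3: p0 p1 ... p999 ][ slot n-2: p0 p1 ... p999 ] ...
func byteOffsetB(particle: Int, slot: Int,
                 particleCount: Int = 1_000,
                 stride: Int = MemoryLayout<TrailVertex>.stride) -> Int {
    (slot * particleCount + particle) * stride
}
```

The appeal here is that everything the CPU writes in a given frame is one contiguous range, so the two-draw-call trick from the single-trail case carries over directly.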
To make things even more complicated, I could avoid multiple draw calls altogether, and open things up for alpha blending (sorting the triangle draw order), by using an index buffer. The index buffer could indeed ensure that the CPU and the GPU don't technically need to wait on each other. But would they know that?
**Can I even achieve this optimization when index buffers are involved? Or does that make memory access too unpredictable?**
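For reference, the index-buffer version I have in mind looks roughly like this, building on the layout B sketch above (note that the index buffer itself is rewritten by the CPU every frame, so presumably it needs the same multi-buffering treatment as everything else):

```swift
let particleCount = 1_000

// One index buffer per in-flight frame, big enough to address every renderable
// slot of every particle (point primitives just to keep the sketch simple).
let indexCapacity = particleCount * (trailLength - 2)
let indexBuffers: [MTLBuffer] = (0..<maxFramesInFlight).map { _ in
    device.makeBuffer(length: indexCapacity * MemoryLayout<UInt32>.stride,
                      options: .storageModeShared)!
}

func drawTrails(encoder: MTLRenderCommandEncoder, frameIndex: Int, writeSlot: Int) {
    let indexBuffer = indexBuffers[frameIndex % maxFramesInFlight]
    let indices = indexBuffer.contents()
        .bindMemory(to: UInt32.self, capacity: indexCapacity)

    // Emit indices in whatever order we like (e.g. depth-sorted for blending),
    // as long as none of them reference the write slot or the safety slot.
    var count = 0
    for particle in 0..<particleCount {
        for age in 0..<(trailLength - 2) {
            let slot = (writeSlot + 1 + age) % trailLength  // oldest first
            indices[count] = UInt32(slot * particleCount + particle)  // layout B
            count += 1
        }
    }

    encoder.drawIndexedPrimitives(type: .point,
                                  indexCount: count,
                                  indexType: .uint32,
                                  indexBuffer: indexBuffer,
                                  indexBufferOffset: 0)
}
```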
I realize this is a long write-up! The main questions are in bold. Essentially, I am just wondering how and when the GPU and the CPU decide to wait on each other when sharing data.