When it comes to graphics on iOS, where a shared memory model governs how the CPU and the GPU access memory, buffering is an important concept.
The idea is that you multi-buffer any data that updates every frame, so that the CPU is always writing to a different section of the buffer than the one the GPU is reading from. You then wait for a frame to finish rendering before the CPU starts writing over that frame's section again.
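Here is roughly the per-frame pattern I mean, as I currently understand it (a minimal sketch; `Uniforms`, the counts, and all the names are just placeholders of mine):

```swift
import Metal
import simd
import Dispatch

let device = MTLCreateSystemDefaultDevice()!
let maxFramesInFlight = 3

struct Uniforms { var modelViewProjection: simd_float4x4 }
let uniformStride = MemoryLayout<Uniforms>.stride

// One shared buffer holding a copy of the per-frame data for each in-flight frame.
let uniformsBuffer = device.makeBuffer(
    length: uniformStride * maxFramesInFlight,
    options: .storageModeShared)!

let frameSemaphore = DispatchSemaphore(value: maxFramesInFlight)
var frameIndex = 0

func drawFrame(commandQueue: MTLCommandQueue, uniforms: Uniforms) {
    // Block until the GPU has finished the frame that last used this slot.
    frameSemaphore.wait()
    let slot = frameIndex % maxFramesInFlight

    // The CPU writes into its slot while the GPU reads the other slots.
    (uniformsBuffer.contents() + slot * uniformStride)
        .storeBytes(of: uniforms, as: Uniforms.self)

    let commandBuffer = commandQueue.makeCommandBuffer()!
    // ... encode draws that bind uniformsBuffer at offset slot * uniformStride ...
    commandBuffer.addCompletedHandler { _ in frameSemaphore.signal() }
    commandBuffer.commit()
    frameIndex += 1
}
```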
How to implement this is fairly clear for data that is completely rewritten each frame. The question I have is how to do this for historical data. Imagine that I want to store the vertices of a trail left by some object as it travels through the scene.
I would then have a sort of circular buffer keeping track of the last 120 frames of data, so that the size stays constant and the CPU just writes to a different part of the circular buffer each frame:
```
----- ----- ----- ----- ----- ----- -----
 n-3   n-2   n-1    n    n+1   n+2   n-4
----- ----- ----- ----- ----- ----- -----
                                ^CPU Write
```
In the above example for a given frame, where n represents the most recently rendered part of the trail, the CPU would write to the spot in the buffer labeled n+2, and the GPU would, in two draw calls, render n-3 -> n and n-4. While this would technically avoid a situation where the CPU and the GPU touch the same chunk of data at the same time, I am worried about how that fact gets communicated.
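Concretely, the single-trail version of what I am describing would look something like this (again just a sketch; `TrailVertex` and the slot bookkeeping are my own names, and I keep one extra safety slot exactly as in the diagram):

```swift
import Metal
import simd

let device = MTLCreateSystemDefaultDevice()!

struct TrailVertex { var position: simd_float3 }

let trailLength = 120  // frames of history kept in the circular buffer
let vertexStride = MemoryLayout<TrailVertex>.stride
let trailBuffer = device.makeBuffer(
    length: vertexStride * trailLength,
    options: .storageModeShared)!

var writeSlot = 0  // the "n+2" slot from the diagram above

func updateAndDraw(encoder: MTLRenderCommandEncoder, newVertex: TrailVertex) {
    // The CPU touches only this frame's slot.
    (trailBuffer.contents() + writeSlot * vertexStride)
        .storeBytes(of: newVertex, as: TrailVertex.self)

    encoder.setVertexBuffer(trailBuffer, offset: 0, index: 0)

    // Render every slot except the write slot and one safety slot, splitting
    // into two draws where the circular buffer wraps: in the diagram,
    // "n-4" first and then "n-3 -> n".
    let oldest = (writeSlot + 1) % trailLength
    let renderable = trailLength - 2
    let firstRun = min(renderable, trailLength - oldest)
    encoder.drawPrimitives(type: .lineStrip, vertexStart: oldest, vertexCount: firstRun)
    if firstRun < renderable {
        encoder.drawPrimitives(type: .lineStrip,
                               vertexStart: 0,
                               vertexCount: renderable - firstRun)
    }

    writeSlot = (writeSlot + 1) % trailLength
}
```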
My question, essentially, is this: **how can I communicate to Metal that I have ensured a chunk of data will not be written by the CPU while the GPU attempts to read it?** Is there something I need to be doing with alignment? Does Metal lock access to certain sizes of memory or something?
To make this question a bit more convoluted, imagine I were storing the trails for 120 frames of movement for 1,000 different objects inside one buffer. There are a few ways I could accomplish this by laying the data out differently within the buffer. For example, I could have it like this:
```
----- ----- -----
 p1    p2    p3
----- ----- -----
```
with each p block representing the 120-frame history for one particle. I could then apply the same concept as above, issuing up to two draw calls per particle so as to avoid drawing the data currently being written.
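With that layout, addressing a particular vertex would look roughly like this (reusing the hypothetical `TrailVertex` from the sketch above):

```swift
// Layout A: all 120 history slots for particle 0, then all of particle 1's, etc.
// [ p0: slot0 ... slot119 ][ p1: slot0 ... slot119 ] ...
func byteOffsetA(particle: Int, slot: Int,
                 historyLength: Int = 120,
                 stride: Int = MemoryLayout<TrailVertex>.stride) -> Int {
    (particle * historyLength + slot) * stride
}
```

Note that under this layout the slots being written in a given frame are scattered through the buffer, one per particle.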
Or I could lay it out like this:
```
----- ----- ----- ----- ----- ----- -----
 n-3   n-2   n-1    n    n+1   n+2   n-4
----- ----- ----- ----- ----- ----- -----
                                ^CPU Write
```
where, inside each of the n blocks, the data for all of the particles sits side by side.
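That interleaved layout would be addressed like this instead (same caveats as above):

```swift
// Layout B: every particle's vertex for slot n-3, then every particle's for n-2, etc.
// [ slot n-3: p0 p1 ... p999 ][ slot n-2: p0 p1 ... p999 ] ...
func byteOffsetB(particle: Int, slot: Int,
                 particleCount: Int = 1_000,
                 stride: Int = MemoryLayout<TrailVertex>.stride) -> Int {
    (slot * particleCount + particle) * stride
}
```

The appeal here is that everything the CPU writes in a given frame is one contiguous range, so the two-draw-call trick from the single-trail case carries over directly.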
To make things even more complicated, I could avoid multiple draw calls altogether, and open things up for alpha blending (sorting the triangle draw order), by using an index buffer. The index buffer could indeed ensure that the CPU and the GPU don't technically need to wait on each other. But would they know that?
**Can I even achieve this optimization when index buffers are involved? Or does that make memory access too unpredictable?**
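For reference, the index-buffer version I have in mind looks roughly like this, building on the layout B sketch above (note that the index buffer itself is rewritten by the CPU every frame, so presumably it needs the same multi-buffering treatment as everything else):

```swift
let particleCount = 1_000

// One index buffer per in-flight frame, big enough to address every renderable
// slot of every particle (point primitives just to keep the sketch simple).
let indexCapacity = particleCount * (trailLength - 2)
let indexBuffers: [MTLBuffer] = (0..<maxFramesInFlight).map { _ in
    device.makeBuffer(length: indexCapacity * MemoryLayout<UInt32>.stride,
                      options: .storageModeShared)!
}

func drawTrails(encoder: MTLRenderCommandEncoder, frameIndex: Int, writeSlot: Int) {
    let indexBuffer = indexBuffers[frameIndex % maxFramesInFlight]
    let indices = indexBuffer.contents()
        .bindMemory(to: UInt32.self, capacity: indexCapacity)

    // Emit indices in whatever order we like (e.g. depth-sorted for blending),
    // as long as none of them reference the write slot or the safety slot.
    var count = 0
    for particle in 0..<particleCount {
        for age in 0..<(trailLength - 2) {
            let slot = (writeSlot + 1 + age) % trailLength  // oldest first
            indices[count] = UInt32(slot * particleCount + particle)  // layout B
            count += 1
        }
    }

    encoder.drawIndexedPrimitives(type: .point,
                                  indexCount: count,
                                  indexType: .uint32,
                                  indexBuffer: indexBuffer,
                                  indexBufferOffset: 0)
}
```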
I realize this is a long write-up! The main questions are in bold. Essentially, I am just wondering how and when the GPU and the CPU decide to wait on each other when sharing data.