I'm having some difficulty understanding the terms used in the official documentation here (https://www.khronos.org/opengl/wiki/Rendering_Pipeline_Overview)
It says that preparing vertex array data works something like this: given a list of 3D position data, UV texture coordinates, and an index list, OpenGL can generate a stream of vertices.
This stream of vertices then needs to be interpreted so that OpenGL can act on it. For example, a stream of 12 vertices can be interpreted as 4 separate triangles or as 10 connected (strip) triangles.
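To make sure I have the arithmetic right, here is my understanding of that counting as a tiny CPU-side sketch (not actual OpenGL code; the function names are my own, mirroring what I believe GL_TRIANGLES and GL_TRIANGLE_STRIP do):

```cpp
#include <cassert>
#include <cstddef>

// How many primitives a stream of n vertices yields, depending on how
// the stream is interpreted.
std::size_t separateTriangles(std::size_t n) { return n / 3; }        // like GL_TRIANGLES
std::size_t stripTriangles(std::size_t n) { return n < 3 ? 0 : n - 2; } // like GL_TRIANGLE_STRIP
```

So the same 12-vertex stream gives `separateTriangles(12) == 4` but `stripTriangles(12) == 10`.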
But to me it seems like the above step combines several later stages, namely Vertex Shading and Primitive Assembly. Do these steps really all happen together?
For example, the documentation says that in vertex shading each vertex in the stream is transformed into an output vertex. This is so general that it sounds like something already done in the "first" step of preparing the vertex array data.
Here is my very rudimentary idea of the whole pipeline:
Vertex data in the form of 3D positions is fed in as a list (a stream), and a list of texture coordinates can optionally be input as well.
This stream of data is consumed and, together with an index list, the vertices are arranged in order. The ordered list of vertices is then interpreted in a specific way (e.g. every group of 3 vertices = 1 primitive, or every 3 consecutive, overlapping vertices = 1 primitive as in a strip).
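In code, my mental model of the index-list part is something like the following sketch (my own names, nothing OpenGL-specific): the index list selects vertices from the attribute arrays, producing the ordered vertex stream.

```cpp
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

// The index list picks vertices out of the position array, producing the
// ordered stream of vertices that the rest of the pipeline consumes.
std::vector<Vec3> expandByIndex(const std::vector<Vec3>& positions,
                                const std::vector<unsigned>& indices) {
    std::vector<Vec3> stream;
    stream.reserve(indices.size());
    for (unsigned i : indices) stream.push_back(positions[i]);
    return stream;
}
```

For example, a quad stored as 4 positions plus the index list {0, 1, 2, 2, 3, 0} expands into a 6-vertex stream (two triangles) without duplicating the position data itself.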
The vertices are then sent to a vertex shader, which consumes the vertex stream, computes or transforms per-vertex attributes (like positions and vertex normals), and emits an output stream of vertices.
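The way I picture the vertex shader is as a pure function applied to each vertex independently: one vertex in, one vertex out. A toy CPU-side analogue (the matrix name `mvp` is my own placeholder, standing in for whatever transform the shader applies before writing `gl_Position`):

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // row-major 4x4 matrix

// One vertex in, one vertex out: multiply the position by a transform
// matrix, analogous to computing gl_Position in a vertex shader.
Vec4 transformVertex(const Mat4& mvp, const Vec4& pos) {
    Vec4 out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[r] += mvp[r][c] * pos[c];
    return out;
}
```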
There are then the optional steps of generating extra primitives via tessellation shaders, or using a geometry shader that acts on the tessellated primitives to create even more complex primitives.
Now we move on to Primitive Assembly, which is a separate process from the earlier stages of computing primitives. This step decomposes the primitives into basic primitives. E.g. a list of 12 vertices forming ONE line-strip primitive is decomposed into 11 basic line primitives.
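Here is how I imagine that decomposition working for a line strip (again a toy sketch with my own names, not real API code): each pair of adjacent vertices becomes one basic line primitive.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Primitive assembly for a line strip: n vertices become n-1 basic line
// primitives, each referencing a pair of adjacent vertex indices.
std::vector<std::pair<int, int>> assembleLineStrip(int n) {
    std::vector<std::pair<int, int>> lines;
    for (int i = 0; i + 1 < n; ++i)
        lines.push_back({i, i + 1});
    return lines;
}
```

So 12 vertices produce the 11 basic lines (0,1), (1,2), ..., (10,11).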
Then we move on to the clipping process, where we modify or discard the primitives (or the parts of them) that do not lie within the camera's view volume.
The next step is rasterization, where the rasterizer consumes the stream of basic primitives and generates fragments from them. The fragments are much smaller than the basic primitives and are used to cover the pixels, instead of producing a computationally expensive pixel-by-pixel output (as in ray tracing). The output is a stream of fragments, each originating from a basic primitive.
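To check my understanding of where fragments come from, here is a toy rasterizer sketch (my own simplification, using the edge-function sign test; real rasterizers are far more sophisticated): it emits one fragment per pixel whose center lies inside the triangle, and each fragment carries the pixel coordinates it covers.

```cpp
#include <cassert>
#include <vector>

struct Fragment { int px, py; };  // the pixel this fragment covers

// Signed area test: which side of edge (a -> b) the point p lies on.
float edge(float ax, float ay, float bx, float by, float px, float py) {
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

// Emit one fragment for every pixel in a w x h grid whose center lies
// inside the triangle (x0,y0)-(x1,y1)-(x2,y2).
std::vector<Fragment> rasterize(float x0, float y0, float x1, float y1,
                                float x2, float y2, int w, int h) {
    std::vector<Fragment> frags;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float cx = x + 0.5f, cy = y + 0.5f;  // pixel center
            float e0 = edge(x0, y0, x1, y1, cx, cy);
            float e1 = edge(x1, y1, x2, y2, cx, cy);
            float e2 = edge(x2, y2, x0, y0, cx, cy);
            // inside if the center is on the same side of all three edges
            if ((e0 >= 0 && e1 >= 0 && e2 >= 0) ||
                (e0 <= 0 && e1 <= 0 && e2 <= 0))
                frags.push_back({x, y});
        }
    return frags;
}
```

In this picture a fragment is not itself a triangle; it is a small per-pixel record produced from a triangle, which is partly why I am confused by my own question 4 below.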
We send this stream of fragments to the fragment shader, which is similar in function to the vertex shader: it consumes the input stream of fragments, computes the colour of each fragment, looks up the colour in the texture (if any), and can also modify the depth value.
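Again I picture this as a pure per-fragment function. A toy stand-in for the texture lookup (nearest-neighbour sampling of a hard-coded 2x2 "texture"; the names are mine, not any real API):

```cpp
#include <array>
#include <cassert>

struct Color { float r, g, b, a; };

// One fragment in, one colour out: nearest-neighbour lookup into a 2x2
// texture using UV coordinates in [0, 1], my stand-in for what a
// fragment shader does when it samples a texture.
Color shadeFragment(float u, float v, const std::array<Color, 4>& tex2x2) {
    int tx = u < 0.5f ? 0 : 1;
    int ty = v < 0.5f ? 0 : 1;
    return tex2x2[ty * 2 + tx];
}
```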
What happens next after this? I read that there is per-sample processing, but the documentation doesn't explain much.
My gaps in understanding:
How does a fragment eventually "cover" pixels on the screen?
In step 8 (the fragment shader), do we really calculate a pixel depth value, or are we just calculating a fragment depth value?
How does each output fragment know which pixels (relatively) it should cover?
In rasterization, I suppose the fragments generated are triangles. How are those triangles formed from the basic primitives?