Understanding some terminologies in OpenGL?

Question

I've some difficulty comprehending the terms used in the official documentation found here (https://www.khronos.org/opengl/wiki/Rendering_Pipeline_Overview)

It says that preparing vertex array data can be something like given a list of 3D position data and UV texture coordinates, and an index list, It can generate a stream of vertices.
This stream of vertices are then required to be interpreted so that OpenGL can act on it. For example, a stream of 12 vertices can be interpreted as 4 separate triangles or 10 dependent triangles.

But to me it seems like the above step is combining many steps together, namely Vertex Shading and Primitive Assembly. Do these many steps really happen altogether?

For example, the documentation says in vertex shading that the vertices in the stream are transformed into an output vertex. This is so general and seems like something that is already done in the "first" step of preparing vertex array data.

Here is my very rudimentary idea of the whole pipeline:

Vertex data in the form of 3D positions are fed as a list input (a stream), and additionally a list of texture coordinates can also be input.
This stream of data is consumed and together with an index list, we arrange the vertices in order. The ordered list of vertices are then interpreted in a specific way (e.g. every 3 vertices = 1 primitive or every sequential 3 vertices = 1 primitive).
The vertices are then sent to a vertex shader, which also feeds in a list of vertices and computes attributes like vertex normals, before sending them as an output stream of vertices.
There is then the optional steps of computing extra primitives via tessalation shaders, or using a geometry shader that acts on tessalated primitives to create even more complex primitives.
Now we move on to Primitive Assembly, which is a separate process from the earlier stages of computing primitives. This step seeks to decompose the primitives into many basic primitives. E.g. a list of 12 vertices forming up ONE primitive is decomposed into 11 basic line primitives.
Then we move on to the clipping process, where we modify or delete all the vertices that do not lie within the view of the camera.
The next step is rasterization, where we generate fragments based on the stream of basic primitives that the rasterizer consumes. The fragments are much smaller than the basic primitives and are used to cover the pixels, instead of doing a computationally expensive pixel by pixel output (as in ray tracing). This produces an output of fragments that originating from basic primitives.
We send this stream of fragments to the fragment shader, which is similar in function to the vertex shader. It consumes this input stream of fragments and processes the colour of each fragment, looks up the colour in the texture (if any) and modify the pixel depth value.

What happens next after this? I read there's per-sample processing but it doesn't explain a lot.

My gaps in understanding:

How does the fragment work to "cover" the pixels in the screen eventually?
In step 8, do we really calculate the pixel depth value or are we just calculating the fragment depth value?
How does the output of each fragment know which pixels (relatively) it should cover?
In the rasterization, I suppose fragments generated are triangles. How are the triangles formed from the basic primitives?

I don't see in the doc you reference (which I think is the right doc to reference), where it says what you say it says. Can you provide a deeper link to something that communicates your first two bullets? Same for how you come to understand your bullet #7 further down in the question. — danh

Nico Schertler Nico Schertler · Accepted Answer · 2018-04-16T18:36:26

There are some slight inaccuracies in your understanding, which might be the reasons for your confusion. Let me try to correct them:

2.

At this point (before the vertex shader), there are no primitives yet. Every vertex is processed on its own, independent of what happens to them afterwards. If you have an index list, this will only tell the GPU what vertex data to process. But even then, there are no primitives yet. These will only be generated by the primitive assembly step.

5.

Primitive assembly creates the primitives from the stream of vertices (depending on your draw call parameters and possibly shader configurations). Basically, it is doing what you described in step 2. There are only three types of primitives (points, lines, triangles). Once you have an e.g. triangle, it is never decomposed into smaller or more basic triangles like you suggest. Therefore, primitive assembly is not doing much. It is just putting together the right vertex data into the right primitives (e.g. three subsequent vertices may make up one triangle).

7.

The fragments do not just cover the pixels by accident. They essentially are pixels. More accurately, they represent the part of a primitive that is represented by a given pixel on the screen. And the fragment shader is called for all those fragments / pixels. In the DirectX environment, this stage is called the pixel shader, which makes the connection even more explicit. E.g. a horizontal line that is 10 pixels wide will be rasterized into 10 fragments. There are no more triangles or other primitives beyond this stage.

Now to your questions. I believe many of them have already been answered above.

I believe this has been answered. You can take a look at some rasterization algorithms to get an idea of the implementation. The basic idea is to gather all pixels that are covered by a primitive.
It is the fragment's depth value. The pipeline configuration decides what happens to this value. E.g. if you have standard depth testing, the pipeline will compare the fragment's depth value to the current pixel's depth value and update the pixel's depth value if it is greater than the fragment's depth value (i.e. if the fragment is in front of everything else that projects onto this pixel).
This should be clear from the above exposition. Fragments have a direct correspondence to pixels.
Should also be answered. They are not triangles.

All the above assumes some basic rendering setup for clarity of exposition. More advanced techniques like multisampling etc. may require slight adaptation of the above statements.

Understanding some terminologies in OpenGL?

1 Answers

2.

5.

7.