25
votes

I'm doing ray casting in the fragment shader. I can think of a couple ways to draw a fullscreen quad for this purpose. Either draw a quad in clip space with the projection matrix set to the identity matrix, or use the geometry shader to turn a point into a triangle strip. The former uses immediate mode, deprecated in OpenGL 3.2. The latter I use out of novelty, but it still uses immediate mode to draw a point.

6
The geometry shader to generate a quad from a point sounds like overkill if you just need a single quad. Just draw two triangles or a triangle strip. Those four vertices won't hurt you, at least not as much as a special geometry shader for something that simple. – Christian Rau

6 Answers

20
votes

You can send two triangles forming a quad, with their position attributes set to -1/1 at the corners.

You do not need to multiply them by any matrix in the vertex/fragment shader.

Here are some code samples, as simple as can be :)

Vertex Shader:

const vec2 madd=vec2(0.5,0.5);
attribute vec2 vertexIn;
varying vec2 textureCoord;
void main() {
   textureCoord = vertexIn.xy*madd+madd; // scale vertex attribute to [0-1] range
   gl_Position = vec4(vertexIn.xy,0.0,1.0);
}

Fragment Shader :

uniform sampler2D t; // the texture to display
varying vec2 textureCoord;
void main() {
   vec4 color1 = texture2D(t, textureCoord);
   gl_FragColor = color1;
}
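
For completeness, the matching host-side setup could look roughly like this (a sketch in C, assuming program already holds the linked shader program):

static const GLfloat quad[] = {
    -1.0f, -1.0f,    1.0f, -1.0f,    1.0f,  1.0f,  // first triangle
    -1.0f, -1.0f,    1.0f,  1.0f,   -1.0f,  1.0f,  // second triangle
};

GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(quad), quad, GL_STATIC_DRAW);

GLint loc = glGetAttribLocation(program, "vertexIn"); // the attribute from the vertex shader above
glEnableVertexAttribArray(loc);
glVertexAttribPointer(loc, 2, GL_FLOAT, GL_FALSE, 0, (const void*)0);

glUseProgram(program);
glDrawArrays(GL_TRIANGLES, 0, 6);
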
16
votes

I'm going to argue that the most efficient approach will be drawing a single "full-screen" triangle. For a triangle to cover the full screen, it needs to be bigger than the actual viewport. In NDC (and also in clip space, if we set w=1), the viewport will always be the [-1,1] square. For a triangle to cover this area completely, two of its sides need to be twice as long as the viewport rectangle, so that the third side crosses the edge of the viewport; hence we can, for example, use the following coordinates (in counter-clockwise order): (-1,-1), (3,-1), (-1,3).

We also do not need to worry about the texcoords. To get the usual normalized [0,1] range across the visible viewport, we just need to make the corresponding texcoords for the vertices twice as big ((0,0), (2,0) and (0,2) for the coordinates above), and the barycentric interpolation will yield exactly the same results for any viewport pixel as when using a quad.

This approach can of course be combined with attribute-less rendering as suggested in demanze's answer:

#version 330 core
out vec2 texcoords; // texcoords are in the normalized [0,1] range for the viewport-filling quad part of the triangle
void main() {
        vec2 vertices[3]=vec2[3](vec2(-1,-1), vec2(3,-1), vec2(-1, 3));
        gl_Position = vec4(vertices[gl_VertexID],0,1);
        texcoords = 0.5 * gl_Position.xy + vec2(0.5);
}
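
On the host side, all this needs is an empty VAO and a three-vertex draw call; a minimal sketch, assuming program holds the linked program:

GLuint vao;
glGenVertexArrays(1, &vao);
glBindVertexArray(vao); // a VAO must be bound on core profile, even an attribute-less one

glUseProgram(program);
glDrawArrays(GL_TRIANGLES, 0, 3); // positions come entirely from gl_VertexID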

Why will a single triangle be more efficient?

This is not about the one saved vertex shader invocation, and the one less triangle to handle at the front-end. The most significant effect of using a single triangle will be that there are fewer fragment shader invocations.

Real GPUs always invoke the fragment shader for 2x2-pixel blocks ("quads") as soon as a single pixel of the primitive falls into such a block. This is necessary for calculating the window-space derivative functions (those are also implicitly needed for texture sampling, see this question).

If the primitive does not cover all 4 pixels in that block, the remaining fragment shader invocations will do no useful work (apart from providing the data for the derivative calculations) and will be so-called helper invocations (which can even be queried via the gl_HelperInvocation GLSL built-in). See also Fabian "ryg" Giesen's blog article for more details.

If you render a quad with two triangles, both will have one edge going diagonally across the viewport, and on both triangles, you will generate a lot of useless helper invocations at the diagonal edge. The effect will be worst for a perfectly square viewport (aspect ratio 1). If you draw a single triangle, there will be no such diagonal edge (it lies outside of the viewport and won't concern the rasterizer at all), so there will be no additional helper invocations.

Wait a minute, if the triangle extends across the viewport boundaries, won't it get clipped and actually put more work on the GPU?

If you read the textbook materials about graphics pipelines (or even the GL spec), you might get that impression. But real-world GPUs use different approaches like Guard-band clipping. I won't go into detail here (that would be a topic on its own; have a look at Fabian "ryg" Giesen's fine blog article for details), but the general idea is that the rasterizer will produce fragments only for pixels inside the viewport (or scissor rect) anyway, no matter whether the primitive lies completely inside it or not, so we can simply throw bigger triangles at it if both of the following are true:

  • a) the triangle only extends beyond the 2D top/bottom/left/right clipping planes (as opposed to the near/far ones in the z dimension, which are trickier to handle, especially because vertices may also lie behind the camera)

  • b) the actual vertex coordinates (and all intermediate calculation results the rasterizer might be doing on them) are representable in the internal data formats the GPU's hardware rasterizer uses. The rasterizer will use fixed-point data types of implementation-specific width, while vertex coords are 32-bit single-precision floats. (That is basically what defines the size of the guard-band.)

Our triangle is only a factor of 3 bigger than the viewport, so we can be very sure that there is no need to clip it at all.

But is it worth it?

Well, the savings on fragment shader invocations are real (especially when you have a complex fragment shader), but the overall effect might be barely measurable in a real-world scenario. On the other hand, the approach is not more complicated than using a full-screen quad, it uses less data, and even if it might not make a huge difference, it won't hurt, so why not use it?

Could this approach be used for all sorts of axis-aligned rectangles, not just fullscreen ones?

In theory, you can combine this with the scissor test to draw an arbitrary axis-aligned rectangle (and the scissor test will be very efficient, as it just limits which fragments are produced in the first place; it isn't a real "test" in HW that discards fragments). However, this requires you to change the scissor parameters for each rectangle you want to draw, which implies a lot of state changes and limits you to a single rectangle per draw call, so doing so won't be a good idea in most scenarios.
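
For reference, the combination itself is simple; a sketch, with hypothetical x, y, w, h giving the rectangle in window coordinates and the oversized-triangle setup from above reused:

glEnable(GL_SCISSOR_TEST);
glScissor(x, y, w, h);            // fragments are only ever generated inside this rectangle
glDrawArrays(GL_TRIANGLES, 0, 3); // the single oversized triangle
glDisable(GL_SCISSOR_TEST);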

13
votes

To output a fullscreen quad, a geometry shader can be used:

#version 330 core

layout(points) in;
layout(triangle_strip, max_vertices = 4) out;

out vec2 texcoord;

void main() 
{
    gl_Position = vec4( 1.0, 1.0, 0.5, 1.0 );
    texcoord = vec2( 1.0, 1.0 );
    EmitVertex();

    gl_Position = vec4(-1.0, 1.0, 0.5, 1.0 );
    texcoord = vec2( 0.0, 1.0 ); 
    EmitVertex();

    gl_Position = vec4( 1.0,-1.0, 0.5, 1.0 );
    texcoord = vec2( 1.0, 0.0 ); 
    EmitVertex();

    gl_Position = vec4(-1.0,-1.0, 0.5, 1.0 );
    texcoord = vec2( 0.0, 0.0 ); 
    EmitVertex();

    EndPrimitive(); 
}

The vertex shader is just empty:

#version 330 core

void main()
{
}

To use this shader, you can issue a dummy draw command with an empty VBO:

glDrawArrays(GL_POINTS, 0, 1);
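
On a core profile, a VAO must still be bound for the draw call to be valid even though no attributes are read; for example (assuming program links the two shaders above together with a fragment shader):

GLuint vao;
glGenVertexArrays(1, &vao);
glBindVertexArray(vao); // stays empty, the geometry shader generates everything

glUseProgram(program);
glDrawArrays(GL_POINTS, 0, 1);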
12
votes

No need to use a geometry shader, a VBO or any memory at all.

A vertex shader can generate the quad.

layout(location = 0) out vec2 uv;

void main() 
{
    float x = float(((uint(gl_VertexID) + 2u) / 3u)%2u); // 0,1,1,1,0,0 for vertices 0..5
    float y = float(((uint(gl_VertexID) + 1u) / 3u)%2u); // 0,0,1,1,1,0 for vertices 0..5

    gl_Position = vec4(-1.0f + x*2.0f, -1.0f+y*2.0f, 0.0f, 1.0f);
    uv = vec2(x, y);
}

Bind an empty VAO. Send a draw call for 6 vertices.
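
In host code that could look like this (a sketch, with program again standing for the linked program):

GLuint vao;
glGenVertexArrays(1, &vao);
glBindVertexArray(vao); // empty VAO, no buffers attached

glUseProgram(program);
glDrawArrays(GL_TRIANGLES, 0, 6); // two triangles, six vertices, all derived from gl_VertexID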

2
votes

The following comes from the draw function of the class that draws FBO textures to a screen-aligned quad.

Gl.glUseProgram(shad);      

Gl.glBindBuffer(Gl.GL_ARRAY_BUFFER, vbo);           
Gl.glEnableVertexAttribArray(0);
Gl.glEnableVertexAttribArray(1);
Gl.glVertexAttribPointer(0, 3, Gl.GL_FLOAT, Gl.GL_FALSE, 0, voff);
Gl.glVertexAttribPointer(1, 2, Gl.GL_FLOAT, Gl.GL_FALSE, 0, coff);  

Gl.glActiveTexture(Gl.GL_TEXTURE0);
Gl.glBindTexture(Gl.GL_TEXTURE_2D, fboc);
Gl.glUniform1i(tileLoc, 0);

Gl.glDrawArrays(Gl.GL_QUADS, 0, 4);

Gl.glBindTexture(Gl.GL_TEXTURE_2D, 0);
Gl.glBindBuffer(Gl.GL_ARRAY_BUFFER, 0); 

Gl.glUseProgram(0); 

The actual quad itself and the coords come from:

private float[] v=new float[]{  -1.0f, -1.0f, 0.0f,
                                1.0f, -1.0f, 0.0f,
                                1.0f, 1.0f, 0.0f,
                                -1.0f, 1.0f, 0.0f,

                                0.0f, 0.0f,
                                1.0f, 0.0f,
                                1.0f, 1.0f,
                                0.0f, 1.0f
};

The binding and setup of the VBOs I leave to you.

The vertex shader:

#version 330

layout(location = 0) in vec3 pos;
layout(location = 1) in vec2 coord;

out vec2 coords;

void main() {
    coords=coord.st;
    gl_Position=vec4(pos, 1.0);
}

Because the position is raw, that is, not multiplied by any matrix, the quad's corners at (-1,-1) and (1,1) exactly fit the viewport. Look for Alfonse's tutorial linked off any of his posts on openGL.org.

2
votes

This is similar to the answer by demanze, but I would argue it's easier to understand. Also, this draws only 4 vertices by using GL_TRIANGLE_STRIP.

#version 300 es
out vec2 textureCoords;

void main() {
    const vec2 positions[4] = vec2[](
        vec2(-1, -1),
        vec2(+1, -1),
        vec2(-1, +1),
        vec2(+1, +1)
    );
    const vec2 coords[4] = vec2[](
        vec2(0, 0),
        vec2(1, 0),
        vec2(0, 1),
        vec2(1, 1)
    );

    textureCoords = coords[gl_VertexID];
    gl_Position = vec4(positions[gl_VertexID], 0.0, 1.0);
}
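
The matching draw call, with an empty VAO bound as in demanze's answer, would then be:

glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);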