
I just added basic OpenGL rendering to my pygame application, thinking hardware acceleration would make the program run faster. Instead, it is much, much slower.

It looks like the problem is the drawing function.

Here is my OpenGL drawing function:

    from OpenGL.GL import *  # PyOpenGL; pygame only provides the window/context

    def draw(self, screen):
        rect = self.texture.imagerect.copy()
        rect.x += self.xoffset
        rect.y += self.yoffset

        halfWidth = self.getWidth()/2
        halfHeight = self.getHeight()/2

        glEnable(GL_TEXTURE_2D)

        glBindTexture(GL_TEXTURE_2D, self.texture.getTexID()) 

        self.color.setGLColor()

        glPushMatrix()

        glTranslatef(rect.x,rect.y,0)

        glRotatef(self.angle, 0, 0, 1)

        glBegin(GL_QUADS)

        glTexCoord2d(0,0)
        glVertex2f(-halfWidth + self.pivot.x, -halfHeight + self.pivot.y)

        glTexCoord2d(0,1)
        glVertex2f(-halfWidth + self.pivot.x,-halfHeight + self.getHeight() + self.pivot.y)

        glTexCoord2d(1,1) 
        glVertex2f(-halfWidth + self.getWidth() + self.pivot.x,-halfHeight + self.getHeight() + self.pivot.y)

        glTexCoord2d(1,0)
        glVertex2f(-halfWidth + self.getWidth() + self.pivot.x,-halfHeight + self.pivot.y)

        glEnd()

        glPopMatrix()

What my profiler reports for the draw function:

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    312792   20.395    0.000   34.637    0.000 image.py:61(draw)

The rest of my profiler output (the paste expires in one month):

http://pastebin.com/ApfiCQzw

My source code:

https://bitbucket.org/claysmithr/warbots/src

Note: when I set it to not draw any tiles, I get 60 fps! If I limit it to drawing only the tiles that appear on the screen, I get 20 fps, which is still much slower than blitting.

Number of 64x64 tiles I'm trying to draw: 15,625

Is there any way to test whether I am really getting hardware acceleration?
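One common check is to ask the driver which renderer it actually created, once the OpenGL display exists. The sketch below assumes PyOpenGL and a pygame window opened with the `pygame.OPENGL` flag. The list of software-renderer names (Mesa's `llvmpipe`/`softpipe`/`swrast`, Windows' `GDI Generic` fallback) is my own heuristic, not an exhaustive or official list, and the function names are hypothetical.

```python
# Heuristic list of substrings seen in GL_RENDERER for software rasterizers (not exhaustive).
SOFTWARE_RENDERER_HINTS = ("llvmpipe", "softpipe", "swrast", "gdi generic", "software rasterizer")


def looks_like_software(renderer):
    """Return True if a GL_RENDERER string names a known software rasterizer."""
    name = renderer.lower()
    return any(hint in name for hint in SOFTWARE_RENDERER_HINTS)


def report_renderer():
    """Print what the current OpenGL context runs on.

    Call only after pygame.display.set_mode(..., pygame.OPENGL | pygame.DOUBLEBUF).
    """
    # Imported here so looks_like_software() can be used without PyOpenGL installed.
    from OpenGL.GL import GL_RENDERER, GL_VENDOR, GL_VERSION, glGetString

    # PyOpenGL returns these strings as bytes.
    vendor = glGetString(GL_VENDOR).decode()
    renderer = glGetString(GL_RENDERER).decode()
    version = glGetString(GL_VERSION).decode()
    print("Vendor:  ", vendor)
    print("Renderer:", renderer)
    print("Version: ", version)
    if looks_like_software(renderer):
        print("This looks like a software renderer, not your GPU.")
```

Call `report_renderer()` right after `pygame.display.set_mode((800, 600), pygame.OPENGL | pygame.DOUBLEBUF)`; if the renderer string names your graphics card, the context is hardware accelerated.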

Should I just go back to blitting?

Edit: Does blitting automatically skip tiles that are not on the screen? That could be why OpenGL is so slow!
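As far as I know, pygame clips each blit to the destination surface, so an off-screen blit copies no pixels, but it still costs a Python call; with OpenGL every submitted quad also goes through the pipeline. Either way, the cheapest tile is one you never submit. Assuming the tiles sit on a regular grid and the camera is given by its top-left corner plus the screen size (the function and parameter names are hypothetical, not from the linked source), the visible range can be computed directly instead of testing all 15,625 tiles:

```python
import math


def visible_tile_range(cam_x, cam_y, screen_w, screen_h, tile_size, cols, rows):
    """Return (col_start, col_end, row_start, row_end), end-exclusive, clamped to the map."""
    col_start = max(0, int(cam_x // tile_size))
    row_start = max(0, int(cam_y // tile_size))
    # ceil() so a partially visible tile on the right/bottom edge is still included
    col_end = min(cols, math.ceil((cam_x + screen_w) / tile_size))
    row_end = min(rows, math.ceil((cam_y + screen_h) / tile_size))
    return col_start, col_end, row_start, row_end
```

The draw loop then becomes `for row in range(row_start, row_end): for col in range(col_start, col_end): ...`, which for an 800x600 screen and 64-pixel tiles touches at most about 14 x 11 tiles per frame.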

You are doing some things that are a little odd if you care about performance. Above all, you are passing your texture coordinates as double precision (glTexCoord2d). A double 1.0 is no more precise than a float 1.0f, but GL has to convert it to single precision, which wastes CPU cycles. - Andon M. Coleman

1 Answer


If I've understood correctly, you need to draw thousands of these textured quadrangles every frame. The slowdown comes from the many OpenGL calls you make for each quadrangle: at least 16 in the code above.

What you need to do is draw in batches, submitting many more primitives per call.

To start with, can you merge tiles into bigger units? Any graphics card made in the last decade can handle 16K x 16K texture maps. The more quads you can draw without having to bind a new texture, the better.
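With several tile images packed into one atlas texture, each tile only needs its own texture-coordinate rectangle, and all of them can be drawn after a single glBindTexture. A minimal sketch, assuming a square atlas laid out as a grid of equal-sized cells, indexed left to right and then row by row, with row 0 at v = 0 (the layout and the function name are my assumptions):

```python
def atlas_uv(tile_index, atlas_px, tile_px):
    """Texture-coordinate rectangle (u0, v0, u1, v1) of one cell in a grid atlas.

    Assumes a square atlas of atlas_px pixels split into tile_px-sized cells,
    indexed left to right, then row by row, with row 0 at v = 0.
    """
    tiles_per_row = atlas_px // tile_px
    col = tile_index % tiles_per_row
    row = tile_index // tiles_per_row
    step = tile_px / atlas_px  # size of one cell in texture space (0..1)
    u0, v0 = col * step, row * step
    return u0, v0, u0 + step, v0 + step
```

For a 1024x1024 atlas of 64x64 tiles, cell 17 sits in the second column of the second row, so its rectangle is (0.0625, 0.0625, 0.125, 0.125).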

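Replace the glBegin .. glEnd blocks with vertex arrays. Even in the worst case of still drawing one quad at a time, you can replace the current 10 calls (glBegin, eight glTexCoord/glVertex calls, glEnd) with 3: glVertexPointer, glTexCoordPointer and glDrawArrays, after enabling GL_VERTEX_ARRAY and GL_TEXTURE_COORD_ARRAY once with glEnableClientState.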
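A minimal PyOpenGL sketch of one quad drawn from client-side vertex arrays instead of glBegin/glEnd. It assumes NumPy and PyOpenGL are installed and that a GL context and texture are already set up; `quad_arrays` and `draw_quad` are illustrative names. The arrays are float32, which also avoids the double-to-float conversion mentioned in the comment on the question.

```python
def quad_arrays(x0, y0, x1, y1):
    """Corner positions and texture coordinates for one axis-aligned quad,
    in the same corner order as the question's glBegin(GL_QUADS) block."""
    vertices = [x0, y0, x0, y1, x1, y1, x1, y0]
    texcoords = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0]
    return vertices, texcoords


def draw_quad(vertices, texcoords):
    """Draw one textured quad from vertex arrays (needs a current GL context)."""
    # Imported here so quad_arrays() can be used without NumPy/PyOpenGL.
    import numpy as np
    from OpenGL.GL import (GL_FLOAT, GL_QUADS, GL_TEXTURE_COORD_ARRAY, GL_VERTEX_ARRAY,
                           glDrawArrays, glEnableClientState, glTexCoordPointer,
                           glVertexPointer)

    verts = np.asarray(vertices, dtype=np.float32)  # single precision, as GL wants
    uvs = np.asarray(texcoords, dtype=np.float32)
    glEnableClientState(GL_VERTEX_ARRAY)            # enabling once at startup is enough
    glEnableClientState(GL_TEXTURE_COORD_ARRAY)
    glVertexPointer(2, GL_FLOAT, 0, verts)          # 2 floats (x, y) per vertex
    glTexCoordPointer(2, GL_FLOAT, 0, uvs)          # 2 floats (u, v) per vertex
    glDrawArrays(GL_QUADS, 0, len(verts) // 2)      # 4 vertices
```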

Then start merging tile quads into bigger vertex arrays. Instead of calling glTranslatef, add the rect.x and rect.y values directly to each vertex.
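A sketch of that merge step, assuming each tile's (x, y) is its top-left corner and ignoring the pivot offset from the question (the function name is hypothetical). The translation is done once on the CPU while building the list, so no matrix calls are needed per tile:

```python
def build_tile_batch(positions, tile_w, tile_h):
    """Flat vertex and texcoord lists for many axis-aligned quads.

    positions: iterable of (x, y) top-left corners (the question's rect.x, rect.y).
    Each quad contributes 4 vertices (8 floats), in the same corner order as before.
    """
    vertices = []
    count = 0
    for x, y in positions:
        # Add the offset to every corner instead of calling glTranslatef per tile.
        vertices.extend((x, y, x, y + tile_h, x + tile_w, y + tile_h, x + tile_w, y))
        count += 1
    # Every tile shows the whole texture here, so the texcoords simply repeat.
    texcoords = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0] * count
    return vertices, texcoords
```

The two lists can be converted to float32 arrays, passed to glVertexPointer/glTexCoordPointer, and drawn with one `glDrawArrays(GL_QUADS, 0, 4 * number_of_tiles)` call.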

The glRotatef is a problem if it really has to be different for each tile. If it's limited to multiples of 90 degrees, you don't need it; just swap the texture coordinates around. For other angles, use sin and cos to rotate the quad's corner vertices directly around the pivot.
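Both cases can be handled without glRotatef. For quarter turns, the (u, v) pairs are cycled one corner per 90 degrees; for arbitrary angles, each corner is rotated around the pivot with the standard 2D rotation x' = px + dx·cos θ − dy·sin θ, y' = py + dx·sin θ + dy·cos θ. The function names are hypothetical and angles are in degrees, like glRotatef's. Which way a quarter turn goes depends on your corner order and y-axis direction, so flip the sign of `quarter_turns` if tiles turn the wrong way.

```python
import math


def rotate_texcoords_quarter(texcoords, quarter_turns):
    """Rotate a quad's four (u, v) pairs by 90-degree steps by cycling them around the corners."""
    pairs = [tuple(texcoords[i:i + 2]) for i in range(0, 8, 2)]
    k = quarter_turns % 4
    pairs = pairs[k:] + pairs[:k]
    return [c for pair in pairs for c in pair]


def rotate_corners(corners, pivot_x, pivot_y, angle_deg):
    """Rotate corner positions [(x, y), ...] around (pivot_x, pivot_y) by angle_deg."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    rotated = []
    for x, y in corners:
        dx, dy = x - pivot_x, y - pivot_y
        rotated.append((pivot_x + dx * cos_a - dy * sin_a,
                        pivot_y + dx * sin_a + dy * cos_a))
    return rotated
```

With (pivot_x, pivot_y) = (rect.x, rect.y) and the corners already offset by rect.x and rect.y, this reproduces what the question's glTranslatef followed by glRotatef(angle, 0, 0, 1) does to each vertex.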

Once you've eliminated the per-tile translate and rotate, you can put all the calculated quad vertex and texture coordinates into giant vertex arrays and draw the entire map with a single glDrawArrays call (after the two pointer calls), as long as every tile uses the same bound texture.
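Putting it together, a sketch of drawing the whole map from one pair of arrays. It assumes PyOpenGL, a current context, one atlas texture already bound (a square grid of equal cells, row 0 at v = 0), and a map given as a 2D list of atlas indices; `build_map_arrays` and `draw_map` are hypothetical names. The arrays should be built and converted once, or whenever the map changes, not every frame.

```python
def build_map_arrays(tile_map, tile_px, atlas_px):
    """Vertex and texcoord lists for every tile in tile_map (rows of atlas indices)."""
    per_row = atlas_px // tile_px   # atlas cells per row
    step = tile_px / atlas_px       # one cell in texture space
    vertices, texcoords = [], []
    for row, line in enumerate(tile_map):
        for col, index in enumerate(line):
            x, y = col * tile_px, row * tile_px            # world position, pre-translated
            u, v = (index % per_row) * step, (index // per_row) * step
            vertices.extend((x, y, x, y + tile_px,
                             x + tile_px, y + tile_px, x + tile_px, y))
            texcoords.extend((u, v, u, v + step, u + step, v + step, u + step, v))
    return vertices, texcoords


def draw_map(verts, uvs):
    """Draw every tile with one glDrawArrays call (needs a current GL context).

    verts and uvs should be float32 NumPy arrays created once, e.g.
    np.asarray(vertices, dtype=np.float32).
    """
    from OpenGL.GL import (GL_FLOAT, GL_QUADS, GL_TEXTURE_COORD_ARRAY, GL_VERTEX_ARRAY,
                           glDrawArrays, glEnableClientState, glTexCoordPointer,
                           glVertexPointer)

    glEnableClientState(GL_VERTEX_ARRAY)
    glEnableClientState(GL_TEXTURE_COORD_ARRAY)
    glVertexPointer(2, GL_FLOAT, 0, verts)
    glTexCoordPointer(2, GL_FLOAT, 0, uvs)
    glDrawArrays(GL_QUADS, 0, len(verts) // 2)  # 4 vertices per tile
```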

Hope this helps.

(For maximum performance you'd probably want GPU-side vertex buffer objects and shaders, but from your coding style I assume you want to stick with OpenGL 1.x/2.x.)
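For reference, a sketch of the vertex-buffer-object variant, which uploads the map data to GPU memory once instead of passing it from Python every frame. It assumes PyOpenGL, NumPy, a current context, and an OpenGL version with buffer objects (1.5 or later); positions and texture coordinates are interleaved as x, y, u, v per vertex, and all helper names are hypothetical. It still uses the fixed-function pointers rather than shaders.

```python
FLOATS_PER_VERTEX = 4               # x, y, u, v
STRIDE = FLOATS_PER_VERTEX * 4      # bytes per vertex (4-byte floats)
TEXCOORD_OFFSET = 2 * 4             # u, v start after x, y (in bytes)


def interleave(vertices, texcoords):
    """Merge flat [x, y, ...] and [u, v, ...] lists into one [x, y, u, v, ...] list."""
    out = []
    for i in range(0, len(vertices), 2):
        out.extend((vertices[i], vertices[i + 1], texcoords[i], texcoords[i + 1]))
    return out


def upload_vbo(data):
    """Copy interleaved float data into a new static VBO; return (buffer id, vertex count)."""
    import numpy as np
    from OpenGL.GL import (GL_ARRAY_BUFFER, GL_STATIC_DRAW, glBindBuffer, glBufferData,
                           glGenBuffers)

    array = np.asarray(data, dtype=np.float32)
    vbo = glGenBuffers(1)
    glBindBuffer(GL_ARRAY_BUFFER, vbo)
    glBufferData(GL_ARRAY_BUFFER, array.nbytes, array, GL_STATIC_DRAW)
    glBindBuffer(GL_ARRAY_BUFFER, 0)
    return vbo, len(array) // FLOATS_PER_VERTEX


def draw_vbo(vbo, vertex_count):
    """Draw all quads stored in the VBO with one call (needs a current GL context)."""
    import ctypes
    from OpenGL.GL import (GL_ARRAY_BUFFER, GL_FLOAT, GL_QUADS, GL_TEXTURE_COORD_ARRAY,
                           GL_VERTEX_ARRAY, glBindBuffer, glDrawArrays, glEnableClientState,
                           glTexCoordPointer, glVertexPointer)

    glBindBuffer(GL_ARRAY_BUFFER, vbo)
    glEnableClientState(GL_VERTEX_ARRAY)
    glEnableClientState(GL_TEXTURE_COORD_ARRAY)
    # With a buffer bound, the last argument is a byte offset into that buffer.
    glVertexPointer(2, GL_FLOAT, STRIDE, ctypes.c_void_p(0))
    glTexCoordPointer(2, GL_FLOAT, STRIDE, ctypes.c_void_p(TEXCOORD_OFFSET))
    glDrawArrays(GL_QUADS, 0, vertex_count)
    glBindBuffer(GL_ARRAY_BUFFER, 0)
```

Interleaving keeps each vertex's position and texture coordinate adjacent in memory, so a single buffer and one stride describe both attributes.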