1
votes

I have about six files, each 250 KB to 500 KB. Each file contains multiple QImages, about 400 images of 128x64 each. Loading into memory runs at about 60 MB/s (since OpenGL needs to unpack the PNGs to its own format).

Is it possible to speed this process up? It's painfully slow, as I have about a gigabyte to fill.

QFile file("file.ucv");

if (file.open(QIODevice::ReadOnly)) {
    qDebug() << "Read from hdd";

    QDataStream r(&file);
    r.setVersion(QDataStream::Qt_4_3);

    QImage t;

    int i = maxPics * place;
    glGenTextures(maxPics, &texture[i]);
    for (int y = 0; y < yNrPics; y++)
        for (int x = 0; x < xNrPics; x++, i++) {

            // Read the next image from the stream into the reusable QImage
            r >> t;

            glBindTexture( GL_TEXTURE_2D, texture[i] );
            glTexImage2D( GL_TEXTURE_2D, 0, GL_COMPRESSED_RGBA, t.width(), t.height(), 0, GL_RGBA, GL_UNSIGNED_BYTE, t.bits() );
            glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST );
            glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST );
        }
}

The profiler shows this line to be the most expensive:

            glTexImage2D( GL_TEXTURE_2D, 0, GL_COMPRESSED_RGBA, t.width(), t.height(), 0, GL_RGBA, GL_UNSIGNED_BYTE, t.bits() );

Changing compressed to uncompressed saves some time, but not much.

The loaded QImage is already in GL format.

2
A good start is to profile the code to see what piece is the bottleneck. - Dr. Snoopy
Most likely it is a busy wait by the CPU while uploading to GPU. - RobotRock
Not really, the images are small, upload and setup overhead could be the bottleneck. - Dr. Snoopy

2 Answers

2
votes

Make sure your measurements tell you what you think they do:

  • Be sure the profiler shows wall-clock time, not just CPU consumption (disk I/O does not consume CPU, but it does take time).
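
As a rough illustration of the difference, here is a minimal sketch that measures one piece of work with both clocks. The names `Timing` and `measure` are made up for this example, and it assumes a POSIX-like system, where `std::clock` reports processor time rather than elapsed time. If the wall-clock figure is much larger than the CPU figure, the code is probably waiting on something (disk, driver) rather than computing.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <functional>

// Timings for one run of a piece of work, in milliseconds.
struct Timing {
    double wallMs;  // elapsed real time, includes time spent blocked on I/O
    double cpuMs;   // processor time charged to the process (on POSIX systems)
};

// Hypothetical helper: runs `work` once and reports both clocks.
Timing measure(const std::function<void()>& work) {
    assert(work);  // an empty std::function cannot be called

    const auto wallStart = std::chrono::steady_clock::now();
    const std::clock_t cpuStart = std::clock();

    work();

    const std::clock_t cpuEnd = std::clock();
    const auto wallEnd = std::chrono::steady_clock::now();

    Timing t;
    t.wallMs = std::chrono::duration<double, std::milli>(wallEnd - wallStart).count();
    t.cpuMs = 1000.0 * static_cast<double>(cpuEnd - cpuStart) / CLOCKS_PER_SEC;
    return t;
}
```

In the question's code, the body of the loading loop could be passed in as a lambda, once for the `r >> t` read and once for the `glTexImage2D` call, to see which of the two dominates in elapsed time.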

Also, avoiding work is the best optimization:

  • If you want to use compressed textures, compress them only once (on the first upload), then download the compressed data and cache it to disk. Subsequent uploads should send the compressed data directly from disk to OpenGL, without recompressing from PNG via raw pixels to GL.

Pipelining work, especially with I/O, is a good way to speed things up, and is usually easy (producer-consumer parallelism):

  • Pipeline the uploading by using separate threads for loading images and creating the textures. Qt offers good support here through the QtConcurrent library, which means you don't have to muck about with threads yourself. (You will need to use QGLWidget::makeCurrent, though.)
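
The producer-consumer idea can be sketched with plain standard-library threads instead of QtConcurrent. This is only an illustration under assumptions: `ImageQueue`, `decodeImage` and `runPipeline` are hypothetical names, and `decodeImage` merely fabricates a 128x64 RGBA buffer where the real program would read a QImage from the stream. The GL calls are left as a comment, since they must run on the thread that has the context current.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

using Pixels = std::vector<std::uint8_t>;
const std::size_t kImageBytes = 128 * 64 * 4;  // one 128x64 RGBA image

// Minimal thread-safe queue: the loader thread pushes, the GL thread pops.
class ImageQueue {
public:
    void push(Pixels p) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(p)); }
        cv_.notify_one();
    }
    // Signals that no more images will be pushed.
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_all();
    }
    // Blocks until an image is available; returns false once closed and drained.
    bool pop(Pixels& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Pixels> q_;
    bool closed_ = false;
};

// Hypothetical stand-in for `r >> t`: fabricates the pixels of image number i.
Pixels decodeImage(std::size_t i) {
    return Pixels(kImageBytes, static_cast<std::uint8_t>(i));
}

// Decodes `count` images on a worker thread while the calling thread consumes
// them. Returns how many images the consumer handled.
std::size_t runPipeline(std::size_t count) {
    ImageQueue queue;
    std::thread loader([&queue, count] {
        for (std::size_t i = 0; i < count; ++i) queue.push(decodeImage(i));
        queue.close();
    });

    std::size_t uploaded = 0;
    Pixels pixels;
    while (queue.pop(pixels)) {
        // In the real program, glBindTexture/glTexImage2D would go here,
        // on the thread that owns the GL context.
        assert(pixels.size() == kImageBytes);
        ++uploaded;
    }
    loader.join();
    return uploaded;
}
```

With this shape, decoding image n+1 overlaps with uploading image n, so the total time tends toward the slower of the two stages rather than their sum.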

Use all available cores on your CPU, when you can:

  • Consider using several threads to upload data to OpenGL. With Pixel Buffer Objects, you can acquire a pointer to the texture data buffer and then upload it using memcpy. (The memcpy threads do not need to have the context active.)
  • The texture compression is done in software in the GL driver, so parallelizing the upload across cores will help here too, but you may have to create several GL contexts (that share data) in order to get the GL driver to compress in parallel. It shouldn't be too hard to see whether it runs in parallel or not before you try this.
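
The memcpy part of the first point might look roughly like the sketch below. It is an assumption-laden illustration: `parallelCopy` is a made-up helper, and the destination is just a pointer to writable memory, which in the real program would be the pointer obtained by mapping a pixel buffer object (for example with `glMapBuffer` on `GL_PIXEL_UNPACK_BUFFER`). Mapping and unmapping still have to happen on the thread that owns the context.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <thread>
#include <vector>

// Copies `size` bytes from src to dst, splitting the range into contiguous
// chunks that are copied by up to `threadCount` threads. The worker threads
// only touch plain memory, so they need no GL context.
void parallelCopy(std::uint8_t* dst, const std::uint8_t* src,
                  std::size_t size, unsigned threadCount) {
    assert(dst && src && threadCount > 0);

    // Round up so that the last chunk covers any remainder.
    const std::size_t chunk = (size + threadCount - 1) / threadCount;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        const std::size_t begin = static_cast<std::size_t>(t) * chunk;
        if (begin >= size) break;  // fewer chunks than threads
        const std::size_t len = std::min(chunk, size - begin);
        workers.emplace_back([=] { std::memcpy(dst + begin, src + begin, len); });
    }
    for (std::thread& w : workers) w.join();
}
```

Whether several copying threads actually beat a single memcpy depends on memory bandwidth and the driver, so it is worth timing both before committing to this.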

Also, avoid streaming data through buffered APIs when you don't need to:

  • By memory-mapping the files, the data upload can be done very efficiently (no buffering, straight from disk to GL): just a memcpy from the mapped buffer to the texture pointer.
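
A minimal sketch of the memory-mapping step, assuming a POSIX system (`mmap`) and a file that already holds upload-ready bytes; it does not help if the file still has to be decoded from PNG first. `copyFileViaMmap` is a hypothetical helper, and the destination vector stands in for the mapped texture memory. In a Qt program, `QFile::map` is the portable way to get a similar pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Maps `path` read-only and copies its bytes into `dst`, which stands in for
// the texture pointer here. Returns false if the file cannot be opened or
// mapped, or if it is empty.
bool copyFileViaMmap(const char* path, std::vector<std::uint8_t>& dst) {
    assert(path);

    const int fd = ::open(path, O_RDONLY);
    if (fd < 0) return false;

    struct stat st;
    if (::fstat(fd, &st) != 0 || st.st_size <= 0) {
        ::close(fd);
        return false;
    }
    const std::size_t size = static_cast<std::size_t>(st.st_size);

    void* mapped = ::mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // the mapping stays valid after the descriptor is closed
    if (mapped == MAP_FAILED) return false;

    // The single copy: from the page cache (via the mapping) to the target.
    dst.resize(size);
    std::memcpy(dst.data(), mapped, size);

    ::munmap(mapped, size);
    return true;
}
```
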
1
votes

If you are sure the bottleneck is in that piece of code, there are several things you might do:

  • Do not use GL_COMPRESSED_RGBA; simply use GL_RGBA.
  • Find the driver's optimal format and type, and use that (you will need to profile your OpenGL program to find it).
  • Store raw pixel data in the files instead of PNG-compressed data (avoiding the decompression step).
  • Split the files into smaller files.