2
votes

I have an IplImage from openCV, which stores its data in a row-ordered format;

image data is stored in a one dimensional array char *data; the element at position x,y is given by

elem(x,y) = data[y*width + x] // see note at end

I would like to convert this image as quickly as possible to and from a second image format that stores its data in column-ordered format; that is

elem(x,y) = data[x*height + y]

Obviously, one way to do this conversion is simply element-by-element through a double for loop.

Is there a faster way?


note for openCV afficionados, the actual location of elem(x,y) is given by data + y*widthstep + x*sizeof(element) but this gives the general idea, and for char data sizeof(element) = 1 and we can make widthstep = width, so the formula is exact

3
the double for loop would be O(N) on the number of elements and you can't beat that because you have to copy them all but within your for loops there may be some multiplications you don't need to perform each time. Highly unlikely to make any performance difference. You could of course split the process into multiple threads if the image is large.CashCow
Do you absolutely have to copy it? If you must, there is no faster way. Your copy algorithm is optimal since every element indeed need to be visited. If you don't copy it, consider just swapping the indices - that is, whenever you need to index it, index it with NOT with (i,j) but with (j,i). Can you do that? You can easily see that this needs O(1) time (and perhaps O(1) space).Juho
@mrm: Big-O notation isn't really relevant here; the speed will be dominated by cache utilisation (i.e. memory-access pattern).Oliver Charlesworth
@Oli Sorry for nitpicking, but I think it is. You are correct in that sense that the work complexity is optimal for both the "naive algorithm" and for the cache-oblivious transposition (the cache utilization has a major impact, yes!). However the O-notation is used for analyzing cache-oblivious algorithms as well. And as I suggested, the O(1) time and space solution is even faster, if one is able to use it.Juho

3 Answers

4
votes

It is called "matrix transposition" Optimal methods try to minimise the number of cache misses, swapping small tiles with the size of one or a few cache slots. For a multi-level cache this will get difficult. start reading here

this one is a bit more advanced

BTW the urls deal with "in place" transposition. Creating a transposed copy will be different (it uses twice as many cache slots, duh!)

0
votes

Assuming you need a new array that has the elements all moved, the fastest you can manage in algorithmic speed is O(N) on the number of elements (i.e. width * height).

For actual time taken, it is possible to spawn multiple threads where each one copies some of the elements. This is only worthwhile of course if you really do have a lot of them.

If the threads are already created and they accept the tasks in queues, or whatever, this would be most efficient if you are going to process lots of these images.

within your individual "loops" you can avoid doing the same multiplication multiple times, of course, and pointer arithmetic is likely to be a bit faster than random-access.

0
votes

You've kind of answered yourself but without a code. I think you need sth like:

typedef struct
{
    unsigned char r;
    unsigned char g;
    unsigned char b;
}somePixelFormat;

#define HEIGHT 2
#define WIDTH  4

// let's say this is original image width=4 height=2 expresed as one dimentional
// array of structs that adhere to your pixel format
somePixelFormat src[ WIDTH * HEIGHT ] = 
{
    {0,0,0}, {1,1,1}, {2,2,2}, {3,3,3},
    {4,4,4}, {5,5,5}, {6,6,6}, {7,7,7}
};

somePixelFormat dst[ WIDTH * HEIGHT ];

void printImage( void *img, int width, int height, int pixelByteCount )
{
    for ( int row = 0; row < height; row++ )
    {
        for ( int col = 0; col < width; col++ )
        {
            printf( "(%02d,%02d,%02d) ", ((somePixelFormat*)img + width * row + col)->r,
                                         ((somePixelFormat*)img + width * row + col)->g,
                                         ((somePixelFormat*)img + width * row + col)->b );
        }

        printf ( "\n" );
    }
    printf("\n\n");
}

void flip( void *dstImg, void *srcImg, int srcWidth, int srcHeight, int pixelByteCount )
{
    for ( int row = 0; row < srcHeight; row++ )
    {
        for ( int col = 0; col < srcWidth; col++ )
        {
            *((somePixelFormat*)dstImg + srcHeight * col + row) = *((somePixelFormat*)srcImg + srcWidth * row + col);
        }
    }
}

int main()
{
    printImage( src, 4, 2, sizeof(somePixelFormat) );
    flip( dst, src, 4, 2, sizeof(somePixelFormat) );
    printImage( dst, 2, 4, sizeof(somePixelFormat) );

    getchar();
    return 0;
}

And here's example output:

(00,00,00) (01,01,01) (02,02,02) (03,03,03) 
(04,04,04) (05,05,05) (06,06,06) (07,07,07) 


(00,00,00) (04,04,04) 
(01,01,01) (05,05,05) 
(02,02,02) (06,06,06) 
(03,03,03) (07,07,07)