
I wrote this code to find the highest-temperature pixel in a thermal image. I also need to know the coordinates of that pixel in the image.

void _findMax(uint16_t* image, int sz, sPixelData* returnPixel)
{
    int temp = 0;
    uint16_t max = image[0];

    for(int i = 1; i < sz; i++)
    {
        if(max < image[i])
        {
            max=image[i];
            //temp = i;
        }
    }

    returnPixel->temperature = image[temp];

    //returnPixel->x_location = temp % IMAGE_HORIZONTAL_SIZE;
    //returnPixel->y_location = temp / IMAGE_HORIZONTAL_SIZE;
}

With the three lines commented out, the function executes in about 2 ms. With the lines uncommented, it takes about 35 ms.

This seems very excessive, seeing as the divide and modulus are only performed once, after the loop.

Any suggestions on how to speed this up?

Or why does it take so much longer than the version without the divide and modulus?

This is executing on an ARM A9 processor running Linux.

The compiler I'm using is ARM v8 32-Bit Linux gcc compiler.

I'm using optimize -O3 and the following compile options: -march=armv7-a+neon -mcpu=cortex-a9 -mfpu=neon-fp16 -ftree-vectorize.

If you don't update temp, it's always 0 and the function only executes returnPixel->temperature = image[0]. The compiler correctly identifies that the loop is not needed and removes it. - Codo
Yes, I am aware of that. My question is why the function takes so much longer to execute when, in theory, there is just one extra divide and modulus after the loop has finished. The loop only has the extra temp = i;, so surely that should only make it take about twice as long? - James Swift
No, these are two completely different programs: one with a loop (see godbolt.org/z/555qsM) and the other consisting of only three assembler instructions (see godbolt.org/z/bGcYb1). - Codo
Why don't you use NEON? - Jake 'Alquimista' LEE
Not an answer, but using the register modifier will shorten runtime if that is your real concern. - ryyker

1 Answer


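Your code is flawed.
With temp = i; commented out, temp is always 0, so the compiler generates machine code that just executes returnPixel->temperature = image[0]; and finishes in no time. The 2 ms figure is therefore not the cost of the loop at all; the 35 ms version is the one that actually scans the image. There is nothing odd here.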

You should modify the line to: returnPixel->temperature = max;
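As a rough sketch of what the corrected function could look like with the index tracking enabled: the name findMaxWithIndex, the layout of sPixelData, and the width of 160 are assumptions made only so the example compiles, since the question shows none of them.

```c
#include <stdint.h>

/* Assumed image width; the question does not state the real value. */
#define IMAGE_HORIZONTAL_SIZE 160

/* Assumed layout; the question does not show the real sPixelData. */
typedef struct {
    uint16_t temperature;
    int x_location;
    int y_location;
} sPixelData;

/* Hypothetical name; same logic as the question's function with the
   commented-out lines restored and the temperature taken from max. */
void findMaxWithIndex(const uint16_t *image, int sz, sPixelData *returnPixel)
{
    int maxIndex = 0;
    uint16_t max = image[0];

    for (int i = 1; i < sz; i++) {
        if (max < image[i]) {   /* strict '<' keeps the first maximum */
            max = image[i];
            maxIndex = i;
        }
    }

    returnPixel->temperature = max;
    /* Row-major layout assumed: index = y * width + x. */
    returnPixel->x_location = maxIndex % IMAGE_HORIZONTAL_SIZE;
    returnPixel->y_location = maxIndex / IMAGE_HORIZONTAL_SIZE;
}
```

Timing this version against the 35 ms figure, rather than against the 2 ms one, gives a fair baseline, because both now have to walk the whole image.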

You could boost the performance significantly by using NEON, but that's a separate problem.
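Short of hand-written NEON, one thing that may be worth trying is splitting the work into two passes, so that the first loop is a plain max reduction with no index carried along. That is the kind of loop an auto-vectorizer can often handle, but whether it is actually faster on this Cortex-A9 is untested here. The function name and the output parameters below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: returns the maximum value and the index of its
   first occurrence through the two output pointers. Requires sz >= 1. */
void findMaxTwoPass(const uint16_t *image, int sz,
                    uint16_t *maxOut, int *indexOut)
{
    uint16_t max = image[0];

    /* Pass 1: value-only reduction, no index dependency in the loop. */
    for (int i = 1; i < sz; i++) {
        if (image[i] > max) {
            max = image[i];
        }
    }

    /* Pass 2: locate the first element equal to max. This always
       terminates inside the array because max was read from it. */
    int idx = 0;
    while (image[idx] != max) {
        idx++;
    }

    *maxOut = max;
    *indexOut = idx;
}
```

The x and y coordinates can then be derived from the returned index with the same modulus and divide as in the question. Comparing the generated assembly for both variants, for example on godbolt.org with the question's compiler flags, would show whether the first loop really gets vectorized.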