I wrote this code to find the highest temperature pixel in a thermal image. I also need to know the coordinates of the pixel in the image.
void _findMax(uint16_t* image, int sz, sPixelData* returnPixel)
{
    int temp = 0;
    uint16_t max = image[0];
    for(int i = 1; i < sz; i++)
    {
        if(max < image[i])
        {
            max = image[i];
            //temp = i;
        }
    }
    returnPixel->temperature = image[temp];
    //returnPixel->x_location = temp % IMAGE_HORIZONTAL_SIZE;
    //returnPixel->y_location = temp / IMAGE_HORIZONTAL_SIZE;
}
With the three lines commented out the function executes in about 2ms. With the lines uncommented it takes about 35ms to execute the function.
This seems very excessive seeing as the divide and modulus are only performed once after the loop.
Any suggestions on how to speed this up?
Or why does it take so much longer to execute than when the divide and modulus are not included?
This is executing on an ARM A9 processor running Linux.
The compiler I'm using is ARM v8 32-Bit Linux gcc compiler.
I'm using optimize -O3 and the following compile options: -march=armv7-a+neon -mcpu=cortex-a9 -mfpu=neon-fp16 -ftree-vectorize.
[...] temp, it's always 0 and the function only executes returnPixel->temperature = image[0]. The compiler correctly identifies that the loop is not needed and removes it. – Codo
[...] temp = i;, so that should only cause it to take about twice as long surely? – James Swift
[...] register modifier will shorten runtime if that is your real concern. – ryyker
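To make Codo's point concrete: with temp never assigned, the result depends only on image[0], so the 2 ms figure likely measures a function with no loop at all. Below is a minimal sketch of the version that actually tracks the index. The sPixelData layout and the value of IMAGE_HORIZONTAL_SIZE are assumptions for illustration, since the question does not show them, and find_max_with_index is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: the question does not show these definitions. */
#define IMAGE_HORIZONTAL_SIZE 640

typedef struct {
    uint16_t temperature;
    int x_location;
    int y_location;
} sPixelData;

/* Finds the first pixel holding the maximum value and reports its
   value and its (x, y) position in a row-major image. */
void find_max_with_index(const uint16_t *image, int sz, sPixelData *returnPixel)
{
    assert(image != 0 && returnPixel != 0 && sz > 0);

    int max_index = 0;
    uint16_t max = image[0];

    for (int i = 1; i < sz; i++) {
        if (max < image[i]) {
            max = image[i];
            max_index = i;   /* remember where the maximum was found */
        }
    }

    /* The result now depends on the loop, so the compiler must keep it. */
    returnPixel->temperature = max;
    returnPixel->x_location = max_index % IMAGE_HORIZONTAL_SIZE;
    returnPixel->y_location = max_index / IMAGE_HORIZONTAL_SIZE;
}
```

The divide and modulus still run only once, after the loop. Any timing difference against the commented-out version would come from the scan itself being kept, not from those two operations.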