I am trying to understand the possible benefits of compiling C++ code with NEON flags enabled in GCC. To test this, I wrote a small program that iterates through an array and performs simple arithmetic operations.
I changed the code so that anyone can compile and run it. If anyone would be kind enough to run this test and share the results, I'd much appreciate it :)
EDIT: I'd really like someone who happens to have a Cortex-A9 board nearby to run this test and check whether the result is the same. I'd really appreciate that.
#include <cstdlib>
#include <ctime>
#include <iostream>

int main()
{
    unsigned long long arraySize = 30000000;
    unsigned short* arrayShort = new unsigned short[arraySize];
    std::clock_t begin;

    // Fill the array with random values between 1 and 100.
    for (unsigned long long n = 0; n < arraySize; n++)
    {
        *arrayShort = std::rand() % 100 + 1;
        arrayShort++;
    }
    arrayShort -= arraySize; // rewind to the start of the array

    // Timed loop: add 10 to each element, then divide it by 3.
    begin = std::clock();
    for (unsigned long long n = 0; n < arraySize; n++)
    {
        *arrayShort += 10;
        *arrayShort /= 3;
        arrayShort++;
    }
    std::cout << "Time: " << (std::clock() - begin) / (double)(CLOCKS_PER_SEC / 1000) << " ms" << std::endl;

    arrayShort -= arraySize; // rewind again so delete[] receives the original pointer
    delete[] arrayShort;
    return 0;
}
Basically, I fill an array of 30000000 elements with random numbers between 1 and 100, and then go through every element, adding 10 and dividing by 3. I expected that compiling this code with NEON flags enabled would bring a large improvement, since NEON can operate on multiple array elements at a time.
I am compiling this code to run on a Cortex-A9 ARM board, using the Linaro toolchain with GCC 4.8.3. I compiled it with and without the following flags:
-O3 -mcpu=cortex-a9 -ftree-vectorize -mfloat-abi=hard -mfpu=neon
I also replicated the code to run with arrays of type unsigned int, float and double. These are the results, in seconds:
Array type unsigned short:
With NEON flags: 0.07s
Without NEON flags: 0.089s
Array type unsigned int:
With NEON flags: 0.524s
Without NEON flags: 0.529s
Array type float:
With NEON flags: 0.65s
Without NEON flags: 0.673s
Array type double:
With NEON flags: 0.955s
Without NEON flags: 0.927s
As you can see, in most cases there is almost no improvement from using the NEON flags, and for the array of doubles the result is even slightly worse.
I really feel that I'm doing something wrong here. Perhaps you can help me interpret these results.
Timer and RNG. – Notlikethat