I'm trying to implement a simple 64-bit double addition using ARM NEON. I've come across this Question, but its answer did not include a sample implementation using ARM intrinsics, so a complete example would be greatly appreciated. Here is what I have tried so far, using the integer-type registers.
Side note:
Please note that I'm using the intel/ARM_NEON_2_x86_SSE library to simulate this ARM NEON code with SSE instructions. Should I switch to native ARM NEON to test this code?
int main()
{
    double Val1[2] = { 2.46574621, 0.46546221 };
    double Val2[2] = { 2.63565654, 0.46574621 };
    double Sum[2] = { 0.0, 0.0 };
    double Sum_C[2] = { 0.0, 0.0 };
    vst1q_s64(Sum,                      //Store int64x2_t
        vaddq_s64(                      //Add int64x2_t
            vld1q_s64(&(Val1[0])),      //Load int64x2_t
            vld1q_s64(&(Val2[0])) ));   //Load int64x2_t
    for (size_t i = 0; i < 2; i++)
    {
        Sum_C[i] = Val1[i] + Val2[i];
        if (Sum_C[i] != Sum[i])
        {
            cout << "[Error] Sum : " << Sum[i] << " != " << Sum_C[i] << "\n";
        }
        else
            cout << "[Passed] Sum : " << Sum[i] << " == " << Sum_C[i] << "\n";
    }
    cout << "\n";
}
[Error] Sum : -1.22535e-308 != 5.1014
[Error] Sum : 1.93795e+307 != 0.931208
vaddq_s64 is signed-integer, as a quick google will tell you: infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0491f/…. Obviously integer add of FP bit-patterns doesn't give you an FP add operation. Another search for float64x2_t found that vaddq_f64 exists, so presumably that's what you want. – Peter Cordes
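Following the comment's suggestion, here is a minimal sketch of the same addition using the float64x2_t intrinsics (vld1q_f64, vaddq_f64, vst1q_f64). These double-precision vector intrinsics are only available when targeting AArch64, so the sketch guards them with __aarch64__ and falls back to a scalar loop elsewhere. The helper name add2_f64 is made up for this illustration, and whether the ARM_NEON_2_x86_SSE library supports these intrinsics has not been checked here.

```cpp
#include <cstddef>

#if defined(__aarch64__)
#include <arm_neon.h>   // float64x2_t and the *_f64 intrinsics (AArch64 only)
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for i = 0, 1.
void add2_f64(const double* a, const double* b, double* out)
{
#if defined(__aarch64__)
    float64x2_t va = vld1q_f64(a);        // load two doubles
    float64x2_t vb = vld1q_f64(b);        // load two doubles
    vst1q_f64(out, vaddq_f64(va, vb));    // floating-point add, then store
#else
    // Scalar fallback so the sketch still builds on non-AArch64 targets.
    for (std::size_t i = 0; i < 2; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

Calling add2_f64(Val1, Val2, Sum) in place of the vst1q_s64/vaddq_s64/vld1q_s64 chain should make the comparison against Sum_C pass, since the lanes are now added as doubles rather than as 64-bit integer bit patterns.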