Intel / ARM intrinsics equivalence

Question

I have a C application using Intel intrinsics like:

__m128 _mm_add_ps (__m128 a, __m128 b)
__m128 _mm_sub_ps (__m128 a, __m128 b)
__m128 _mm_mul_ps (__m128 a, __m128 b)
__m128 _mm_set_ps (float e3, float e2, float e1, float e0)
void _mm_store_ps (float* mem_addr, __m128 a)
__m128 _mm_load_ps (float const* mem_addr)

Now I am trying to modify my application to make it work on ARMv8, using a simulator called Gem5. So I began to look around for ARM intrinsics and found this manual: ARM® NEON™ Intrinsics Reference.

I found the arithmetic intrinsics, but I'm a little lost with the set, store and load instructions.

Could anyone with experience with ARM intrinsics tell me the right ones?

I know that ARM and x86 are different architectures, of course, but there are certainly enough logical similarities to make porting my application from x86 to ARM feasible. — A.nechi
A porting guide and header file to convert SSE intrinsics to their ARM NEON equivalent — aebudak
A link from the "related" sidebar that's worth pointing out specifically: stackoverflow.com/questions/2851421/… — Peter Cordes
The only thing that's troubling me is the setting, because I've made a macro like so: #define SET_FLOAT32x4(dest, e3, e2, e1, e0){dest = { e3, e2, e1, e0}}. But I keep getting the error: expected expression before "{" — A.nechi

Paul R · Accepted Answer · 2016-08-12T14:26:30

Here are a few equivalents to get you started:

SSE             ARM
__m128          float32x4_t     // 4 x 32-bit floats in a vector
_mm_load_ps     vld1q_f32       // load float vector from memory
_mm_store_ps    vst1q_f32       // store float vector to memory
_mm_add_ps      vaddq_f32       // add float vectors
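
The mappings above, together with vsubq_f32 and vmulq_f32 (the NEON counterparts of _mm_sub_ps and _mm_mul_ps from the question), can be put together roughly as follows. This is a minimal sketch, not a drop-in port: the function name sum_times_diff4 is hypothetical, and the #else branch is only a stand-in, built on GCC/Clang vector extensions, so that the sketch also compiles on a non-ARM host.

```c
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#else
/* Assumption: stand-in definitions for non-ARM hosts only, using GCC/Clang
   vector extensions. On ARMv8 the real <arm_neon.h> intrinsics are used. */
typedef float float32x4_t __attribute__((vector_size(16)));
static inline float32x4_t vld1q_f32(const float *p)
{ float32x4_t v; memcpy(&v, p, sizeof v); return v; }
static inline void vst1q_f32(float *p, float32x4_t v)
{ memcpy(p, &v, sizeof v); }
static inline float32x4_t vaddq_f32(float32x4_t a, float32x4_t b) { return a + b; }
static inline float32x4_t vsubq_f32(float32x4_t a, float32x4_t b) { return a - b; }
static inline float32x4_t vmulq_f32(float32x4_t a, float32x4_t b) { return a * b; }
#endif

/* Hypothetical helper: out[i] = (a[i] + b[i]) * (a[i] - b[i]) for i = 0..3 */
void sum_times_diff4(const float *a, const float *b, float *out)
{
    float32x4_t va   = vld1q_f32(a);         /* SSE: _mm_load_ps  */
    float32x4_t vb   = vld1q_f32(b);
    float32x4_t sum  = vaddq_f32(va, vb);    /* SSE: _mm_add_ps   */
    float32x4_t diff = vsubq_f32(va, vb);    /* SSE: _mm_sub_ps   */
    float32x4_t prod = vmulq_f32(sum, diff); /* SSE: _mm_mul_ps   */
    vst1q_f32(out, prod);                    /* SSE: _mm_store_ps */
}
```

One difference worth keeping in mind when porting: _mm_load_ps and _mm_store_ps require 16-byte aligned addresses, whereas vld1q_f32 and vst1q_f32 do not impose that requirement.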

As for initialising a vector, as you might with e.g. _mm_set_ps in SSE, compilers such as gcc and clang allow you to do this in a slightly more C-like way with Neon data types, e.g.

const float32x4_t v = { 1.0f, 2.0f, 3.0f, 4.0f };

However, if your compiler does not support this method, then you may have to use equivalent Neon intrinsics.
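
Regarding the macro from the comments: a brace list such as { e3, e2, e1, e0 } is only accepted in an initialiser, not on the right-hand side of an assignment, which is why the compiler reports "expected expression". With gcc or clang, a compound literal is one way around this. The sketch below reworks the asker's SET_FLOAT32x4 macro on that assumption and also shows an intrinsics-only alternative that loads from a small array; set_f32x4 and set_demo are hypothetical names. Note that _mm_set_ps takes its arguments highest element first, so e0 has to go into lane 0. As before, the #else branch is only a stand-in for non-ARM hosts.

```c
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#else
/* Assumption: stand-in definitions for non-ARM hosts only, using GCC/Clang
   vector extensions. On ARMv8 the real <arm_neon.h> intrinsics are used. */
typedef float float32x4_t __attribute__((vector_size(16)));
static inline float32x4_t vld1q_f32(const float *p)
{ float32x4_t v; memcpy(&v, p, sizeof v); return v; }
static inline void vst1q_f32(float *p, float32x4_t v)
{ memcpy(p, &v, sizeof v); }
#endif

/* Mirrors _mm_set_ps(e3, e2, e1, e0): e0 ends up in lane 0.
   The compound literal makes this valid in an assignment (gcc/clang). */
#define SET_FLOAT32x4(dest, e3, e2, e1, e0) \
    do { (dest) = (float32x4_t){ (e0), (e1), (e2), (e3) }; } while (0)

/* Intrinsics-only alternative: fill a temporary array and load it. */
static inline float32x4_t set_f32x4(float e3, float e2, float e1, float e0)
{
    const float lanes[4] = { e0, e1, e2, e3 };
    return vld1q_f32(lanes);
}

/* Hypothetical demo: builds the same vector both ways and stores each. */
void set_demo(float out_macro[4], float out_load[4])
{
    float32x4_t v;
    SET_FLOAT32x4(v, 4.0f, 3.0f, 2.0f, 1.0f);
    vst1q_f32(out_macro, v);
    vst1q_f32(out_load, set_f32x4(4.0f, 3.0f, 2.0f, 1.0f));
}
```

Both stores write 1.0f to index 0 and 4.0f to index 3, which matches what _mm_store_ps would write after _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f).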
