SIMD-able code?

Question

What is the strict definition of code that can use a SIMD instruction set? Is it anything where the calculations can be run in parallel?

So if I had:

for(int i=0; i<100; i++){
    sum += array[i];
}

this could take advantage of SIMD because we could run:

for(int i=0; i<100;i=i+4){
    sum0 += array[i];
    sum1 += array[i+1];
    sum2 += array[i+2];
    sum3 += array[i+3];
}

sum = sum0 + sum1 + sum2 + sum3;

?

Does it have to be float types, or could it be double and integer?

There isn't really any useful definition. You can use it when you can use it - it's really only answerable by experience. — harold

Paul R · Accepted Answer · 2013-01-10T12:43:10

Assuming you're talking about x86 (SSE et al) then the supported types for arithmetic are 8, 16, 32 and 64 bit integers, and single and double precision floats. Note however that not all arithmetic operations are supported for all data types - SSE lacks orthogonality in this regard.

Assuming 32 bit ints and suitably aligned arrays (16 byte aligned) then you could implement your above loop example as:

#include <stdint.h>                        // int32_t
#include <emmintrin.h>                     // SSE2 intrinsics

int32_t a[100] __attribute__ ((aligned(16)));
                                           // suitably aligned array

__m128i vsum = _mm_set1_epi32(0);          // init vsum = { 0, 0, 0, 0 }
for (int i = 0; i < 100; i += 4)
{
    __m128i v = _mm_load_si128((const __m128i *)&a[i]);
                                           // load 4 ints from a[i]..a[i+3]
    vsum = _mm_add_epi32(vsum, v);         // accumulate 4 partial sums
}
// final horizontal sum of partial sums
vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 8));   // lanes 0,1 += lanes 2,3
vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 4));   // lane 0 += lane 1
int32_t sum = _mm_cvtsi128_si32(vsum);     // sum = scalar sum of a[]
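
To address the "double and integer" part of the question, here is a rough sketch of the same idea wrapped in functions, for both 32 bit ints and doubles. The function names `sum_i32_sse2` and `sum_f64_sse2` are made up for illustration. It assumes an x86 target with SSE2, uses unaligned loads so the arrays need no special alignment, and finishes with a scalar loop for lengths that are not a multiple of the vector width. It also assumes the integer sum does not overflow.

```c
#include <emmintrin.h>   // SSE2 intrinsics
#include <stddef.h>      // size_t
#include <stdint.h>      // int32_t

// Hypothetical helper: sum n 32 bit ints, 4 per vector, any alignment.
int32_t sum_i32_sse2(const int32_t *a, size_t n)
{
    __m128i vsum = _mm_setzero_si128();            // { 0, 0, 0, 0 }
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
    {
        // unaligned load of a[i]..a[i+3]
        __m128i v = _mm_loadu_si128((const __m128i *)&a[i]);
        vsum = _mm_add_epi32(vsum, v);             // 4 partial sums
    }
    // horizontal sum: fold the upper half onto the lower half, twice
    vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 8));
    vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 4));
    int32_t sum = _mm_cvtsi128_si32(vsum);
    for (; i < n; i++)                             // scalar tail (0..3 elements)
        sum += a[i];
    return sum;
}

// Hypothetical helper: sum n doubles, 2 per vector, any alignment.
double sum_f64_sse2(const double *a, size_t n)
{
    __m128d vsum = _mm_setzero_pd();               // { 0.0, 0.0 }
    size_t i = 0;
    for (; i + 2 <= n; i += 2)
        vsum = _mm_add_pd(vsum, _mm_loadu_pd(&a[i]));  // 2 partial sums
    // horizontal sum: add the high lane to the low lane
    __m128d hi = _mm_unpackhi_pd(vsum, vsum);
    double sum = _mm_cvtsd_f64(_mm_add_sd(vsum, hi));
    for (; i < n; i++)                             // scalar tail (0..1 elements)
        sum += a[i];
    return sum;
}
```

Note that the double version adds the elements in a different order from the plain scalar loop, so the result can differ in the last few bits due to rounding. That reordering is also why compilers generally will not auto-vectorise a floating point reduction unless told they may (e.g. with `-ffast-math` on gcc).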
