3
votes

The simd pragma can be used with the icc compiler to vectorize a reduction:

#pragma simd
#pragma simd reduction(+:acc)
#pragma ivdep
for(int i( 0 ); i < N; ++i )
{
  acc += x[i];
}

Is there an equivalent solution for MSVC and/or GCC?

Ref(p28): http://d3f8ykwhia686p.cloudfront.net/1live/intel/CompilerAutovectorizationGuide.pdf

3 Answers

3
votes

For Visual Studio 2012: compile with optimization enabled (/O1, /O2, /GL). To report vectorization, use /Qvec-report:1 or /Qvec-report:2.

int s = 0;
for ( int i = 0; i < 1000; ++i )
{
  s += A[i]; // vectorizable
}

For reductions over "float" or "double" types, vectorization requires the /fp:fast switch. This is because vectorizing the reduction depends on floating-point reassociation, and reassociation is only allowed under /fp:fast.
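To make the reassociation point concrete, here is a minimal sketch in plain C (no intrinsics) that contrasts the summation order the source specifies with the order a 4-wide vectorized reduction would roughly use. The function names sum_in_order and sum_four_lanes are made up for this illustration, and the four-lane layout is an assumption about a typical SSE-width reduction, not a description of what any particular compiler emits.

```c
#include <stddef.h>

/* Strict left-to-right sum: the evaluation order the C source asks for. */
float sum_in_order(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i];
    return acc;
}

/* Four independent partial sums that are combined after the loop.
   This is roughly the shape of a 4-wide SIMD reduction (an assumption
   for illustration); it adds the same values in a different order. */
float sum_four_lanes(const float *x, size_t n)
{
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        lane[0] += x[i];
        lane[1] += x[i + 1];
        lane[2] += x[i + 2];
        lane[3] += x[i + 3];
    }

    float acc = (lane[0] + lane[1]) + (lane[2] + lane[3]);

    /* Scalar tail for lengths that are not a multiple of four. */
    for (; i < n; ++i)
        acc += x[i];
    return acc;
}
```

For the input {1e20f, 1.0f, -1e20f, 1.0f}, compiled without fast-math style options, sum_in_order returns 1 while sum_four_lanes returns 0, because each 1.0f is absorbed when it is added to a value of magnitude 1e20. The compiler is not allowed to introduce that kind of difference on its own, which is why it needs permission such as /fp:fast before it reorders a floating-point sum.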

Ref(associated doc;p12) http://blogs.msdn.com/b/nativeconcurrency/archive/2012/07/10/auto-vectorizer-in-visual-studio-11-cookbook.aspx

2
votes

GCC definitely can vectorize. Suppose you have file reduc.c with contents:

int foo(int *x, int N)
  {
    int acc = 0, i;

    for( i = 0; i < N; ++i )
      {
        acc += x[i];
      }

    return acc;
  }

Compile it (I used gcc 4.7.2) with command line:

$ gcc -O3 -S reduc.c -ftree-vectorize -msse2

The vectorized loop is now visible in the generated assembly (reduc.s).

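You can also switch on verbose vectorizer output, for example with the command below (newer GCC releases report this through the -fopt-info-vec option instead):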

$ gcc -O3 -S reduc.c -ftree-vectorize -msse2 -ftree-vectorizer-verbose=1

This prints a report to the console:

Analyzing loop at reduc.c:5
Vectorizing loop at reduc.c:5
5: LOOP VECTORIZED.
reduc.c:1: note: vectorized 1 loops in function.

Look at the official docs to better understand cases where GCC can and cannot vectorize.

1
votes

gcc requires -ffast-math to vectorize float or double reductions (as mentioned in the reference given above), regardless of the use of #pragma omp simd reduction. icc is becoming less reliant on pragmas for this optimization (although /fp:fast is needed in the absence of a pragma), but the extra ivdep and simd pragmas in the original post are undesirable. icc may do bad things when given a pragma simd that doesn't include all relevant reduction, firstprivate, and lastprivate clauses (and gcc may break with -ffast-math, particularly in combination with -march or -mavx).

msvc 2012/2013 are very limited in auto-vectorization: there are no simd reductions, no vectorization within OpenMP parallel regions, no vectorization of conditionals, and no advantage is taken of __restrict in vectorization (there is a run-time check that allows vectorizing less efficiently but safely without __restrict).
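For reference, the #pragma omp simd reduction form mentioned above looks like the sketch below. It is the OpenMP 4.0 counterpart of the icc-specific pragma in the question. The function name sum_omp_simd is made up for this example, and I am assuming a compiler with OpenMP 4.0 SIMD support (for gcc that means 4.9 or later with -fopenmp or -fopenmp-simd). Without such a flag the pragma is simply ignored and the loop stays a correct scalar reduction.

```c
#include <stddef.h>

/* Sum of n floats with an explicit OpenMP SIMD reduction clause.
   The clause states that acc is a "+" reduction variable, so the
   compiler may keep per-lane partial sums and combine them at the end. */
float sum_omp_simd(const float *x, size_t n)
{
    float acc = 0.0f;

    /* Has no effect unless OpenMP (or OpenMP SIMD) support is enabled. */
    #pragma omp simd reduction(+:acc)
    for (size_t i = 0; i < n; ++i)
        acc += x[i];

    return acc;
}
```

Whether the loop really ends up vectorized still depends on the compiler, its version, and the other flags discussed in this answer, so check the vectorization report rather than assume it.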