
I want to vectorize a Fortran loop on a processor with SIMD support (Intel Xeon). I recently learned that with the Intel compiler ifort this can be done by adding !DIR$ SIMD before the loop.

With gfortran, however, vectorization appears to be fully automatic. For example:

      PROGRAM MAIN1
      IMPLICIT NONE

      DOUBLE PRECISION :: X(100)
      INTEGER          :: NELEM = 100, NELMAX = 100, LV = 4
      INTEGER          :: IKLE(100), I, IB, IELEM
      DOUBLE PRECISION :: W(100)
      DOUBLE PRECISION :: MASKEL(100)
      LOGICAL          :: MSK = .FALSE.

      DO I = 1, 100
        X(I) = I
        IKLE(I) = I
        W(I) = 0
      END DO

      DO IB = 1,(NELEM+LV-1)/LV
  !------------loop to vectorize------------------
        DO IELEM = 1+(IB-1)*LV , MIN(NELEM,IB*LV)
          X(IKLE(IELEM)) = X(IKLE(IELEM)) + W(IELEM)
        ENDDO ! IELEM
  !-----------------------------------------------
      ENDDO ! IB

      PRINT *, X
      END PROGRAM

Part of the output of gfortran main1.f -O3 -fopt-info-optimized is shown below:

main1.f:18:0: note: not vectorized: not suitable for gather load _33 = x[_32];
main1.f:18:0: note: bad data references.
main1.f:18:0: note: not vectorized: not enough data-refs in basic block.
main1.f:18:0: note: not vectorized: not enough data-refs in basic block.

The output X is correct when ifort compiles the loop with vectorization forced, so I wonder whether gfortran offers a similar directive.

Can you get rid of the IKLE(IELEM)? There is no such gfortran directive I am aware of. You may want the SIMD directive from OpenMP 4.0. I wouldn't normally refer to Intel processors as "vectorial". - Vladimir F
Yes, I tried that too and it works. Unfortunately I can't get rid of it; it's part of the calculation. - Shiyu
Are you sure you got any speedup with ifort? If you mandate it, it will vectorize even loops that are not profitable to vectorize. - Vladimir F
Maybe not in this demo, but it is likely to be useful in another, much larger one. - Shiyu
It is not clear that vectorization will accelerate this loop: the arithmetic intensity of the code is low, so it is probably memory bound. Also, if you can, it may be more profitable to do a random read on W than a random write on X. - Anthony Scemama

1 Answer


With scatter stores like this one, forcing vectorization by directive could change the results when the index array IKLE(:) contains repeated entries, because the vectorized loop does not preserve the order of memory accesses. As far as I know, the only directive of this kind available in gfortran is !$omp simd, and gfortran is free to ignore it. omp simd directives take effect only when the corresponding compile options are set (for gfortran, -fopenmp or -fopenmp-simd). ifort can report an estimate of the peak speedup attainable by vectorization (-opt-report4 in recent versions). I don't know whether that estimate takes the declared array sizes into account. If there is a speedup, it would come more from changing the order of operations than from actual SIMD parallelism.
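
As a rough illustration of the points above, here is a minimal sketch of the loop with the OpenMP directive applied only after checking that the index array has no repeated entries. It is not the asker's code: the program name SCATTER_SIMD, the file name scatter_simd.f90, the SEEN and UNIQUE helpers, and the reversed-permutation contents of IKLE are assumptions made for the example, the LV blocking is dropped for brevity, and the source is free-form (in a fixed-form file such as main1.f the !$OMP sentinel would have to start in column 1).

```fortran
! Hypothetical example file: scatter_simd.f90 (free-form source).
PROGRAM SCATTER_SIMD
  IMPLICIT NONE
  INTEGER, PARAMETER :: NELEM = 100
  DOUBLE PRECISION   :: X(NELEM), W(NELEM)
  INTEGER            :: IKLE(NELEM), IELEM
  LOGICAL            :: SEEN(NELEM), UNIQUE

  DO IELEM = 1, NELEM
    X(IELEM)    = IELEM
    W(IELEM)    = 1.0D0
    IKLE(IELEM) = NELEM + 1 - IELEM   ! assumed data: a permutation
  END DO

  ! Check that IKLE has no repeated entries. Only then are the
  ! updates independent of the order in which they are executed.
  SEEN   = .FALSE.
  UNIQUE = .TRUE.
  DO IELEM = 1, NELEM
    IF (SEEN(IKLE(IELEM))) UNIQUE = .FALSE.
    SEEN(IKLE(IELEM)) = .TRUE.
  END DO

  IF (UNIQUE) THEN
    ! Request vectorization; the compiler may still decline.
    !$OMP SIMD
    DO IELEM = 1, NELEM
      X(IKLE(IELEM)) = X(IKLE(IELEM)) + W(IELEM)
    END DO
  ELSE
    ! Repeated indices: keep the sequential order of updates.
    DO IELEM = 1, NELEM
      X(IKLE(IELEM)) = X(IKLE(IELEM)) + W(IELEM)
    END DO
  END IF

  PRINT *, X
END PROGRAM SCATTER_SIMD
```

Compiling with something like gfortran -O3 -fopenmp-simd -fopt-info-vec scatter_simd.f90 should make the directive active and report any loops that were vectorized. Whether the scatter loop appears in that report depends on the gfortran version and the target instruction set. With the assumed data, every X(I) ends up as I + 1 whichever branch runs.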