4
votes

I am trying to vectorize a nested loop using OpenMP 4.0's simd feature, but I'm afraid I'm doing it wrong. My loops looks like this:

do iy = iyfirst, iylast
    do ix = ixfirst, ixlast

        !$omp simd
        do iz = izfirst, izlast

            dudx(iz,ix,iy) = ax(1)*( u(iz,ix,iy) - u(iz,ix-1,iy) )
            do ishift = 2, ophalf
                dudx(iz,ix,iy) = dudx(iz,ix,iy) + ax(ishift)*( u(iz,ix+ishift-1,iy) - u(iz,ix-ishift,iy) )
            enddo

            dudx(iz,ix,iy) = dudx(iz,ix,iy)*buoy_x(iz,ix,iy)

        enddo
        !$omp end simd

    enddo
enddo

Note that ophalf is a small integer, usually 2 or 4, so it makes sense to vectorize the iz loop and not the inner-most loop.

My question is: Do I have to mark ishift as a private variable?

In standard OpenMP parallel do loops, you certainly do need a private(ishift) to ensure other threads don't stomp over each other's data. Yet when I instead rewrite the first line as !$omp simd private(ishift), I get the ifort compilation error:

error #8592: Within a SIMD region, a DO-loop control-variable must not be specified in a PRIVATE SIMD clause. [ISHIFT]

Looking online, I couldn't find any successful resolution of this question. It seems to me that ishift should be private, but the compiler is not allowing it. Is an inner-loop variable automatically forced to be private?

Follow-up question: Later, when I add an omp parallel do around the iy loop, should I include a private(ishift) clause in the omp parallel do directive, the omp simd directive, or both?

Thanks for any clarifications.

1
Unroll that loop in specializations for the common cases. - Jeff Hammond
Simple omp simd constructs are not multi-threaded, they are vectorised which is different. You keep the body of the loop, but you replace the scalar instructions with vector ones. If you try to write by hand this vectorised version yourself, you'll see immediately why making ishift private makes little sense. - Gilles
Thanks @Gilles. I already knew what you said, but forcing myself to try and write it out really made me understand it better and make your point quite obvious. You're right - the ishift variable should not be made private. Furthermore, I couldn't think up a situation where the loop iterator should be made private, so the ifort error seems reasonable to me after all. Cheers. - NoseKnowsAll
I see I got the question wrong at first. I thought you have omp do private(ishift) around the iy loop. In that case the private should not be a problem. - Vladimir F

1 Answers

0
votes

Private clause when it comes to SIMD essentially means that the value of ishift is private to each SIMD lane within the SIMD register. This is true when we vectorize the innermost loop since ishift is the loop induction variable. But when you do a outer loop vectorization, every SIMD lane will have a different value for iz loop index, but given a loop index iz, ishift can still have values ranging from 2 to ophalf. So it doesn't qualify for private clause in SIMD context.

When it comes to multiple threads, you want copies of ishift so one thread incrementing this variable doesn't enable other thread skip that iteration. So private clause makes sense for ishift in omp parallel do context. It will be interesting to check the underlying code generation if the inner loop is completely unrolled and vectorized for the loop with loop index iz.