I am trying to parallelize a Fortran loop with OpenMP. The loop essentially consists of only two statements:
do i=1,LSample
    call calcSslice(Vpot(:,:,i), Sslice)
    rpold = rp
    call combine_rp_matrices(rpold, Sslice, rp)
end do
The subroutine calcSslice reads Vpot(:,:,i), performs some calculations, and stores the results in the matrix Sslice. The subroutine combine_rp_matrices uses rpold and Sslice to update rp. rp acts as a running variable and is the desired output of the program; the order in which the Sslice matrices from different iterations are combined into rp is irrelevant.
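For context, the two routines are shaped roughly like this (a simplified sketch; the bodies and array shapes are placeholders, not my actual code):
subroutine calcSslice(V, S)
    real, intent(in)  :: V(:,:)   ! one slice Vpot(:,:,i), only read
    real, intent(out) :: S(:,:)   ! result for this slice
    S = V                         ! placeholder for the actual calculation
end subroutine calcSslice

subroutine combine_rp_matrices(rpold, S, rp)
    real, intent(in)  :: rpold(:,:), S(:,:)
    real, intent(out) :: rp(:,:)
    rp = rpold + S                ! placeholder: combination order does not matter
end subroutine combine_rp_matrices
My first attempt at parallelizing this loop looked like this: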
!$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(Sslice), SCHEDULE(DYNAMIC)
do i=1,LSample
    call calcSslice(Vpot(:,:,i), Sslice)
    !$OMP CRITICAL
    rpold = rp
    call combine_rp_matrices(rpold, Sslice, rp)
    !$OMP END CRITICAL
end do
!$OMP END PARALLEL DO
This compiles and runs, but produces wrong results. With the following code I get correct results, but much slower execution (albeit still faster than the serial code):
!$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(Sslice), SCHEDULE(DYNAMIC)
do i=1,LSample
    !$OMP CRITICAL(Crit2)
    call calcSslice(Vpot(:,:,i), Sslice)
    !$OMP END CRITICAL(Crit2)
    !$OMP CRITICAL
    rpold = rp
    call combine_rp_matrices(rpold, Sslice, rp)
    !$OMP END CRITICAL
end do
!$OMP END PARALLEL DO
So there is apparently some synchronization issue with calcSslice, but I don't quite understand where it would occur. Vpot is only read from, never written to, in calcSslice, and Sslice is a private variable. Any global variables used in calcSslice are also only read from. The variables rpold and rp are declared in the scope of the subroutine that contains the DO loop, and so cannot be accessed by calcSslice. The variables declared in calcSslice use the following attributes: intent(in), intent(out), target, pointer.
Where does this go wrong?
EDIT: The problem is solved. The cause was the initialization of local variables in calcSslice in their declarations, which implies the save attribute.
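To illustrate the pitfall (the variable names and the calculation below are made up, not my actual code): a local variable that is initialized in its declaration implicitly gets the save attribute, so it behaves like a static variable that every thread shares.
subroutine calcSslice_buggy(V, S)
    real, intent(in)  :: V(:,:)
    real, intent(out) :: S(:,:)
    real :: accum = 0.0     ! initializing here implies SAVE: accum is effectively
                            ! static and shared by all threads calling the routine
    accum = sum(V)          ! concurrent calls race on the single saved accum
    S = accum               ! another thread may have overwritten accum by now
end subroutine calcSslice_buggy

subroutine calcSslice_fixed(V, S)
    real, intent(in)  :: V(:,:)
    real, intent(out) :: S(:,:)
    real :: accum           ! no initializer, so accum is an ordinary local variable
    accum = sum(V)          ! each call (and thus each thread) gets its own copy
    S = accum
end subroutine calcSslice_fixed
Declaring the variable and assigning its initial value in separate statements removes the implied save and makes the routine thread-safe.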