3
votes

I am trying to parallelize a fortran loop via OpenMP. The loop in essence consists of only two commands:

do i=1,LSample
  calcSslice(Vpot(:,:,i), Sslice)
  rpold = rp
  combine_rp_matrices (rpold, Sslice, rp)
end do

the calcSslice subroutine reads Vpot(:,:,i), performs some calculations and stores the results in the matrix Sslice. the combine_rp_matrices uses rpold and Sslice to update rp. rp acts as a running variable and is the desired output of the program. The order in which the Sslice-matrices from different iterations are combined with rp is irrelevant. My first attempt at parallelizing this loop looked like this:

!$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(Sslice), SCHEDULE(DYNAMIC)
do i=1,LSample
  calcSslice(Vpot(:,:,i), Sslice)
!$OMP CRITICAL
  rpold = rp
  combine_rp_matrices (rpold, Sslice, rp)
!$OMP END CRITICAL
end do
!$OMP END PARALLEL DO

This compiles and runs, but produces wrong results. Using the following code I get correct results but much slower execution (albeit still faster than serialized code):

!$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(Sslice), SCHEDULE(DYNAMIC)
do i=1,LSample
!$OMP CRITICAL(Crit2)
  calcSslice(Vpot(:,:,i), Sslice)
!$OMP END CRITICAL(Crit2)
!$OMP CRITICAL
  rpold = rp
  combine_rp_matrices (rpold, Sslice, rp)
!$OMP END CRITICAL
end do
!$OMP END PARALLEL DO

So there is apparently some synchronization issue with calcSslice. However, I don't quite understand where this would occur. Vpot is only read from and not written to in calcSslice and Sslice is a threadprivate variable. Any global variables used in calcSslice are also only read from. The variables rpold and rp are declared in the scope of the subroutine which the DO loop is part of, and so cannot be accessed by calcSslice. The variables declared in calcSslice use the following attributes: intent(in), intent(out), target, pointer.

Where does this go wrong?

EDIT: The problem is solved, cause was the initialization of variables in calcSslice during declaration, which implies the save attribute.

1
Check the reentrance of your functions thay might be unsafe.Noureddine

1 Answers

2
votes

My guess would be that calcSslice is not threadsafe. Make sure this subroutine does not access global variables other than read-only and do not use the save attribute (beware the implicit save if you initialize variables during declaration!). You could use a threadchecker like the one provided by Intel to find race-conditions in your code. If you have no access to such software, I would start with a dummy procedure and then populate the routine incrementaly to see where it fails.

Another thing that puzzles me is the last two lines of the loop body. Every thread backups the whole matrix and then adds his slice. Wouldn't it be better to collect all slices (e.g. by a reduction clause) and then combine that large slice once?