
Why does the following Fortran code only work if I pass the loop variables 'i' and 'j' as input arguments to the subroutine 'mat_init'? The loop variables 'i' and 'j' are declared private, so shouldn't they remain private inside the subroutine when I call it?

program main
   use omp_lib
   implicit none
   real(8), dimension(:,:), allocatable:: A
   integer:: i, j, n

   n = 20
   allocate(A(n,n)); A(:,:) = 0.0d+00

   !$omp parallel do private(i, j)
   do i=1,n
      do j=1,n
         call mat_init
      end do
   end do

   do i=1,n
   write(*,'(20f7.4)') (A(i,j), j=1,n)
   end do

contains
   subroutine mat_init
      ! i and j are not arguments here: they are reached through host association
      A(i,j) = 1.0d+00
   end subroutine
end program main

I know this has something to do with the 'lexical' and 'dynamic' extent, but I don't understand why OpenMP is designed so that private variables are not recognized in the 'dynamic' extent of a parallel region. It does not seem logical to me. Or am I doing something wrong?

j isn't global because you declared it private; i is private because it is the loop counter of the outermost loop. – PetrH

1 Answer


First, I think that the subroutine mat_init should take the values of i and j explicitly as input arguments. The variables i and j must be private, because each thread works on its own values of i and j. I also think that OpenMP treats i as private because the parallelized loop runs over i; the same goes for j. However, this privatization applies to i and j as they appear in the main program, not to the i and j that the subroutine sees through host association. Thus, declaring i and j private in the directive is not enough on its own: to make the subroutine work with each thread's private copies, you have to pass i and j to it as arguments.
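A minimal sketch of the fix described above, assuming the goal is simply to set every element of A to 1. The module mat_args, the driver fill_ones and the kind parameter dp are hypothetical names; the code is wrapped in a module only so it can be called from any program. i and j are passed to mat_init as dummy arguments, so each call uses the values held by the calling thread. A is still reached by host association, which is fine here because it is shared and each iteration writes a different element.

```fortran
! Sketch only: mat_args, fill_ones and dp are hypothetical names.
module mat_args
   implicit none
   integer, parameter :: dp = kind(1.0d0)
contains
   subroutine fill_ones(A)
      real(dp), intent(inout) :: A(:,:)
      integer :: i, j
      ! i is the parallel loop variable, j the inner loop variable; both private.
      !$omp parallel do private(i, j)
      do i = 1, size(A, 1)
         do j = 1, size(A, 2)
            ! Pass the thread's own i and j explicitly.
            call mat_init(i, j)
         end do
      end do
      !$omp end parallel do
   contains
      subroutine mat_init(i, j)
         ! These dummy arguments hide the host's i and j, so the values
         ! passed by the caller are used instead of host association.
         integer, intent(in) :: i, j
         ! A is the host's (shared) dummy argument.
         A(i, j) = 1.0_dp
      end subroutine mat_init
   end subroutine fill_ones
end module mat_args
```

In the question's program, the equivalent minimal change is to write `call mat_init(i, j)` and give mat_init the dummy arguments `integer, intent(in) :: i, j`.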

I believe the problem is related to the reentrancy of the subroutine mat_init. What happens when multiple threads enter the subroutine at the same time with different values of i and j? Unless you do something special, the called subroutine may not see the private copies of i and j.

In general, calling a subroutine many times inside a loop is discouraged, because each call adds some overhead. I suggest writing a subroutine that is itself parallelized rather than calling a subroutine from within a parallelized section.
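A minimal sketch of that suggestion, assuming the only work is assigning one value to every element. The module mat_parallel, the routine mat_init_all, its argument val and the kind parameter dp are hypothetical names. The parallel loop lives inside the routine, so there is one call per matrix instead of one call per element, and the loop variables never have to cross a procedure boundary.

```fortran
! Sketch only: mat_parallel, mat_init_all, val and dp are hypothetical names.
module mat_parallel
   implicit none
   integer, parameter :: dp = kind(1.0d0)
contains
   subroutine mat_init_all(A, val)
      real(dp), intent(inout) :: A(:,:)
      real(dp), intent(in) :: val
      integer :: i, j
      ! The outer (parallel) loop runs over columns j because Fortran arrays
      ! are column-major, so the inner loop over i touches contiguous memory.
      ! j is private as the parallel loop variable; i is listed explicitly.
      !$omp parallel do private(i)
      do j = 1, size(A, 2)
         do i = 1, size(A, 1)
            A(i, j) = val
         end do
      end do
      !$omp end parallel do
   end subroutine mat_init_all
end module mat_parallel
```

With `use mat_parallel` and A declared as `real(dp)`, a single `call mat_init_all(A, 1.0_dp)` replaces the whole parallel loop in the question's program.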