I am in the middle of instrumenting a fairly large code with OpenACC. Right now, I am dealing with a routine foo that calls a few other routines, bar, far, and boo, like so:
subroutine foo
  real x(100,25),y(100,25),z(100,25)
  real barout(25), farout(25), booout(25)
  do i=1,25
    call bar(barout, x(1,i),y(1,i),z(1,i))
    call far(farout, x(1,i),y(1,i),z(1,i))
    call boo(booout, x(1,i),y(1,i),z(1,i))
  enddo
  ....
end subroutine foo
A couple of points: 1) x, y, and z stay constant through the loop. 2) You might not like the structure of the code here, but changing it is beyond my job description. I am supposed to instrument with OpenACC, period.
I am currently concentrating on the call to "bar". I want to make bar a vector routine and call it from within a parallel region, but I am not ready to do the same for far and boo. (I said this is a work in progress, right?) Now, I could (I think!) sandwich bar in its own parallel region and copy data to and from it in each loop iteration:
!$acc data copy(barout) &
!$acc& copyin(x(:,:),y(:,:),z(:,:))
!$acc parallel
  call bar( .... )
!$acc end parallel
!$acc end data
But that's a lot of data transfer. It would be great if I could transfer x, y, and z to the device just once. Each of the routines has its own data region, so as I understand it (please correct me if I am wrong!) I cannot encase the entire loop in a single data region. Here is an alternative I tried:
subroutine foo
  !$acc routine(bar) vector
  real x(100,25),y(100,25),z(100,25)
  real barout(25), farout(25), booout(25)
  !$acc data create(x(:,:),y(:,:),z(:,:))
  !$acc end data
  do i=1,25
    !$acc data copy(barout(:)) &
    !$acc& present(x(:,:),y(:,:),z(:,:))
    !$acc parallel
    call bar(barout, x(1,i),y(1,i),z(1,i))
    !$acc end parallel
    !$acc end data
    call far(farout, x(1,i),y(1,i),z(1,i))
    call boo(booout, x(1,i),y(1,i),z(1,i))
  enddo
  ....
end subroutine foo
But this doesn't work because the data placed on the device by the first data region doesn't persist. It is gone by the time the present clause is reached. (I've tried copyin as well as create in that first data region.)
So is there a way to do what I am trying to do here? Thanks.
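For reference, here is a minimal sketch of the behavior being asked for, under stated assumptions. A structured data region ends at its `!$acc end data`, so a region that is closed before the loop starts cannot keep x, y, and z on the device. OpenACC 2.0 and later also define unstructured `enter data` / `exit data` directives, whose device copies last until the matching `exit data`. The body of bar below is a hypothetical stand-in (the real routine is not shown), `num_gangs(1)` is an added assumption so that a single gang runs the vector routine, and the host calls to far and boo are left out.

```fortran
module bar_mod
  implicit none
contains
  ! Hypothetical stand-in for bar: the real routine is not shown in the post.
  subroutine bar(barout, xc, yc, zc)
    !$acc routine vector
    real, intent(out) :: barout(25)
    real, intent(in)  :: xc(100), yc(100), zc(100)
    integer :: j
    !$acc loop vector
    do j = 1, 25
      barout(j) = xc(j) + yc(j) + zc(j)
    end do
  end subroutine bar
end module bar_mod

program foo_sketch
  use bar_mod
  implicit none
  real :: x(100,25), y(100,25), z(100,25)
  real :: barout(25)
  integer :: i

  x = 1.0
  y = 2.0
  z = 3.0

  ! Copy x, y, z to the device once; they stay there until exit data.
  !$acc enter data copyin(x, y, z)

  do i = 1, 25
    ! Only barout moves per iteration; x, y, z are already present.
    ! num_gangs(1) is an assumption: one gang runs the vector routine.
    !$acc parallel num_gangs(1) present(x, y, z) copyout(barout)
    call bar(barout, x(1,i), y(1,i), z(1,i))
    !$acc end parallel
    ! far and boo would still be called on the host here.
  end do

  ! Release the device copies; x, y, z were never modified on the device.
  !$acc exit data delete(x, y, z)

  ! With the stand-in bar, every element of barout is 1.0 + 2.0 + 3.0.
  print *, barout(1)
end program foo_sketch
```

Without OpenACC enabled, the directives are treated as comments and the sketch runs serially on the host. Whether the data regions inside far and boo interact cleanly with data that is already present would still need to be checked against the compiler in use.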