Following on from a previous project (Integer overflow when calculating the amount of memory to allocate), I have the program up and running, and it produces valid results on small data files that create small arrays (<2 GB of RAM). With larger project data files, however, the arrays approach 10 GB. Reading and processing the data works fine.

But when it comes to writing the data back out to file, instead of writing to disk it fills the memory buffer, exhausts system memory (32 GB RAM), and then the machine locks up and restarts. This outcome is consistent between machines (laptops, desktops and VMs) and regardless of the storage device on which the process is completed (SSD, HDD, USB-HDD or network HDD).

All systems are ~12-18 months old, i7 processors, sufficient RAM and disk space, etc.

Google provided suggestions such as FLUSH, setting the environment variable GFORTRAN_UNBUFFERED_ALL to 1 (or 'y' or 'Y'), and manual approaches such as closing the file and then reopening it with POSITION='append' to force the write.

Of these approaches, close-then-reopen is the only one that clearly works. However, it only makes memory fill more slowly than it otherwise would, and the system still goes down in the end.
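
For reference, the close-then-reopen approach can be sketched as below. This is a cut-down illustration rather than the real program: the file name reopen_demo.csv, the chunk size of 1000 records and the 4-value rows are all made up for the example. Note that it is POSITION='append' on the OPEN statement that resumes writing at the end of the file.

      program reopen_append_sketch

      implicit none

      ! Hypothetical values chosen for the demo only
      integer, parameter :: chunk = 1000   ! records between close/reopen cycles
      integer, parameter :: nrec  = 5000   ! total records written
      integer :: u, rec
      real :: row(4)

      open(newunit=u, file="reopen_demo.csv", status='replace', action='write')

      do rec = 1, nrec
        call random_number(row)
        write(u, '(*(f0.3,:,","))') row
        if (mod(rec, chunk) == 0) then
          ! Closing hands the runtime's buffered data to the OS;
          ! reopening with position='append' continues at end of file.
          close(u)
          open(newunit=u, file="reopen_demo.csv", status='old', &
               action='write', position='append')
        end if
      end do

      close(u)

      end program

Closing the unit only passes the Fortran runtime's buffered data to the operating system. Whether the OS then writes it to disk promptly or keeps it in its own cache is outside the program's control, which would be consistent with memory still filling up, only more slowly.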

Here is an example of the write without any interference:

      program giant_array

      use iso_fortran_env

      implicit none

      character(len=*), parameter :: csvfmt = '(*(f0.3,:,","))'

      character(20) intval
      character(200) line
      integer(kind=int32) x, y, z, i, cnt
      real(kind=real64), dimension(:,:,:,:), allocatable :: model

      print *
      print *, "Allocating array and assigning values..."
      print *

      call random_seed()
      allocate(model(382,390,362,28))
      call random_number(model)

      print *, "Writing array to file..."
      print *

      open(31, file="test.csv", status='replace', action='write')

      cnt=0

      ! Write array to file:
      do x = 1, 382
        do y = 1, 390
          do z = 1, 362
            write(31, csvfmt) (model(x,y,z,i), i = 1, 28)
            cnt=cnt+1
            if((int(cnt/1000)*1000).eq.cnt) then
              line = " Processing block grade "
              write(intval,'(I12)') cnt
              line = trim(line)//" "//trim(adjustl(intval))//"..."
              write(*,'(A,A)', advance='no') achar(13), trim(line)
            endif
          enddo
        enddo
      enddo

      close(31, status='keep')

      end program

During execution you'll notice test.csv remains at size=0 until you kill the program.

Even with 'call SLEEP(1)' between the close and the reopen, the buffer fills faster than the disk can be written, and the system crashes before the job is done. It would also take forever to complete.

I've found references to using fsync() to remedy this problem but can't get the code to compile (I think I'm stuffing up the command line args). The code, from gcc.gnu.org, is as follows:

  ! Declare the interface for POSIX fsync function
  interface
    function fsync (fd) bind(c,name="fsync")
    use iso_c_binding, only: c_int
      integer(c_int), value :: fd
      integer(c_int) :: fsync
    end function fsync
  end interface

  ! Variable declaration
  integer :: ret

  ! Opening unit 10
  open (10,file="foo")

  ! ...
  ! Perform I/O on unit 10
  ! ...

  ! Flush and sync
  flush(10)
  ret = fsync(fnum(10))

  ! Handle possible error
  if (ret /= 0) stop "Error calling FSYNC"
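
As a self-contained version of the fragment above, the following sketch should compile on a POSIX system with a plain 'gfortran fsync_sketch.f90' and no extra arguments, since fsync() is part of the standard C library there. The program name, the line of data written and the status/action specifiers are additions for illustration. fnum() is a GNU extension, so this is gfortran-specific, and it is not expected to link where the C library lacks fsync().

      program fsync_sketch

      implicit none

      ! Interface for the POSIX fsync function (assumes a POSIX C library)
      interface
        function fsync (fd) bind(c,name="fsync")
          use iso_c_binding, only: c_int
          integer(c_int), value :: fd
          integer(c_int) :: fsync
        end function fsync
      end interface

      integer :: ret

      open (10, file="foo", status='replace', action='write')

      ! Placeholder I/O on unit 10
      write (10, '(a)') "some data"

      ! Flush the Fortran runtime buffer, then ask the OS to sync to disk
      flush (10)
      ret = fsync(fnum(10))

      ! Handle possible error
      if (ret /= 0) stop "Error calling FSYNC"

      close (10)

      end program

flush() only empties the Fortran runtime's buffer into the operating system; the fsync() call is what asks the OS to push its own cached data for that file descriptor out to the device.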

While others have come across this issue, I can't find a solution anywhere. Comments and blog posts suggest that even the fsync() approach doesn't always do the trick.

The result is a system crash and self-restart every time.

I'm guessing there must be a way to write large files to disk in one go without excessive system specifications.

Many thanks.

Updated

Code updated as follows to test the Windows C runtime _commit function to force data from the buffer to disk. It works about as well as the close-then-reopen method: it still kills the machine. It's entirely possible there is still something wrong with my implementation...

      program giant_array

      use iso_fortran_env
      use iso_c_binding

      implicit none

      ! Declare the interface for WIN32 _commit function
      interface
        function commit (fd) bind(c,name="_commit")
        use iso_c_binding, only: c_int
          integer(c_int), value :: fd
          integer(c_int) :: commit
        end function commit
      end interface

      character(len=*), parameter :: csvfmt = '(*(f0.3,:,","))'

      character(20) intval
      character(200) line
      integer(kind=int32) error
      integer(kind=int32) var, x, y, z, i, cnt
      real(kind=real64), dimension(:,:,:,:), allocatable :: model

      print *
      print *, "Allocating array and assigning values..."
      print *

      call random_seed()
      allocate(model(382,390,362,28))
      call random_number(model)

      print *, "Writing array to file..."
      print *

      open(31, file="test.csv", status='replace', action='write')

      cnt=0

      ! Write array to file:
      do x = 1, 382
        do y = 1, 390
          do z = 1, 362
            write(31, csvfmt) model(x,y,z,:)
            cnt=cnt+1
            if((int(cnt/1000)*1000).eq.cnt) then
              line = " Processing block grade "
              write(intval,'(I12)') cnt
              line = trim(line)//" "//trim(adjustl(intval))//"..."
              write(*,'(A,A)', advance='no') achar(13), trim(line)
              flush(31)
              error=commit(fnum(31))
            endif
          enddo
        enddo
      enddo

      close(31, status='keep')

      end program
How big is the array? I think we need a minimal reproducible example. - Vladimir F
Well, probably... We have to start somewhere. Will it or will it not? We don't care about the actual data but about an actual compilable and runnable piece of code with the writing problem. - Vladimir F
As an aside, what's the purpose of the variable "var" in your example? Seems you're using it to write 28 copies of the array to the file, is that intentional? - janneb
Do you really need to write such a huge array one line at a time to a formatted file? It will almost certainly be a damn sight faster blasting the whole array into an unformatted file all in one go. - Ian Bush
Talking about var, just to be absolutely clear: are you saying that in your real problem you need both the implied do loop and the "Do var = 1, 28" that appear in your cut-down version? As it currently stands, you just write the same data 28 times over because of the outer loop over var. - Ian Bush
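
Ian Bush's suggestion of writing the whole array to an unformatted file in one statement might look something like the sketch below. The file name model.bin and the small demo dimensions are assumptions for illustration; it has not been tried at the full 382x390x362x28 size.

      program unformatted_sketch

      use iso_fortran_env, only: real64

      implicit none

      real(kind=real64), dimension(:,:,:,:), allocatable :: model
      integer :: u

      ! Deliberately small, hypothetical dimensions for the demo
      allocate(model(4,5,6,7))
      call random_number(model)

      ! Stream access writes the raw values with no record markers
      open(newunit=u, file="model.bin", form='unformatted', &
           access='stream', status='replace', action='write')

      ! One statement writes every element, in memory (column-major) order
      write(u) model

      close(u)

      end program

The result is a raw binary dump rather than a CSV, so it only suits cases where whatever reads the file can accept that layout.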

2 Answers

1
votes

If you're using Windows 10, as indicated by the "windows-10" tag, then I suspect the reason the fsync code you're showing fails to compile is that fsync() is a POSIX function and isn't available on Windows. I vaguely recall Windows has a function called _commit which ought to do roughly the equivalent of fsync.

0
votes

It seems this issue is related to the compiler rather than the OS (Windows 10).

To test the problem further, I installed the personal version of FTN95, tweaked the code, and recompiled. The code is as follows:

      program giant_array

      implicit none

      character(len=17), parameter :: csvfmt = '(500(f0.3,:,","))'

      character(20) intval
      character(200) line
      character(1000) outline
      integer(kind=4) x, y, z, cnt
      real(kind=2), dimension(:,:,:,:), allocatable :: model

      write(*,*)
      write(*,*) "Allocating array and assigning values..."
      write(*,*)

      call random_seed()
      allocate(model(28,382,390,362))
      call random_number(model)

      write(*,*) "Writing array to file..."
      write(*,*)

      open(31, file="test.csv", status='replace', action='write')

      ! Write array to file:
      cnt=0
      do x = 1, 382
        do y = 1, 390
          do z = 1, 362
            write(outline, fmt=csvfmt) model(:,x,y,z)
            write(31, '(a)') trim(outline)
            cnt=cnt+1
            if((int(cnt/1000)*1000).eq.cnt) then
              line = " Processing record "
              write(intval,'(I12)') cnt
              line = trim(line)//" "//trim(adjustl(intval))//"..."
              write(*,'(A,A)', advance='no') achar(13), trim(line)
            endif
          enddo
        enddo
      enddo

      close(31, status='keep')

      end program

Compiled with FTN95, the program has no adverse effect on the system: the file writes to disk without issues, and considerably faster than with gfortran (gcc version 8.1.0). While this answer does not solve the problem, it produces a working outcome.

I will continue to investigate gfortran for a generic solution.