I am trying to modify large PostScript files in Scala (some are as large as 1GB). Each file is a sequence of batches, and each batch contains a code that encodes the batch number, number of pages, etc.
I need to:
- Search the file for the batch codes (which always start with the same line in the file)
- Count the number of pages until the next batch code
- Modify each batch code to include the number of pages in that batch
- Save the new file in a different location
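
To make these steps concrete, here is a small sketch of the transformation I'm after. It is not my actual code, and the marker strings (`%%BatchCode:`, `%%Page:`) and the `pages=` suffix are made up for illustration; the real batch codes look different. It assumes two passes over the lines are acceptable: the first pass keeps only one `Int` per batch, and the second rewrites lazily, so neither the input nor the output is held in memory.

```scala
object BatchPageCountSketch {
  // Hypothetical markers; the real batch-code and page lines look different.
  val BatchMarker = "%%BatchCode:"
  val PageMarker  = "%%Page:"

  /** Pass 1: the page count of each batch, in file order. */
  def countPages(lines: Iterator[String]): Vector[Int] = {
    val counts  = Vector.newBuilder[Int]
    var current = -1 // -1 means no batch code has been seen yet
    for (line <- lines) {
      if (line.startsWith(BatchMarker)) {
        if (current >= 0) counts += current // close the previous batch
        current = 0
      } else if (line.startsWith(PageMarker) && current >= 0) {
        current += 1
      }
    }
    if (current >= 0) counts += current // close the last batch
    counts.result()
  }

  /** Pass 2: lazily append the page count to each batch-code line. */
  def rewrite(lines: Iterator[String], counts: Iterator[Int]): Iterator[String] =
    lines.map { line =>
      if (line.startsWith(BatchMarker)) s"$line pages=${counts.next()}" else line
    }
}

// Tiny in-memory example; a real run would read the file twice instead.
val sample = List(
  "%!PS", "%%BatchCode: A", "%%Page: 1", "%%Page: 2", "%%BatchCode: B", "%%Page: 3"
)
val counts = BatchPageCountSketch.countPages(sample.iterator)
val out    = BatchPageCountSketch.rewrite(sample.iterator, counts.iterator).toList
```

For a real file, each pass would presumably get its own `Source.fromFile("file.ps").getLines`, and the result of `rewrite` would be written line by line with something like a `java.io.PrintWriter`.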
My current solution uses two iterators (`iterA` and `iterB`), created from `Source.fromFile("file.ps").getLines`. The first iterator (`iterA`) advances in a while loop to the beginning of a batch code (with `iterB.next` being called each time as well). `iterB` then continues searching until the next batch code (or the end of the file), counting the pages it passes as it goes. Then it updates the batch code at `iterA`'s position, and the process repeats.
This seems very non-Scala-like and I still haven't designed a good way to save these changes into a new file.
What is a good approach to this problem? Should I ditch iterators entirely? I'd prefer to do it without holding the entire input or output in memory at once.
Thanks!