I am trying to come up with a logical design for a particular requirement in a Spring Batch job.
We have a Process table that gives me the records that are ready to be processed.
Each record in the Process table is uniquely identified by its process_id.
For each process_id I have to generate one Excel report that may contain millions of rows, i.e. the job will produce multiple Excel files as its output.
My plan is to:
- Use a multi-threaded step so that each thread reads one record, generates multiple records in the processor, and writes the generated records into a separate Excel file.
- Read from the Process table using a synchronized reader.
- In the processor, use the record returned by the reader to query the DB (involving multiple joins) and assemble a composite object.
- Write the composite object into a file in a custom writer (a rough sketch of this configuration follows).
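Here is a minimal sketch of the step configuration I have in mind, assuming Spring Batch 4.x Java config. The SQL, bean names, thread count, and the CompositeReport type are placeholders I made up for illustration; the real processor and writer would hold the multi-join query and the Excel generation:

```java
import javax.sql.DataSource;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.batch.item.support.SynchronizedItemStreamReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Configuration
public class ReportJobConfig {

    /** Placeholder for the composite object assembled in the processor. */
    public static class CompositeReport { /* millions of row objects */ }

    @Bean
    public Step generateReportsStep(StepBuilderFactory steps, DataSource dataSource) {
        return steps.get("generateReportsStep")
                .<Long, CompositeReport>chunk(1)       // commit interval = 1 item (one process_id)
                .reader(processIdReader(dataSource))   // shared, synchronized reader
                .processor(compositeReportProcessor()) // multi-join query per process_id
                .writer(excelFileWriter())             // one Excel file per item
                .taskExecutor(new SimpleAsyncTaskExecutor("report-"))
                .throttleLimit(4)                      // number of files generated in parallel
                .build();
    }

    @Bean
    public SynchronizedItemStreamReader<Long> processIdReader(DataSource dataSource) {
        JdbcCursorItemReader<Long> delegate = new JdbcCursorItemReaderBuilder<Long>()
                .name("processIdReader")
                .dataSource(dataSource)
                .sql("SELECT process_id FROM process WHERE status = 'READY'")
                .rowMapper((rs, rowNum) -> rs.getLong("process_id"))
                .saveState(false) // restart state is unsafe in a multi-threaded step
                .build();
        // JdbcCursorItemReader is not thread-safe, so wrap it so that
        // concurrent threads take one process_id at a time.
        SynchronizedItemStreamReader<Long> reader = new SynchronizedItemStreamReader<>();
        reader.setDelegate(delegate);
        return reader;
    }

    @Bean
    public ItemProcessor<Long, CompositeReport> compositeReportProcessor() {
        // Placeholder: run the multi-join query for one process_id and build
        // the entire composite object in memory before returning it.
        return processId -> new CompositeReport();
    }

    @Bean
    public ItemWriter<CompositeReport> excelFileWriter() {
        // Placeholder: custom writer that writes each composite object
        // to its own Excel file.
        return items -> { /* generate one .xlsx per CompositeReport */ };
    }
}
```

The chunk(1) line is exactly where my concern sits: each thread holds one fully built CompositeReport until its write completes.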
This approach doesn't look good to me in terms of memory management.
Since the records to be written are generated in the processor (not obtained from the reader, which only supplies record IDs), I can only commit once all the processing for an item is done.
And with multiple threads, we will hold too many of these large objects in memory before they can be written.
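To put rough, purely illustrative numbers on it (these are assumptions, not measurements from our system): with a throttle limit of 4 threads and composite objects of, say, 2 million rows at ~200 bytes per row, that is on the order of 4 × 2,000,000 × 200 B ≈ 1.6 GB of heap held at once before anything reaches disk.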
The challenge I see is that I read one record but write multiple records, so a commit interval of 1 means one write for every record read and processed. Since the processor generates millions of records per item, I can only commit after the whole composite object has been built. The basic goal here is to generate multiple files in parallel.
Is there a better way of designing this batch job? Any help would be much appreciated.