1 vote

I have a Spring Batch job that basically reads from a file, processes each line, and writes the output to another file. Since the processing step is costly, I want to run it on multiple threads, but since the reading and writing steps use files, those steps must run on a single thread. I ended up with three flows, each running in parallel with one step each, synchronized on two BlockingQueues. The read step reads from the file and writes to the first queue. The processing step is multi-threaded: it reads from that queue, processes each item, and writes to the second queue. The write step reads from the second queue and writes the output to another file.
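For context, the producer side of that setup looks roughly like the sketch below: the read step's ItemWriter just puts each item onto the shared BlockingQueue. This is a simplified illustration, not my exact code; the class name QueueItemWriter and the List-based write signature (Spring Batch 4.x style) are only there to show the idea.

import java.util.List;
import java.util.concurrent.BlockingQueue;

import org.springframework.batch.item.ItemWriter;

// Producer side: the single-threaded read step hands each line to the shared queue.
public class QueueItemWriter<T> implements ItemWriter<T> {

    private final BlockingQueue<T> queue;

    public QueueItemWriter(BlockingQueue<T> queue) {
        this.queue = queue;
    }

    @Override
    public void write(List<? extends T> items) throws InterruptedException {
        for (T item : items) {
            queue.put(item); // blocks when the queue is full, which throttles the file reader
        }
    }
}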

It works pretty well, except that I can't find a clean and fast way to stop the job when everything is done. Right now I'm using 'poll' with a timeout on both queues and assuming that if no item shows up within some number of seconds, we're done. This delays the job's termination by that amount of time, and I can't use a very small timeout because external factors (like machine load) can delay the job.
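To illustrate, the consumer side currently looks something like this (again a simplified sketch with illustrative names): the poll timeout is what ends the step, so the job always pays that full delay before finishing.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

import org.springframework.batch.item.ItemReader;

// Consumer side: a queue-backed reader that treats a poll timeout as end-of-input.
public class TimeoutQueueReader<T> implements ItemReader<T> {

    private final BlockingQueue<T> queue;
    private final long timeoutSeconds;

    public TimeoutQueueReader(BlockingQueue<T> queue, long timeoutSeconds) {
        this.queue = queue;
        this.timeoutSeconds = timeoutSeconds;
    }

    @Override
    public T read() throws InterruptedException {
        // poll() returns null after the timeout, and null tells Spring Batch there are
        // no more items, so a producer delayed by machine load can end the step too early.
        return queue.poll(timeoutSeconds, TimeUnit.SECONDS);
    }
}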

I tried using something like a Poison Pill, but the problem is that if I override the 'doRead' method on the FlatFileItemReader to return a Poison Pill when it gets 'null' (signifying end of file), then this reader never ends and the job never terminates.

Does anyone have a suggestion? From the documentation I know that I could probably just put a 'synchronized' on the reader in the read step (file) and on the writer in the write step (file), but I'd really prefer a different solution.

2 Answers

2 votes

You can just add a stateful variable in your reader to track the end of the job.

// Reader that emits a single Poison Pill once the underlying file is exhausted,
// then returns null on the following read so the step can terminate.
public class PoisoningReader<T> extends FlatFileItemReader<T> {
    private boolean endJob = false;

    @Override
    @SuppressWarnings("unchecked")
    protected T doRead() throws Exception {
        if (endJob) {
            return null;
        }

        T object = super.doRead();
        if (object == null) {
            // End of file reached: emit the pill once, then end on the next read.
            endJob = true;
            return (T) new PoisonPill();
        }
        return object;
    }
}

0 votes

So, I'm going to post my solution in case anyone is interested or faces a similar problem.

In summary, I ended up using a Poison Pill as Dean Clark suggested. I eventually simplified the job to use only one BlockingQueue, but I still had the problem of how to inject the Poison Pill, since the queue is shared between steps rather than living inside a single step.

Basically, instead of mucking around with the readers to return the Poison Pill and with the processors to detect and ignore it, I just let Spring Batch run normally and added a listener to the step responsible for injecting the Poison Pill. This listener overrides 'afterStep' and simply adds the pill to the queue. The step reading from the queue will get the Poison Pill at the end of the queue, signifying "no more work to do", and will terminate normally by returning null.
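A rough sketch of that listener (simplified, with illustrative names; PoisonPill is just a simple marker class):

import java.util.concurrent.BlockingQueue;

import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;

// Simple marker type; consumers only check for its presence.
class PoisonPill { }

// Attached to the step that fills the queue: once that step finishes,
// it drops a single Poison Pill into the queue to signal "no more work".
public class PoisonPillStepListener implements StepExecutionListener {

    private final BlockingQueue<Object> queue;

    public PoisonPillStepListener(BlockingQueue<Object> queue) {
        this.queue = queue;
    }

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // nothing to do before the step runs
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        queue.add(new PoisonPill()); // assumes the queue has spare capacity at this point
        return stepExecution.getExitStatus();
    }
}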

Another 'quirk' is that in one job, the step that reads from the queue is configured with a thread pool to process items in parallel, so I need to kill/unblock all the threads reading from the queue. A nice trick was having the reader read from the queue and, if the item is a Poison Pill, re-inject it into the queue and return null. This way every thread gets a Poison Pill and terminates properly.
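Roughly, that reader looks like this (again a simplified sketch with illustrative names, reusing the PoisonPill marker from above):

import java.util.concurrent.BlockingQueue;

import org.springframework.batch.item.ItemReader;

// Used by the multi-threaded step: when a thread takes the Poison Pill,
// it puts the pill back for the other threads and returns null to stop its own reading.
public class PillAwareQueueReader<T> implements ItemReader<T> {

    private final BlockingQueue<Object> queue;

    public PillAwareQueueReader(BlockingQueue<Object> queue) {
        this.queue = queue;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T read() throws InterruptedException {
        Object next = queue.take(); // blocks until a real item or the pill arrives
        if (next instanceof PoisonPill) {
            queue.put(next); // re-inject so the remaining threads also see it
            return null;     // null ends reading for this thread
        }
        return (T) next;
    }
}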