1
votes

I've already implemented remote chunking using AMQP (RabbitMQ). Now I need to run parallel jobs from within a web container.

My simple controller (testJob uses remote chunking):

@Controller
public class JobController {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private Job testJob;

    @RequestMapping("/job/test")
    public void test() {
        JobParametersBuilder jobParametersBuilder = new JobParametersBuilder();
        jobParametersBuilder.addDate("date",new Date());
        try {
            jobLauncher.run(testJob, jobParametersBuilder.toJobParameters());
        } catch (JobExecutionAlreadyRunningException | JobRestartException | JobParametersInvalidException | JobInstanceAlreadyCompleteException e) {
            e.printStackTrace();
        }

    }

}

testJob reads data from the filesystem (master chunk) and sends it to the remote chunk (slave chunk). The problem is that the ItemReader is not thread-safe. From the Spring Batch documentation:

There are some practical limitations of using multi-threaded Steps for some common Batch use cases. Many participants in a Step (e.g. readers and writers) are stateful, and if the state is not segregated by thread, then those components are not usable in a multi-threaded Step. In particular most of the off-the-shelf readers and writers from Spring Batch are not designed for multi-threaded use. It is, however, possible to work with stateless or thread safe readers and writers, and there is a sample (parallelJob) in the Spring Batch Samples that show the use of a process indicator (see Section 6.12, “Preventing State Persistence”) to keep track of items that have been processed in a database input table.

I've looked at the parallelJob sample in the Spring Batch GitHub repository: https://github.com/spring-projects/spring-batch/blob/master/spring-batch-samples/src/main/java/org/springframework/batch/sample/common/StagingItemReader.java

I'm a bit confused about the process indicator pattern. Where can I find more detailed information about it?

1 Answer

4
votes

If all you're concerned with is that the ItemReader instance would be shared across job invocations, you can declare the ItemReader as step-scoped. You'll then get a new instance per step execution, which removes the threading concern between concurrent job launches.

But to answer your direct question about the process indicator pattern: I'm not sure where there is good documentation on it by itself. There is a sample of its implementation in the Spring Batch Samples (the parallel job uses it).

The idea behind it is that you give each record you are going to process a status. At the beginning of the job/step you mark those records as in process. As the records are committed, you mark them as processed. This removes the need to track state in the reader, since the state actually lives in the database (your query only looks for records marked as in process).
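
To make the status flow above concrete, here is a minimal sketch in plain Java. It is not the Spring Batch sample's code: the `StagingTable` class, the `Status` values, and all method names are hypothetical, and an in-memory map stands in for the database table with its status column. It only illustrates that a restarted reader can pick up where it left off without keeping any state of its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ProcessIndicatorSketch {

    // Hypothetical status column values.
    enum Status { NEW, IN_PROCESS, PROCESSED }

    /** In-memory stand-in (assumption) for a database input table with a status column. */
    static final class StagingTable {
        private final Map<Long, String> payloads = new TreeMap<>();
        private final Map<Long, Status> statuses = new TreeMap<>();

        synchronized void insert(long id, String payload) {
            payloads.put(id, payload);
            statuses.put(id, Status.NEW);
        }

        /** Start of the job/step: mark every new record as in process. */
        synchronized void markNewAsInProcess() {
            statuses.replaceAll((id, s) -> s == Status.NEW ? Status.IN_PROCESS : s);
        }

        /** The "query": return up to chunkSize ids that are still marked as in process. */
        synchronized List<Long> readChunk(int chunkSize) {
            List<Long> ids = new ArrayList<>();
            for (Map.Entry<Long, Status> e : statuses.entrySet()) {
                if (e.getValue() == Status.IN_PROCESS && ids.size() < chunkSize) {
                    ids.add(e.getKey());
                }
            }
            return ids;
        }

        /** Chunk commit: flip the written records to processed. */
        synchronized void markProcessed(List<Long> ids) {
            for (Long id : ids) {
                statuses.put(id, Status.PROCESSED);
            }
        }

        synchronized String payload(long id) {
            return payloads.get(id);
        }
    }

    public static void main(String[] args) {
        StagingTable table = new StagingTable();
        for (long id = 1; id <= 5; id++) {
            table.insert(id, "person-" + id);
        }
        table.markNewAsInProcess();

        // First run: one chunk of two records is read and committed, then the job stops.
        List<Long> firstChunk = table.readChunk(2);
        table.markProcessed(firstChunk);

        // Restart: a brand-new reader holds no state; it simply runs the query again.
        List<Long> afterRestart = table.readChunk(10);
        System.out.println(firstChunk + " " + afterRestart); // prints "[1, 2] [3, 4, 5]"
    }
}
```

In this sketch the methods are synchronized only so the map stays consistent. With several threads reading concurrently against a real table, each thread would also need to claim its records atomically (for example with a conditional update on the status column) so that two threads never pick up the same row.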