Implementation of TaskExecutor in Spring Batch for parallel processing

Question

Consider a Step bean:

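  @Bean
  public Step stepForChunkProcessing() {
    return stepBuilderFactory
        .get("stepForChunkProcessing")
        .<Entity1, Entity2>chunk(1000)   // commit interval: 1000 items per chunk
        .reader(reader())                // reads Entity1 records from the file
        .processor(processor())          // transforms each Entity1 into an Entity2
        .writer(writer())                // writes each chunk of Entity2 items to the database
        .taskExecutor(taskExecutor())    // chunks are executed on this executor's threads
        .throttleLimit(10)               // at most 10 chunks are processed concurrently
        .build();
  }

  @Bean
  public TaskExecutor taskExecutor() {
    // SimpleAsyncTaskExecutor starts a new thread for each task (no pooling);
    // "MyApplication" is the thread name prefix
    return new SimpleAsyncTaskExecutor("MyApplication");
  }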

Requirement: the reader reads records (of type Entity1) from a file, the processor processes them, and the writer writes them into the database.

Before I added the TaskExecutor, only one thread was created. It looped through the reader and processor 1000 times, as defined by the chunk size above, then moved on to the writer and wrote all 1000 records. It then started again from record 1001 and read and processed another 1000 records. This is synchronous (sequential) execution.
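The sequential behaviour described above can be sketched in plain Java, with no Spring Batch involved. `ChunkLoopSketch` and `runChunks` are hypothetical names, a `List` stands in for the database writer, and the loop only mimics the read/process/write cycle of a chunk-oriented step; it is not Spring Batch's actual implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class ChunkLoopSketch {

    // Hypothetical single-threaded chunk loop: read and process up to
    // chunkSize items one at a time, then hand the whole chunk to the "writer".
    public static <I, O> List<List<O>> runChunks(Iterator<I> reader,
                                                Function<I, O> processor,
                                                int chunkSize) {
        List<List<O>> writtenChunks = new ArrayList<>(); // stands in for the database
        while (reader.hasNext()) {
            List<O> chunk = new ArrayList<>(chunkSize);
            // Read/process loop: runs at most chunkSize times
            while (chunk.size() < chunkSize && reader.hasNext()) {
                chunk.add(processor.apply(reader.next()));
            }
            // Write the whole chunk at once (one transaction per chunk in Spring Batch)
            writtenChunks.add(chunk);
        }
        return writtenChunks;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 1; i <= 2500; i++) {
            records.add(i);
        }
        List<List<String>> chunks = runChunks(records.iterator(), n -> "row-" + n, 1000);
        for (List<String> chunk : chunks) {
            System.out.println(chunk.size() + " items, first=" + chunk.get(0));
        }
    }
}
```

With 2,500 input records and a chunk size of 1000, this prints three chunks of 1000, 1000 and 500 items, starting at `row-1`, `row-1001` and `row-2001`.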

After adding the TaskExecutor with a throttle limit of 10, 10 threads were created, running independently of each other. How do they keep track of which records from the file have already been processed by other threads? And even if I add the synchronized keyword to the reader's read method, how do the different threads keep track of the records from the file that have already been processed?
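To illustrate what a synchronized `read` method does and does not guarantee, here is a plain-Java sketch in which 10 worker threads share one reader. All names are hypothetical and none of them are Spring Batch classes. The shared cursor ensures every record is handed out exactly once, but the threads interleave their reads, so a chunk need not contain consecutive records and chunks finish in no fixed order. Nothing in this setup records which records have actually been written.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedReaderSketch {

    // Hypothetical reader over an in-memory "file". Its only state is the cursor;
    // because read() is synchronized, each call hands out the next line exactly once.
    static class SynchronizedLineReader {
        private final List<String> lines;
        private int cursor = 0;

        SynchronizedLineReader(List<String> lines) {
            this.lines = lines;
        }

        synchronized String read() {
            // null signals end of input, as with Spring Batch's ItemReader
            return cursor < lines.size() ? lines.get(cursor++) : null;
        }
    }

    // Each worker repeatedly builds a chunk from the shared reader and "writes" it.
    public static List<List<String>> run(List<String> lines, int threads, int chunkSize) {
        SynchronizedLineReader reader = new SynchronizedLineReader(lines);
        List<List<String>> written = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                while (true) {
                    List<String> chunk = new ArrayList<>();
                    String line;
                    while (chunk.size() < chunkSize && (line = reader.read()) != null) {
                        chunk.add(line);
                    }
                    if (chunk.isEmpty()) {
                        return; // input exhausted
                    }
                    written.add(chunk); // chunks complete in no particular order
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(written);
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= 10_500; i++) {
            lines.add("record-" + i);
        }
        List<List<String>> chunks = run(lines, 10, 1000);
        Set<String> distinct = new HashSet<>();
        int total = 0;
        for (List<String> chunk : chunks) {
            total += chunk.size();
            distinct.addAll(chunk);
        }
        // total and distinct.size() are both 10500; the number of chunks can vary between runs
        System.out.println(total + " records, " + distinct.size() + " distinct, "
                + chunks.size() + " chunks");
    }
}
```

Synchronizing the read therefore prevents two threads from getting the same record, but it does not give you a meaningful "records processed so far" position in the file.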


Mahmoud Ben Hassine · Accepted Answer · 2020-06-10T08:47:41

Keeping track of how many records from the file have already been processed is impossible in a multi-threaded step, as mentioned in the Multi-threaded section of the reference documentation:

 Many participants in a Step (such as readers and writers) are stateful.
 If the state is not segregated by thread, then those components are not
 usable in a multi-threaded Step

That's why the Javadoc of AbstractItemCountingItemStreamItemReader#setSaveState recommends turning off state management. Here is an excerpt:

Always set it to false if the reader is being used in a concurrent environment.
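Why a saved item count breaks down can be shown with a deterministic, simplified model. This is not Spring Batch code: `CountingReader` and `restartPointAfterFailure` are hypothetical, and the sketch assumes, as with a counting reader whose state is saved, that the current shared read count is persisted whenever a chunk commits. It replays one possible interleaving of two threads: thread B commits while thread A's earlier chunk fails, so the saved count already covers A's records.

```java
import java.util.ArrayList;
import java.util.List;

public class SaveStateSketch {

    // Hypothetical stand-in for a counting reader: one item counter shared by
    // all threads, similar to the read count a reader keeps when saving state.
    static class CountingReader {
        int readCount = 0;

        List<Integer> readChunk(int size) {
            List<Integer> chunk = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                chunk.add(++readCount); // 1-based record numbers
            }
            return chunk;
        }
    }

    // Replays one interleaving of two threads and returns the first record
    // number a restarted job would read if it trusted the saved count.
    public static int restartPointAfterFailure(int chunkSize) {
        CountingReader reader = new CountingReader();
        reader.readChunk(chunkSize); // thread A reads records 1..chunkSize, not yet written
        reader.readChunk(chunkSize); // thread B reads the next chunkSize records
        // Thread B writes its chunk and commits, saving the shared counter
        int savedReadCount = reader.readCount;
        // Thread A now fails and rolls back: its records were never written
        return savedReadCount + 1; // a restart would skip everything up to the saved count
    }

    public static void main(String[] args) {
        System.out.println("restart from record " + restartPointAfterFailure(1000));
    }
}
```

Running `main` prints `restart from record 2001`, even though records 1 to 1000 were never written. With `saveState` set to false, no such count is persisted, so a restarted step does not try to resume from it.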
