I am using a multi-threaded step in my batch job to process records from a source database and write them to a destination database. The step is chunk-based and consists of a JdbcPagingItemReader, a processor, and a JdbcBatchItemWriter. I understand that if any exception occurs during step processing, the database transaction is rolled back for the whole chunk. How is this managed internally in Spring Batch? Since this is a multi-threaded step, there is no guarantee that the processor and writer are executed in the same thread for a given chunk. The chunk may get processed by different threads. So how does Spring Batch ensure that database transactions are rolled back correctly even though different threads are acting on the same chunk?
2 Answers
Your statement is not correct: "The chunk may get processed by different Threads."
According to the Spring Batch documentation on multi-threaded steps, the step executes by reading, processing, and writing each chunk of items in a separate thread of execution. So multithreading is enabled at the step level, not at the chunk level, and every chunk is executed in its own thread; each thread runs a complete read-process-write cycle. The documentation says:
The result of the above configuration is that the Step executes by reading, processing, and writing each chunk of items (each commit interval) in a separate thread of execution. Note that this means there is no fixed order for the items to be processed, and a chunk might contain items that are non-consecutive compared to the single-threaded case. In addition to any limits placed by the task executor (such as whether it is backed by a thread pool), there is a throttle limit in the tasklet configuration which defaults to 4. You may need to increase this to ensure that a thread pool is fully utilized.
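So, since each chunk runs on a single thread from start to finish, transaction management is straightforward: Spring binds the transaction to the thread that started it, so a failure rolls back only the chunk that thread was working on.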
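To make the "one chunk, one thread, one transaction" idea concrete, here is a minimal plain-Java simulation. It is not Spring Batch code: `ChunkPerThreadSketch`, `TX`, `COMMITTED`, `processChunk`, and `runStep` are hypothetical names, and the `ThreadLocal` buffer only stands in for a thread-bound transaction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkPerThreadSketch {

    // Stand-in for the destination table: only committed items end up here.
    static final List<Integer> COMMITTED = Collections.synchronizedList(new ArrayList<>());

    // Stand-in for a thread-bound transaction: the current thread's uncommitted writes.
    static final ThreadLocal<List<Integer>> TX = ThreadLocal.withInitial(ArrayList::new);

    // Runs read-process-write for one chunk entirely on the calling thread.
    // Returns true if the chunk was committed, false if it was rolled back.
    static boolean processChunk(List<Integer> chunk, int poisonItem) {
        List<Integer> tx = TX.get();
        tx.clear();                               // "begin transaction"
        try {
            for (int item : chunk) {
                if (item == poisonItem) {
                    throw new IllegalStateException("bad item " + item);
                }
                tx.add(item * 10);                // "process" and buffer the write
            }
            COMMITTED.addAll(tx);                 // "commit": publish the whole chunk at once
            return true;
        } catch (RuntimeException e) {
            return false;                         // "rollback": buffered writes are discarded
        } finally {
            tx.clear();
        }
    }

    // Submits every chunk as one task, so a chunk never spans two threads.
    static List<Boolean> runStep(List<List<Integer>> chunks, int poisonItem, int threads)
            throws Exception {
        COMMITTED.clear();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (List<Integer> chunk : chunks) {
                Callable<Boolean> task = () -> processChunk(chunk, poisonItem);
                futures.add(pool.submit(task));
            }
            List<Boolean> results = new ArrayList<>();
            for (Future<Boolean> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<Integer>> chunks =
                List.of(List.of(1, 2, 3), List.of(4, 5, 6), List.of(7, 8, 9));
        List<Boolean> results = runStep(chunks, 5, 3);
        List<Integer> rows = new ArrayList<>(COMMITTED);
        Collections.sort(rows);
        System.out.println("chunk outcomes: " + results); // [true, false, true]
        System.out.println("committed rows: " + rows);    // [10, 20, 30, 70, 80, 90]
    }
}
```

The chunk containing item 5 fails, and none of its items reach `COMMITTED`, while the other two chunks commit independently on their own threads.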
Consider using local partitioning instead. It gives you full control over execution and a clear picture of what each worker thread does: every worker runs its own reader, processor, and writer, with its own transaction, rollback, batch commit, and exception handling (you can add listeners for these).
https://docs.spring.io/spring-batch/docs/current/reference/html/scalability.html#partitioning
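As a rough illustration of what partitioning means for the data, the sketch below splits a key range into contiguous sub-ranges, one per worker. It is plain Java, not the Spring Batch API: `RangePartitionSketch`, `Range`, and `partition` are hypothetical names. In Spring Batch, a `Partitioner` implementation would typically put such bounds into one `ExecutionContext` per partition, and how the worker's reader uses them is left as an assumption here.

```java
import java.util.ArrayList;
import java.util.List;

public class RangePartitionSketch {

    // Inclusive ID bounds that a single worker would be responsible for.
    record Range(long minId, long maxId) {}

    // Splits [minId, maxId] into at most gridSize contiguous, non-overlapping ranges.
    static List<Range> partition(long minId, long maxId, int gridSize) {
        if (gridSize <= 0 || maxId < minId) {
            throw new IllegalArgumentException("invalid range or grid size");
        }
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division
        List<Range> ranges = new ArrayList<>();
        for (long start = minId; start <= maxId; start += size) {
            ranges.add(new Range(start, Math.min(start + size - 1, maxId)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Three workers over IDs 1..10 get the ranges 1-4, 5-8, and 9-10.
        for (Range range : partition(1, 10, 3)) {
            System.out.println(range.minId() + " - " + range.maxId());
        }
    }
}
```

Because the ranges do not overlap, each worker reads, processes, and writes a disjoint set of rows, so a rollback in one worker does not touch the rows of another.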
We have used partitioning in several projects with millions of records; processing performance has been excellent, and we have had full control over the worker threads. It is a really good framework: most common batch issues are handled internally, so we don't have to worry about them. Let us know if you need any samples.