I am in the process of implementing a Spring Batch job for our file upload process. The requirement is to read a flat file, apply business logic, store the result in the DB, and then post a Kafka message.
I have a single chunk-based step that uses a custom reader, processor, and writer. The process works fine but takes a long time on a big file.
It currently takes 15 minutes to process a file with 60K records, and I need to bring that down to under 5 minutes, as we will be consuming much bigger files than this.
As per https://docs.spring.io/spring-batch/docs/current/reference/html/scalability.html I understand that making the step multi-threaded would give a performance boost, at the cost of restartability. However, I am using FlatFileItemReader, ItemProcessor, and ItemWriter, and none of them is thread-safe.
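From my reading of that doc, a multi-threaded step would look roughly like the sketch below. This is only a sketch, assuming the Spring Batch 4.x Java config; the bean names, chunk size, and throttle limit are illustrative. The FlatFileItemReader is wrapped in a SynchronizedItemStreamReader so concurrent chunk threads can share it:

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.support.SynchronizedItemStreamReader;
import org.springframework.context.annotation.Bean;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Bean
public SynchronizedItemStreamReader<Message> synchronizedReader(FlatFileItemReader<Message> delegate) {
    // Serializes read() so several chunk threads can safely share one file reader.
    SynchronizedItemStreamReader<Message> reader = new SynchronizedItemStreamReader<>();
    reader.setDelegate(delegate);
    return reader;
}

@Bean
public Step uploadStep(StepBuilderFactory steps,
                       SynchronizedItemStreamReader<Message> reader,
                       ItemProcessor<Message, Message> processor,
                       ItemWriter<Message> writer) {
    return steps.get("uploadStep")
            .<Message, Message>chunk(1000)    // larger chunks amortize per-transaction overhead
            .reader(reader)
            .processor(processor)             // must be stateless to be safe across threads
            .writer(writer)                   // must tolerate concurrent calls
            .taskExecutor(new SimpleAsyncTaskExecutor("upload-"))
            .throttleLimit(8)                 // caps the number of concurrent chunk threads
            .build();
}

Since the reader state is then shared across threads, restart data is no longer reliable, which matches the trade-off the documentation warns about; setting saveState(false) on the delegate FlatFileItemReader makes that explicit.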
Any suggestions as to how to improve performance here?
Here is the writer code:
public void write(List<? extends Message> items) {
    items.forEach(this::process);
}

private void process(Message message) {
    if (message == null) {
        return;
    }
    try {
        // message is a DTO that has info about success or failure
        if (success) {
            // post Kafka message using Spring Cloud Stream
            // insert record in DB using Spring Data jpaRepository
        } else {
            // insert record in DB using Spring Data jpaRepository
        }
    } catch (Exception e) {
        // throw exception
    }
}
Best regards, Preeti
Comment from Mahmoud Ben Hassine: You can use saveAll(items) to save all items at once in a single bulk operation. We introduced similar improvements in 4.3: docs.spring.io/spring-batch/docs/4.3.x/reference/html/… which you can use for inspiration.
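Applied to the writer above, that suggestion might look roughly like the following sketch; repository, MessageEntity, toEntity, isSuccess, and postToKafka are hypothetical stand-ins for the actual repository, entity, and helpers:

import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Inside the existing ItemWriter; 'repository' stands in for the Spring Data
// JpaRepository already used above, 'MessageEntity', 'toEntity', and
// 'postToKafka' are hypothetical names for the real entity and helpers.
public void write(List<? extends Message> items) {
    List<MessageEntity> entities = items.stream()
            .filter(Objects::nonNull)
            .map(this::toEntity)              // map each DTO to its JPA entity
            .collect(Collectors.toList());
    repository.saveAll(entities);             // one bulk save per chunk instead of one save per item

    // Publish Kafka messages only after the whole chunk has been persisted.
    items.stream()
            .filter(m -> m != null && m.isSuccess())  // isSuccess() assumed on the Message DTO
            .forEach(this::postToKafka);
}

Note that saveAll only turns into real JDBC batching if Hibernate batching is enabled, e.g. spring.jpa.properties.hibernate.jdbc.batch_size=50 in application.properties.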