0 votes

I'm curious how one would manage to pass all available data from the reader down through the pipeline.

For example, I want the reader to pull all the data in and pass the entire result set down to the processor and the writer. The result set is small, so I'm not worried about resources. I thought I had implemented this properly by having all of the components (reader, writer, processor) receive and return a collection of the processed items.

While the results of the process appear to be fine, what I am seeing is that the job reads everything in, passes it down through the pipeline, then returns to the reader, reads everything again, passes it down, and so on.

I've considered creating an extra step to read all the data in and pass it down to a subsequent step, but I'm curious whether I can do this in a single step, and how.

The job looks like this:

@Bean
Job job() throws Exception {
    return jobs.get("job").start(step1()).build();
}

@Bean
protected Step step1() throws Exception {
    return steps.get("step1").<List<Domain>, List<Domain>>chunk(10)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .build();
}

//....

The reader, processor and writer accept and return a List, e.g.

class DomainItemProcessor implements ItemProcessor<List<Domain>, List<Domain>> {
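
The reader and writer are typed the same way, e.g. (simplified sketches; the real reader merges the two data sources):

class DomainItemReader implements ItemReader<List<Domain>> {
    @Override
    public List<Domain> read() throws Exception {
        List<Domain> all = new ArrayList<>();
        // ... populate from both data sources ...
        return all;
    }
}

class DomainItemWriter implements ItemWriter<List<Domain>> {
    @Override
    public void write(List<? extends List<Domain>> items) throws Exception {
        // persists each complete collection it receives
    }
}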
Are you sure you need to do it this way? Do you need to process a logical group of Domain objects at a time? If you do, I believe you'll need to have List<Domain> as your type parameter for everything: ItemReader<List<Domain>> and ItemWriter<List<Domain>>. You would likely need a custom ItemReader and ItemWriter to handle this. – Jared Gommels

Not sure if I understand your question correctly: what you want is to construct something (a list of Domain objects, in your case) and share it across multiple steps? It seems that in Spring Batch 3 you can do this by using JobScope (you may want a first step that always runs and constructs the job-scoped list). I solved this another way when I was using Spring Batch 1 (which may not be applicable to you: we ran every job in its own separate child context). – Adrian Shum

I have a custom reader and writer. I have to combine the results of two distinct data sources into a single domain object. I can't have two readers in a single step, as far as I know, so I'm creating two data sources in BatchConfiguration and passing them both into the reader, merging them, and returning a collection of complete domain objects. – nbpeth

@Adrian I have only a single step. I want to do all of my reading one time, then pass the resulting collection to the processor, process the collection in its entirety, and pass that to the writer. No batches, one swoop. – nbpeth

Just wondering: given such unintuitive reader logic, why don't you just write a simple tasklet instead? Alternatively, there is no reason why you can't write your own reader that reads from both data sources and constructs the items with your own logic. – Adrian Shum
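
A rough sketch of the single-shot merging reader Adrian describes (the DAO names are hypothetical): read everything once, merge it, and return null on the next call so Spring Batch knows the reader is exhausted.

public class MergingDomainReader implements ItemReader<List<Domain>> {

    private final PrimaryDomainDao primaryDao;     // hypothetical first data source
    private final SecondaryDomainDao secondaryDao; // hypothetical second data source
    private boolean alreadyRead = false;

    public MergingDomainReader(PrimaryDomainDao primaryDao, SecondaryDomainDao secondaryDao) {
        this.primaryDao = primaryDao;
        this.secondaryDao = secondaryDao;
    }

    @Override
    public List<Domain> read() throws Exception {
        if (alreadyRead) {
            return null; // signals to Spring Batch that there is nothing left to read
        }
        alreadyRead = true;
        List<Domain> merged = new ArrayList<>(primaryDao.findAll());
        merged.addAll(secondaryDao.findAll()); // combine the sources however your domain requires
        return merged;
    }
}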

3 Answers

5 votes

You could also implement it as a tasklet. Since you want to process all the data at once, you don't really have batch processing, and therefore the whole restart and failure handling of a "normal" Spring Batch step will not be used at all.

A tasklet like this could look as follows in pseudocode:

@Component
public class MyTasklet implements Tasklet {

    // ItemStreamReader/ItemStreamWriter are used here because open() and
    // close() are defined on ItemStream, not on ItemReader/ItemWriter
    @Autowired
    private ItemStreamReader<YourType> readerSpringBeanName;

    @Autowired
    private ItemProcessor<List<YourType>, List<YourType>> processorSpringBeanName;

    @Autowired
    private ItemStreamWriter<List<YourType>> writerSpringBeanName;

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        readerSpringBeanName.open(new ExecutionContext());
        writerSpringBeanName.open(new ExecutionContext());

        // drain the reader completely into a single list
        List<YourType> items = new ArrayList<>();
        YourType readItem = readerSpringBeanName.read();
        while (readItem != null) {
            items.add(readItem);
            readItem = readerSpringBeanName.read();
        }

        // process and write the whole list in one go; the writer expects a
        // list of items, where each item is itself a List<YourType>
        writerSpringBeanName.write(Collections.singletonList(processorSpringBeanName.process(items)));

        readerSpringBeanName.close();
        writerSpringBeanName.close();
        return RepeatStatus.FINISHED;
    }
}

Moreover, depending on your use case, there may not even be a need to define a Spring Batch job at all.
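
For completeness, wiring the tasklet into a single-step job could look like this, assuming the same jobs/steps builder factories used in the question:

@Bean
Job taskletJob(MyTasklet myTasklet) throws Exception {
    return jobs.get("taskletJob")
            .start(steps.get("taskletStep")
                    .tasklet(myTasklet)
                    .build())
            .build();
}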

0 votes

The high-level design for this case would be:

  1. The reader will be a custom reader. It will return a List, or a wrapper that contains a list of Domain objects. The reader injects a DAO bean that performs a query and retrieves the list of Domain objects:

public class DomainList {
    private List<Domain> domains;

    // getters/setters
}

public class DomainReader implements ItemReader<DomainList> {

    @Autowired
    private DomainDAO domainDAO;

    private List<Domain> domains;

    @Override
    public DomainList read() throws Exception {
        if (this.domains == null) {
            // TODO: please replace with your business logic.
            this.domains = this.domainDAO.getListofDomains();
            DomainList result = new DomainList();
            result.setDomains(this.domains);
            return result;
        } else {
            // a null return tells Spring Batch the reader is done
            return null;
        }
    }
}

  2. The processor and writer will take DomainList as input, as sketched below.
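
For example, the matching signatures could look like this (class names are illustrative):

public class DomainListProcessor implements ItemProcessor<DomainList, DomainList> {
    @Override
    public DomainList process(DomainList item) throws Exception {
        // transform every Domain inside the wrapper in a single pass
        return item;
    }
}

public class DomainListWriter implements ItemWriter<DomainList> {
    @Override
    public void write(List<? extends DomainList> items) throws Exception {
        // with the reader above, items holds exactly one DomainList
        for (DomainList list : items) {
            // persist list.getDomains() here
        }
    }
}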

Note: the above is pseudocode.

Thanks, Nghia

0 votes

OK, this might be a little late, but here is my take on the implementation. Yes, you can make use of ItemReader, ItemProcessor and ItemWriter to do this. It may be a little overkill, but nevertheless it can be done.

The main issue I see (since the job keeps coming back to the reader) is that there needs to be a way to tell Spring that all items have been read from the ItemReader and there is nothing left to read. To do that, you have to explicitly return null when Spring tries to read more objects.

So this is an example of returning a List from the ItemReader; the read() method should have an implementation similar to the one below.

Leaving out the Redis implementation for now, the gist of it is that I declare a variable called iterateIndex, created and initialized in the constructor of the item reader as shown below. (I have also included a Redisson cache to store the list; again, that part can be omitted.)

public class XXXConfigItemReader implements
        ItemStreamReader<FeedbackConfigResponseModel> {

    private int iterateIndex;

    @Autowired
    Environment env;

    @Autowired
    RestTemplateBuilder templateBuilder;

    public XXXConfigItemReader() {
        this.iterateIndex = 0;
    }

and make sure that read() returns null once it reaches the list size:

public List<FeedbackConfigResponseModel> read()
        throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
    // Get the config details from the db (or read from a file line by line
    // and marshal it into a list)
    List<FeedbackConfigResponseModel> feedbackConfigModelList = new ArrayList<>();

    // On the first call, iterateIndex will not equal the list size,
    // so the entire list is returned in a single call
    if (feedbackConfigModelList == null || this.iterateIndex == feedbackConfigModelList.size()) {
        return null;
    } else {
        // set iterateIndex to the list size so that the second call returns null
        this.iterateIndex = feedbackConfigModelList.size();
        return feedbackConfigModelList;
    }
}

Hope this helps people who are running into the same issue.

EDIT: Showing how restTemplateBuilder is being used. Note that instead of RestTemplateBuilder you could just autowire the RestTemplate; I made use of RestTemplateBuilder to have some additional configuration for my project's needs.

Now this is the complete item reader, implemented using the ItemStreamReader interface:

public class XXXX implements ItemStreamReader<FeedbackConfigResponseModel> {

    private int iterateIndex;

    @Autowired
    Environment env;

    @Autowired
    RestTemplateBuilder templateBuilder;

    @Autowired
    RedissonClient redisClient;

    public XXXX() {
        this.iterateIndex = -1;
    }

    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        // nothing to open
    }

    @Override
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        // no state to persist between chunks
    }

    @Override
    public void close() throws ItemStreamException {
        // nothing to close
    }

    @Override
    public FeedbackConfigResponseModel read()
            throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
        String feedbackConfigFetchUrl = null;
        ResponseEntity<FeedbackConfigResponseListModel> respModelEntity = null;
        // if the cache is empty, fetch the list via the rest template
        RList<FeedbackConfigResponseModel> rList = redisClient.getList(AppConstants.CACHE_DBCONFIG_LIST);
        List<FeedbackConfigResponseModel> feedbackConfigModelList = new ArrayList<>();
        FeedbackConfigResponseModel firstDbItem = rList.get(0);
        if (firstDbItem == null) {
            feedbackConfigFetchUrl = this.env.getProperty("restTemplate.default.url") + "/test";
            respModelEntity = templateBuilder.build().getForEntity(feedbackConfigFetchUrl,
                    FeedbackConfigResponseListModel.class);
            System.out.println("Response Model from template:" + respModelEntity.getBody());
            feedbackConfigModelList = respModelEntity.getBody() == null ? null
                    : respModelEntity.getBody().getFeedbackResponseList();
            rList.addAll(feedbackConfigModelList);
        } else {
            System.out.println("coming inside else");
            feedbackConfigModelList = rList;
        }

        // iterateIndex starts at -1 and is incremented before each get(),
        // so the exhaustion check uses iterateIndex + 1 to avoid reading past the end
        if (feedbackConfigModelList == null || this.iterateIndex + 1 == feedbackConfigModelList.size()) {
            return null;
        } else {
            this.iterateIndex++;
            System.out.println("iterating index " + iterateIndex + " of " + feedbackConfigModelList.size());
            return feedbackConfigModelList.get(iterateIndex);
        }
    }
}
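
One caveat worth adding: this reader keeps its position in the iterateIndex field, so the bean should be step-scoped to get fresh state on every execution. A minimal sketch (the bean method name is illustrative):

@Bean
@StepScope // a new reader instance per step execution, so iterateIndex starts clean
public XXXX feedbackConfigItemReader() {
    return new XXXX();
}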