Merge multiple CSV files into a single CSV using Spring Batch

Question

I have a business case that requires merging multiple CSV files (around 1000 or more, each containing 1000 records) into a single CSV file using Spring Batch.

Please share your guidance and solutions, both on the approach and on performance.

So far, I have tried two approaches:

Approach 1:

A tasklet with a chunk (a chunk-oriented step) that uses a MultiResourceItemReader to read the files from a directory and a FlatFileItemWriter as the item writer.

The issue here is that processing is very slow because the step is single-threaded, although the approach works as expected.

Approach 2: Using a MultiResourcePartitioner as the partitioner and an AsyncTaskExecutor as the task executor.

The issue here is that, since the partitions run asynchronously on multiple threads, data gets overwritten or corrupted while being merged into the final single file.
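A common way to keep parallel reading without corrupting the output is to let several threads read files but route every write through one thread. The sketch below shows that parallel-reader/single-writer pattern in plain Java rather than Spring Batch; the class ParallelCsvMerge, its methods, and the assumption that every file starts with the same single header line are all hypothetical.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical helper: parallel readers, exactly one writer. Not a Spring Batch API. */
final class ParallelCsvMerge {

    /** Reads one CSV file and returns its data rows (the shared header line is dropped). */
    static List<String> readDataRows(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        return lines.isEmpty() ? lines : lines.subList(1, lines.size());
    }

    /**
     * Reads the files on readerThreads pool threads, but only the calling thread writes
     * to out, so each file's rows are written together and lines never interleave.
     * Files appear in completion order, not in input order.
     */
    static void merge(List<Path> files, String header, Writer out, int readerThreads)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(readerThreads);
        CompletionService<List<String>> completed = new ExecutorCompletionService<>(pool);
        try {
            for (Path file : files) {
                completed.submit(() -> readDataRows(file)); // parallel read
            }
            out.write(header);
            out.write('\n');
            for (int i = 0; i < files.size(); i++) {
                // take() returns whichever file finished next; get() rethrows read failures
                for (String row : completed.take().get()) {
                    out.write(row);
                    out.write('\n');
                }
            }
            out.flush();
        } finally {
            pool.shutdownNow();
        }
    }
}
```

For example, ParallelCsvMerge.merge(paths, "id,name", writer, 4) would read four files at a time (the header string is made up). Because ExecutorCompletionService returns results as they finish, finished files wait in memory until the writer reaches them; at roughly 1000 files of 1000 records that is usually acceptable, but it is worth checking against your heap size.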

Please show what you have tried so far, or what you think would be a better approach based on your knowledge of the Spring Batch framework. This will help you get better answers. — Sabir Khan
Are you doing any processing on the source CSV records (like filtering), or is it a simple file merge with all headers being common? — Sabir Khan
@SabirKhan - No filtering; it is a simple merge of the files into one file, and all headers are common. — Sada Shiv Dash
Since there is no filtering/processing and all files have the same structure, Approach 1 should be fine (even if single-threaded). What do you mean by slow; can you give some numbers? Have you tried different values for the commit-interval? That said, do you really need Spring Batch for such a simple task? Something like cat *.csv >> all.csv or equivalent should do the trick (and should be faster). — Mahmoud Ben Hassine
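As a rough plain-Java counterpart to the cat suggestion, the sketch below copies the files one at a time and keeps only the first file's header (cat *.csv would repeat the header of every file). The class CsvMerge and its methods are hypothetical; the sketch assumes UTF-8 input, a single header line per file, and an output file located outside the input directory so the *.csv glob does not pick it up.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sequential merge: like `cat`, but the header line is written only once. */
final class CsvMerge {

    /**
     * Copies one CSV source to out. The first line is treated as the header and is
     * written only when includeHeader is true. Returns false for an empty source.
     */
    static boolean copyCsv(BufferedReader reader, Writer out, boolean includeHeader) throws IOException {
        String header = reader.readLine();
        if (header == null) {
            return false; // empty source: nothing to copy
        }
        if (includeHeader) {
            out.write(header);
            out.write('\n');
        }
        String line;
        while ((line = reader.readLine()) != null) {
            out.write(line);
            out.write('\n');
        }
        return true;
    }

    /** Merges every *.csv file in inputDir (sorted by path) into outputFile, keeping one header. */
    static void mergeDirectory(Path inputDir, Path outputFile) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inputDir, "*.csv")) {
            stream.forEach(files::add);
        }
        Collections.sort(files); // deterministic order across runs
        try (BufferedWriter out = Files.newBufferedWriter(outputFile, StandardCharsets.UTF_8)) {
            boolean headerWritten = false;
            for (Path file : files) { // only one input file is open at a time
                try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                    if (copyCsv(in, out, !headerWritten)) {
                        headerWritten = true;
                    }
                }
            }
        }
    }
}
```

Opening one file at a time keeps the number of open file handles at two regardless of how many inputs there are, and the buffered streams keep the copy close to disk speed.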

tausif tausif · Accepted Answer · 2020-01-07T13:54:01

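You can wrap your FlatFileItemWriter in an AsyncItemWriter and use it together with an AsyncItemProcessor. This will not corrupt your data and will improve performance: items are processed on several threads of the task executor, while the AsyncItemWriter unwraps the resulting Futures and passes the items, in order, to the delegate FlatFileItemWriter.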

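```java
import org.springframework.batch.integration.async.AsyncItemProcessor;
import org.springframework.batch.integration.async.AsyncItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.core.task.TaskExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

// These beans live in a @Configuration class in which flatFileItemWriter
// (a FlatFileItemWriter<Customer>) and itemProcessor() are already defined.
// In the step, declare the chunk as <Customer, Future<Customer>> so the
// Futures produced by the processor are passed to the AsyncItemWriter.

@Bean
public AsyncItemWriter<Customer> asyncItemWriter() throws Exception {
    AsyncItemWriter<Customer> asyncItemWriter = new AsyncItemWriter<>();
    // Unwraps each Future and hands the results to the FlatFileItemWriter
    asyncItemWriter.setDelegate(flatFileItemWriter);
    asyncItemWriter.afterPropertiesSet();
    return asyncItemWriter;
}

@Bean
public AsyncItemProcessor<Customer, Customer> asyncItemProcessor() throws Exception {
    AsyncItemProcessor<Customer, Customer> asyncItemProcessor = new AsyncItemProcessor<>();
    asyncItemProcessor.setDelegate(itemProcessor());
    // Each item is processed on a thread from this pool
    asyncItemProcessor.setTaskExecutor(threadPoolTaskExecutor());
    asyncItemProcessor.afterPropertiesSet();
    return asyncItemProcessor;
}

@Bean
public TaskExecutor threadPoolTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(10);
    executor.setMaxPoolSize(10);
    executor.setThreadNamePrefix("default_task_executor_thread");
    executor.initialize();
    return executor;
}
```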
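Why this avoids the corruption seen with partitioning can be modeled without Spring: items are processed concurrently, but one thread collects the results in their original order and performs the write. The sketch below is a plain-Java illustration of that idea only, not the actual AsyncItemProcessor/AsyncItemWriter source; AsyncChunk and processAndWrite are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical plain-Java model of AsyncItemProcessor + AsyncItemWriter for one chunk. */
final class AsyncChunk {

    static <I, O> void processAndWrite(List<I> chunk, Function<I, O> processor,
                                       Consumer<List<O>> writer, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        // "Processor" side: each item is processed on a pool thread and yields a Future
        List<Future<O>> futures = new ArrayList<>(chunk.size());
        for (I item : chunk) {
            Callable<O> task = () -> processor.apply(item);
            futures.add(pool.submit(task));
        }
        // "Writer" side: the calling thread waits for each Future in input order...
        List<O> results = new ArrayList<>(futures.size());
        for (Future<O> future : futures) {
            results.add(future.get());
        }
        // ...and is the only thread that hands the chunk to the delegate writer
        writer.accept(results);
    }
}
```

Calling this once per chunk with a 10-thread pool mirrors the configuration above. Since the write itself stays single-threaded, the speed-up depends on how much work the processor does; for a pure merge with no processing, the gain may be small.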
