
I have a multi-step Spring Batch job. In one of its steps I create Lucene indices from the data read by the reader, so that subsequent steps can search those indices.

Based on the data read in the ItemReader, I spread the indices across a few separate directories.

If I set the step's task executor to a SimpleAsyncTaskExecutor, I don't get any issues as long as the indices are written to different directories, but sometimes I get a locking exception. I guess two threads tried to write to the same index.

If I remove the SimpleAsyncTaskExecutor, I don't get any issues, but writes become sequential and slow.

Is it possible to use multithreading with a Lucene IndexWriter when the indices are being written to a single directory?

Do I need to make the index creator code thread safe in order to use SimpleAsyncTaskExecutor?

The index creator code is in the step's processor.


1 Answer


I am using Lucene 6.0.0, and as per the IndexWriter API doc:

NOTE: IndexWriter instances are completely thread safe, meaning multiple threads can call any of its methods, concurrently. If your application requires external synchronization, you should not synchronize on the IndexWriter instance as this may cause deadlock; use your own (non-Lucene) objects instead.

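I was creating multiple writer instances, and that was causing the problems: Lucene holds a write lock on the index directory, so only one IndexWriter can be open on a given directory at a time. A single writer instance can be shared by as many threads as you like, provided the rest of the code around that writer is thread safe.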

I used a single writer instance and parallelized the chunks. Each parallel chunk wrote to the same directory without any issues.
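Since the question spreads indices over a few directories, one way to guarantee a single writer per directory is to hand writers out from a shared registry. This is only a minimal JDK-only sketch, not the code I actually used: `SharedWriterRegistry`, `writerFor` and `openWriterCount` are hypothetical names, and the type parameter `W` stands in for Lucene's `IndexWriter`, which the factory function would open in real code.

```java
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper: hands out exactly one writer per index directory.
// W is a stand-in for org.apache.lucene.index.IndexWriter.
final class SharedWriterRegistry<W> {

    private final ConcurrentHashMap<Path, W> writers = new ConcurrentHashMap<>();
    private final Function<Path, W> factory;

    // The factory is assumed to open a writer for the given directory.
    SharedWriterRegistry(Function<Path, W> factory) {
        this.factory = factory;
    }

    // ConcurrentHashMap.computeIfAbsent is atomic, so the factory is applied
    // at most once per directory even when many threads ask at the same time.
    W writerFor(Path indexDir) {
        // Normalize so that "idx/a" and "idx/./a" map to the same writer.
        return writers.computeIfAbsent(indexDir.toAbsolutePath().normalize(), factory);
    }

    // Number of distinct directories that currently have a writer.
    int openWriterCount() {
        return writers.size();
    }
}
```

Every thread then calls `writerFor(dir)` instead of constructing its own writer, so two threads targeting the same directory always receive the same instance. Closing the writers when the step finishes is left out of the sketch.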

To parallelize the chunks, I had to make my chunk components (reader, processor and writer) thread safe.
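For the reader, the simplest way to get thread safety is usually to serialize calls to `read()`. The sketch below shows the idea with plain JDK types; `SynchronizedReader` is a hypothetical stand-in that only mirrors the Spring Batch `ItemReader` convention of returning `null` when the input is exhausted. Spring Batch also ships a `SynchronizedItemStreamReader` wrapper that may serve the same purpose in a real job.

```java
import java.util.Iterator;

// Hypothetical stand-in for a thread-safe reader: every call to read()
// is serialized, so each item is handed to exactly one thread.
final class SynchronizedReader<T> {

    private final Iterator<T> source;

    SynchronizedReader(Iterator<T> source) {
        this.source = source;
    }

    // Returns the next item, or null once the source is exhausted
    // (the same end-of-input convention as ItemReader.read()).
    synchronized T read() {
        return source.hasNext() ? source.next() : null;
    }
}
```

The lock covers only the read itself, so processing and index writing can still run in parallel across chunks.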