10
votes

Lucene encourages the reuse of an IndexWriter from multiple threads.

Given that two threads might hold a reference to the same IndexWriter, if thread A calls close() on the writer, thread B would be left with a useless writer. But to my understanding, Lucene somehow knows that another thread is using the same writer and defers closing it.

Is this indeed the case? How does Lucene track that another thread is using the writer?

EDIT: Judging from the answers, it is not correct to close the IndexWriter. But this raises a new issue: keeping an IndexWriter open essentially blocks access to the index from another JVM (e.g. in a cluster, or when the index is shared between several applications).

3 Answers

6
votes

If one thread closes the IndexWriter while other threads are still using it, you'll get unpredictable results. We try to have the other threads hit AlreadyClosedException, but this is only best effort (not guaranteed). For example, you can easily hit a NullPointerException instead. So you must synchronize externally to make sure this doesn't happen.
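One way to do that external synchronization is reference counting: each thread acquires the writer before using it and releases it afterwards, and the writer is only really closed when the last reference is dropped. The following is a minimal sketch under stated assumptions, not anything Lucene ships. `SharedWriter` is a hypothetical helper, and it wraps a plain `java.io.Closeable` as a stand-in for the IndexWriter so that it runs without Lucene on the classpath.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

/**
 * Hypothetical helper (not part of Lucene): hands out a shared resource and
 * closes it only when the last holder has released it.
 */
final class SharedWriter<W extends Closeable> {
    private final W writer;
    private int refs = 1; // the owner's own reference

    SharedWriter(W writer) {
        this.writer = writer;
    }

    /** Takes a reference; fails if the writer has already been closed. */
    synchronized W acquire() {
        if (refs == 0) {
            throw new IllegalStateException("writer already closed");
        }
        refs++;
        return writer;
    }

    /** Drops a reference; the last release actually closes the writer. */
    synchronized void release() {
        if (refs == 0) {
            throw new IllegalStateException("release() without matching acquire()");
        }
        if (--refs == 0) {
            try {
                writer.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

Each worker would call `acquire()` before indexing and `release()` in a `finally` block. The owner calls `release()` once when it wants to shut down, and the close then happens only after every worker has released its reference.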

Recently (only in Lucene's trunk right now, to be 4.0 eventually) a big thread bottleneck inside IndexWriter was fixed, allowing segment flushes to run concurrently (previously they were single threaded). On apps running with many indexing threads on concurrent hardware this can give a big boost in indexing throughput. See http://blog.mikemccandless.com/2011/05/265-indexing-speedup-with-lucenes.html for details.

1
votes

The thread safety and reuse of IndexWriter mean you can have multiple threads all using that one instance to create, update and delete documents. If you close the IndexWriter in one thread, though, it will indeed break it for every other thread.
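A simple way to avoid that is to let one owner close the writer only after all indexing threads have finished. The sketch below is illustrative only. `FakeWriter` is a made-up stand-in for IndexWriter (its `addDocument(String)` is not Lucene's signature) so the example runs on the JDK alone, and the class and method names are assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class CloseAfterWorkers {
    /** Stand-in for IndexWriter (hypothetical, not Lucene's API). */
    static final class FakeWriter implements AutoCloseable {
        final AtomicInteger docs = new AtomicInteger();
        volatile boolean closed = false;

        void addDocument(String doc) {
            if (closed) {
                throw new IllegalStateException("writer is closed");
            }
            docs.incrementAndGet();
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Shares one writer across several threads and closes it only after all are done. */
    static int indexAll(FakeWriter writer, int threads, int docsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int i = 0; i < docsPerThread; i++) {
                    writer.addDocument("doc-" + id + "-" + i);
                }
            });
        }
        pool.shutdown(); // accept no new tasks; already submitted ones keep running
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for indexing", e);
        }
        writer.close(); // safe: no worker can still be using the writer
        return writer.docs.get();
    }
}
```

The point of the design is that the worker threads never call `close()` themselves; only the code that waits for them does.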

0
votes

Are you referring to the waitForMerges flag on the IndexWriter.close() method?

Closes the index with or without waiting for currently running merges to finish. This is only meaningful when using a MergeScheduler that runs merges in background threads.

Lucene generally uses background threads to consolidate fragmented writes that have occurred across multiple threads - the writes themselves happen immediately, but the consolidation happens asynchronously.

When closing the writer, you should allow it to finish the consolidation process, otherwise:

it is dangerous to always call close(false), especially when IndexWriter is not open for very long, because this can result in "merge starvation" whereby long merges will never have a chance to finish. This will cause too many segments in your index over time.

So the writer doesn't "know" about your threads, in the sense that you meant.