1
votes

I have an Azure WebJob with a queue that receives items to process. Many items can arrive every second, and the queue processes around 20 items simultaneously.

I want to index the items with Lucene.NET.

Creating an IndexWriter, calling Optimize(), and disposing it for every item that hits the queue takes too much time. It feels like I am doing it wrong.

I want the items to be ready for search as soon as possible.

Is it OK to share one IndexWriter among many threads?

Do I need to call Optimize(), or is it OK to never call it, or to call it from a separate process that runs once a day (for example)?

If I have only one IndexWriter and never dispose it (except when the program exits), would new items get stuck in the buffer?

Would new items added with the IndexWriter be available for search before disposing the IndexWriter?

Thank you.

1 Answer

4
votes
  1. The IndexWriter is thread-safe, so it's safe to call it from different threads.
  2. It's okay to never call Optimize(). (You could write a custom merge policy if the default doesn't work for you.)
  3. Calling Commit() flushes all buffered documents to disk, so nothing stays stuck in the buffer. There's no need to dispose of your writer after each item. Reuse it instead.
  4. Documents are searchable once a reader sees them. This occurs after you commit your writer and reopen your reader. You can also read them before they are committed by using near-real-time (NRT) searching: get a reader directly from the writer (IndexWriter.GetReader() in Lucene.NET 3.x).
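
The four points above can be put together in a rough sketch. It assumes Lucene.NET 3.0.3 (the release line that still has Optimize()), uses a RAMDirectory so it stays self-contained, and simulates the 20 concurrent queue items with Parallel.For. The class name, the "id" field, and the workload are made up for illustration, so adapt them to your own job.

```csharp
// Sketch only: assumes Lucene.NET 3.0.3; names and workload are hypothetical.
using System;
using System.Threading.Tasks;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

public static class SharedWriterSketch
{
    public static void Main()
    {
        // RAMDirectory keeps the sketch self-contained; a real job would
        // most likely use an FSDirectory instead.
        var directory = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

        // One writer for the lifetime of the process, shared by all threads.
        var writer = new IndexWriter(
            directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);

        // Stand-in for 20 queue items being handled concurrently.
        Parallel.For(0, 20, i =>
        {
            var doc = new Document();
            doc.Add(new Field("id", i.ToString(),
                Field.Store.YES, Field.Index.NOT_ANALYZED));

            // Thread-safe; no Optimize() and no Dispose() per item.
            writer.AddDocument(doc);
        });

        // Near-real-time reader taken from the writer: it can see the
        // documents added above even though nothing has been committed yet.
        using (IndexReader reader = writer.GetReader())
        using (var searcher = new IndexSearcher(reader))
        {
            TopDocs hits = searcher.Search(new TermQuery(new Term("id", "7")), 1);
            Console.WriteLine("NRT hits: " + hits.TotalHits);
        }

        // Commit makes the changes durable and visible to readers that are
        // opened (or reopened) on the directory afterwards.
        writer.Commit();

        // Dispose once, when the program exits.
        writer.Dispose();
    }
}
```

In a long-running WebJob you would probably commit on a timer or after a batch of items rather than after every item, since each commit is relatively expensive. If searches need to see new items between commits, the NRT reader is one way to get that.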