3
votes

I've got an index that currently occupies about 1 GB of space and holds about 2.5 million documents. The index is stored on a solid-state drive for speed. I'm adding 2500 documents at a time and committing after each batch has been added. The index is a "live" index and needs to be kept up to date throughout the day and night, so minimising write times is very important. I'm using a merge factor of 10 and am never calling Optimize(), instead allowing the index to optimize itself as needed based on the merge factor.

I need to commit the documents after each batch has been added because I record this fact so that if the app crashes or restarts, it can pick up where it left off. If I didn't commit, the stored state would be inconsistent with what's in the index. I'm assuming my additions, deletions and updates are lost if the writer is destroyed without committing.

Anyway, I've noticed that after an arbitrary period of time, which could be anywhere from two minutes to two hours and after some variable number of commits, the indexer stalls inside the IndexWriter.AddDocument(doc) method, and I can't for the life of me figure out why it's stalling or how to fix it. The block can stay in place for upwards of two hours, which seems strange for an index that takes up less than 2 GB, holds a few million documents, and sits on an SSD.

What could cause AddDocument to block? Are there any Lucene diagnostic utilities that could help me? What else could I look for to track down the problem?

1
I'm assuming you are using ConcurrentMergeScheduler? - goalie7960
ConcurrentMergeScheduler is the default merge scheduler, according to the documentation. - Nathan Ridley
That is true, just trying to get the obvious out of the way. - goalie7960
I'm not sure if Lucene.NET supports IndexWriter.SetInfoStream(), but if you redirect this output to something you can inspect, it might give you a clue. - jishi
Yes, it does support that. I would also subclass ConcurrentMergeScheduler, override the DoMerge and HandleMergeException methods, and write a log entry whenever these methods get called. Merging would definitely put a strain on your disk I/O. - goalie7960

1 Answer

1
votes

You can use IndexWriter.SetInfoStream() to redirect the writer's diagnostic output to a stream; inspecting that output might give you a hint of what's wrong.
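
As a rough sketch of how this could be wired up together with the merge logging suggested in the comments: the code below assumes the older Lucene.NET API (2.9/3.0 era), where IndexWriter exposes SetInfoStream(StreamWriter) and SetMergeScheduler(MergeScheduler) and where ConcurrentMergeScheduler lets you override DoMerge and HandleMergeException. The names LoggingMergeScheduler, IndexDiagnostics and Attach are made up for illustration, so check the signatures against the version you are actually running.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using Lucene.Net.Index;

// Hypothetical helper: logs the start, duration and failure of every merge.
public class LoggingMergeScheduler : ConcurrentMergeScheduler
{
    private readonly TextWriter log;

    public LoggingMergeScheduler(TextWriter log)
    {
        // Merges run on background threads, so serialize writes to the log.
        this.log = TextWriter.Synchronized(log);
    }

    protected override void DoMerge(MergePolicy.OneMerge merge)
    {
        var timer = Stopwatch.StartNew();
        log.WriteLine("{0:o} merge started", DateTime.UtcNow);
        try
        {
            base.DoMerge(merge);
        }
        finally
        {
            log.WriteLine("{0:o} merge finished after {1}", DateTime.UtcNow, timer.Elapsed);
        }
    }

    protected override void HandleMergeException(Exception exc)
    {
        log.WriteLine("{0:o} merge failed: {1}", DateTime.UtcNow, exc);
        base.HandleMergeException(exc);
    }
}

// Hypothetical helper: call once, right after constructing the IndexWriter
// and before adding any documents.
public static class IndexDiagnostics
{
    public static void Attach(IndexWriter writer, StreamWriter infoStream, TextWriter mergeLog)
    {
        // Lucene's own diagnostics (flushes, segment merges, commits).
        writer.SetInfoStream(infoStream);
        // Separate log so the two writers never interleave on one stream.
        writer.SetMergeScheduler(new LoggingMergeScheduler(mergeLog));
    }
}
```

Open both writers with AutoFlush enabled so the last lines are on disk when the indexer hangs, and dispose of them only after the IndexWriter has been closed. If a stall in AddDocument lines up with a "merge started" entry that has no matching "merge finished", that would point at merging rather than at the add itself.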