
We are using Lucene 2.9.2 (an upgrade to 3.x is planned), and we know that search queries become slower over time. Usually we perform a full reindex. I have read the question https://stackoverflow.com/a/668453/356815 and its answers, so to answer it right away: we do NOT use optimize(), because performance was no longer acceptable while it was running.

Fragmentation?

My question is: what are the best practices for measuring the fragmentation of an existing index? Can Luke help with that?

I would be very interested to hear your thoughts on this kind of analysis.

Some more information about our index:

  • We have indexed 400,000 documents
  • We make heavy use of per-document properties
  • For each request we create a new searcher object (because we want changes to appear immediately in the search results)
  • Query times range from 30 ms (repeating the same search) to 10 seconds (complex queries)
  • The index consists of 44 files (15 .del files, 24 .cfs files) and is 1 GB in size
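As a rough first check on fragmentation, segments and deletions can be counted straight from the index directory. This is a heuristic sketch, not a Lucene API, and the class name `SegmentCensus` is hypothetical. It assumes the usual file naming, where all files of one segment share a prefix such as `_1a` (`_1a.cfs`) and a deletions file adds a generation suffix (`_1a_3.del`). Shared doc-store files or leftover unreferenced files can skew the counts. For exact figures, open an `IndexReader` and compare `maxDoc()` (which still counts deleted documents) with `numDocs()`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Heuristic, file-name-based census of a Lucene index directory (hypothetical helper, not Lucene API). */
public final class SegmentCensus {

    /** Segment prefix of a file: "_1a.cfs" -> "_1a", "_1a_3.del" -> "_1a"; null for non-segment files. */
    public static String segmentOf(String fileName) {
        if (!fileName.startsWith("_")) {
            return null; // e.g. "segments_5", "segments.gen", "write.lock"
        }
        int end = fileName.length();
        int dot = fileName.indexOf('.');
        if (dot >= 0) {
            end = dot;
        }
        // A second underscore before the extension marks a generation suffix, as in "_1a_3.del".
        int underscore = fileName.indexOf('_', 1);
        if (underscore >= 0 && underscore < end) {
            end = underscore;
        }
        return fileName.substring(0, end);
    }

    /** Returns {number of segments, number of segments that have a .del file}. */
    public static int[] summarize(List<String> fileNames) {
        Set<String> segments = new TreeSet<>();
        Set<String> withDeletions = new TreeSet<>();
        for (String name : fileNames) {
            String segment = segmentOf(name);
            if (segment == null) {
                continue;
            }
            segments.add(segment);
            if (name.endsWith(".del")) {
                withDeletions.add(segment);
            }
        }
        return new int[] {segments.size(), withDeletions.size()};
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        int[] counts = summarize(names);
        System.out.println("segments: " + counts[0] + ", segments with deletions: " + counts[1]);
    }
}
```

Run it as `java SegmentCensus /path/to/index`. A high segment count, or a large share of segments carrying .del files, is a hint (not proof) that merging or reindexing may pay off.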

Answer


Older versions of Lucene did not deal effectively with large numbers of segments. This is why some people recommended optimizing (merging all segments into one) to improve search performance.

This is less true with recent versions of Lucene. In fact, optimize() has been renamed to sound less magical (you now call forceMerge(1)), and routinely merging all segments is even considered harmful (see this nice article by Lucene developer Simon Willnauer).

For each request we create a new searcher object

Opening a reader is very costly. Instead, use SearcherManager, which reopens your index incrementally, and only when necessary.
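The idea behind SearcherManager can be sketched with the JDK alone. The class below is hypothetical and is not Lucene's API: requests borrow one shared searcher through a lease, `maybeRefresh()` opens a new one only when the index version has changed, and an old searcher is closed once its last lease is released. The version source, opener and closer are placeholders you would wire to Lucene yourself (for example, to an index version number and to opening and closing an IndexSearcher).

```java
import java.util.function.Consumer;
import java.util.function.LongFunction;
import java.util.function.LongSupplier;

/** Hypothetical JDK-only sketch of the SearcherManager idea: share, refresh on change, ref-count. */
public final class RefreshingHolder<T> {

    /** One opened resource, the version it was opened at, and how many owners still use it. */
    private final class Ref {
        final T resource;
        final long version;
        int refCount = 1; // the holder's own reference

        Ref(T resource, long version) {
            this.resource = resource;
            this.version = version;
        }
    }

    /** A borrowed resource; closing the lease releases it exactly once. */
    public final class Lease implements AutoCloseable {
        private final Ref ref;
        private boolean released;

        private Lease(Ref ref) {
            this.ref = ref;
        }

        public T get() {
            return ref.resource;
        }

        @Override
        public void close() {
            synchronized (RefreshingHolder.this) {
                if (!released) {
                    released = true;
                    decRef(ref);
                }
            }
        }
    }

    private final LongSupplier versionSource; // placeholder: e.g. an index version number
    private final LongFunction<T> opener;     // placeholder: opens a searcher for that version
    private final Consumer<T> closer;         // placeholder: closes a searcher
    private Ref current;
    private int openCount;

    public RefreshingHolder(LongSupplier versionSource, LongFunction<T> opener, Consumer<T> closer) {
        this.versionSource = versionSource;
        this.opener = opener;
        this.closer = closer;
        long v = versionSource.getAsLong();
        this.current = new Ref(opener.apply(v), v);
        this.openCount = 1;
    }

    /** Borrows the current resource; use with try-with-resources. */
    public synchronized Lease acquire() {
        current.refCount++;
        return new Lease(current);
    }

    /** Opens a new resource only if the version changed; returns true if it did. */
    public synchronized boolean maybeRefresh() {
        long v = versionSource.getAsLong();
        if (v == current.version) {
            return false; // unchanged: keep sharing the current resource
        }
        Ref old = current;
        current = new Ref(opener.apply(v), v);
        openCount++;
        decRef(old); // drop the holder's reference; closes now only if no lease is outstanding
        return true;
    }

    public synchronized int openCount() {
        return openCount;
    }

    private synchronized void decRef(Ref ref) {
        if (--ref.refCount == 0) {
            closer.accept(ref.resource);
        }
    }
}
```

Assuming the version check itself is cheap, calling `maybeRefresh()` at the start of each request keeps changes visible immediately while opening a new searcher only after the index has actually changed. Opening inside the synchronized `maybeRefresh()` keeps the sketch simple, but it blocks `acquire()` while a reopen is in progress.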