Sorry to bother you, but I am hoping for help from people experienced with Lucene.

Our application uses Lucene.Net 3.0.3 to index and search about 2,500,000 items. Each entity contains 27 searchable fields, which are added to the index like this: new Field(key, value, Field.Store.YES, Field.Index.ANALYZED)

Now we have two search options:

  1. Search only by 4 fields using fuzzy search
  2. Search by 4-27 fields using exact search

We have a search service that automatically runs searches for about 53,000 people every week, with names such as “Bob Huston”, “Sara Conor”, “Sujan Hong Uin Ho”, etc.

Our major problem is slow search speed in option 1: a single call to searcher.Search takes 4-8 seconds on average.

Search sample code:

                // Open the index directory and a read-only searcher.
                var index = FSDirectory.Open(indexPath);
                var searcher = new IndexSearcher(index, true);
                // Empty stop-word set, so the analyzer drops no terms.
                this.analyzer = new StandardAnalyzer(Version.LUCENE_30, new HashSet<string>());
                var queryParser = new MultiFieldQueryParser(Version.LUCENE_30, queryFields, this.analyzer);
                queryParser.AllowLeadingWildcard = false;
                Query query = queryParser.Parse(token);
                var results = searcher.Search(query, NumberOfResults); // NumberOfResults == 500

Our fuzzy search query to find “bob cong hong” in 4 fields:

(((PersonFirstName:bob~0.6) OR (PersonLastName:bob~0.6) OR (PersonAliases:bob~0.6) OR (PersonAlternativeSpellings:bob~0.6)) AND ((PersonFirstName:cong~0.6) OR (PersonLastName:cong~0.6) OR (PersonAliases:cong~0.6) OR (PersonAlternativeSpellings:cong~0.6)) AND ((PersonFirstName:hong~0.6) OR (PersonLastName:hong~0.6) OR (PersonAliases:hong~0.6) OR (PersonAlternativeSpellings:hong~0.6)))
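
For illustration, a query string of this shape could be assembled from the search phrase roughly as follows. This is only a sketch: the class and method names are hypothetical, and it assumes the tokens contain no query-parser special characters (otherwise they would need escaping first).

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of Lucene.Net.
public static class FuzzyQueryText
{
    // Builds "(((f1:tok~sim) OR (f2:tok~sim)) AND (...))" for every token in the phrase.
    // The similarity is passed as text to avoid culture-specific decimal separators.
    public static string Build(string phrase, string[] fields, string similarity)
    {
        var groups = phrase
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(token => "(" + string.Join(" OR ",
                fields.Select(f => "(" + f + ":" + token + "~" + similarity + ")")) + ")");

        // Every token must match in at least one of the fields.
        return "(" + string.Join(" AND ", groups) + ")";
    }
}
```

Calling FuzzyQueryText.Build("bob cong hong", queryFields, "0.6") with the four name fields should yield a string of the same form as the query above, which can then be passed to queryParser.Parse.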

Current improvements:

  1. We combined these 4 fields into 1 search field
  2. We now use a single IndexSearcher for the whole service instead of opening a new one for every search request
  3. MergeFactor=2

Together, these improvements increased search speed by about 30-40%.
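
For readers who want to see what improvement 2 might look like, here is a minimal sketch of a shared searcher. The SearcherHolder class is hypothetical, and it assumes the index does not change while the service is running (otherwise the searcher would have to be reopened after each index update).

```csharp
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical holder: opens the index once and hands out the same searcher.
public static class SearcherHolder
{
    private static readonly object Sync = new object();
    private static IndexSearcher searcher;

    public static IndexSearcher Get(string indexPath)
    {
        lock (Sync)
        {
            if (searcher == null)
            {
                // Read-only searcher, as in the search sample code.
                searcher = new IndexSearcher(FSDirectory.Open(indexPath), true);
            }
            return searcher;
        }
    }
}
```

Each request would then call SearcherHolder.Get(indexPath).Search(query, NumberOfResults) instead of constructing its own IndexSearcher.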

Following this article, we've made most of the possible optimizations:

Do you have any other suggestions for improving search speed in our situation?

Thank you.

1 Answer

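You can improve the speed of fuzzy queries by setting their prefix length to a non-zero value. With a prefix length of 0, a fuzzy query has to compare the search term against every term in the field; with a non-zero prefix, Lucene only considers terms that start with the same characters, which narrows the set of possible results efficiently. Like this: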

queryParser.FuzzyPrefixLength = 2;
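
If you build the query in code rather than through the parser, the same idea can be sketched as below. This assumes the single combined field you mention; FuzzyNameQuery and the field name passed in are hypothetical, and the tokens are assumed to be lowercased already, because no analyzer runs on them here.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper, not part of Lucene.Net.
public static class FuzzyNameQuery
{
    // Requires every token to fuzzy-match in the given field.
    public static Query Build(string field, string[] tokens, float minSimilarity, int prefixLength)
    {
        var query = new BooleanQuery();
        foreach (var token in tokens)
        {
            // Only index terms sharing the first prefixLength characters are candidates.
            var fuzzy = new FuzzyQuery(new Term(field, token), minSimilarity, prefixLength);
            query.Add(fuzzy, Occur.MUST);
        }
        return query;
    }
}
```

For example, FuzzyNameQuery.Build("PersonAllNames", new[] { "bob", "cong", "hong" }, 0.6f, 2) corresponds to your three-token search. The trade-off is that a misspelling within the first two characters of a name will no longer match.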

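Also, it doesn't affect the query you've provided as an example, but if you care at all about performance, keep leading wildcards disabled. The line queryParser.AllowLeadingWildcard = false; already does that, and since false is the parser's default, the line is redundant and can be removed. Leading wildcards will absolutely kill performance.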