7
votes

How do I make sure Lucene returns relevant search results when my input string contains terms like c++? Lucene seems to ignore the ++ characters.

Code details: When I execute this line, I get a blank search query.

queryField = multiFieldQueryParser.Parse(inpKeywords);

keywordsQuery.Add(queryField, BooleanClause.Occur.SHOULD);

And here is my custom analyzer:

public class CustomAnalyzer : Analyzer
{
    private static readonly WhitespaceAnalyzer whitespaceAnalyzer = new WhitespaceAnalyzer();

    public override TokenStream TokenStream(String fieldName, System.IO.TextReader reader)
    {
        TokenStream result = whitespaceAnalyzer.TokenStream(fieldName, reader);
        result = new StandardTokenizer(reader);
        result = new LowerCaseFilter(result);
        result = new StopFilter(result, stop_words);
        return result;
    }
}

And I'm executing the search query this way:

indexSearcher.Search(searchQuery, collector);

I did try queryField = multiFieldQueryParser.Parse(QueryParser.Escape(inpKeywords)); but it still does not work. Here is the query that gets executed and returns zero hits: "+(())"

Thanks.


3 Answers

4
votes

Since + is a special character in the query syntax, it needs to be escaped. The list of all characters that need to be escaped is here (see the bottom of the page).

You also need to be careful about the analyzer you use while indexing. For example, StandardAnalyzer will drop the + characters. You may need to use something like WhitespaceAnalyzer, which preserves special characters in the token stream. Keep in mind that you need to use the same analyzer for both indexing and searching.
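To illustrate why the analyzer matters, here is a rough, self-contained sketch that does not call Lucene at all. The two helper methods (WhitespaceTokens and AlphanumericTokens) are hypothetical stand-ins I made up: the first only approximates whitespace-based tokenization, the second only approximates a tokenizer that discards punctuation. The real analyzers do more than this.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenizerSketch
{
    // Hypothetical stand-in: split on whitespace only, so "c++" survives intact.
    public static string[] WhitespaceTokens(string text) =>
        text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

    // Hypothetical stand-in: keep only runs of letters and digits, so "++" is lost.
    public static string[] AlphanumericTokens(string text) =>
        Regex.Matches(text, @"[\p{L}\p{N}]+")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        const string input = "c++ developer";
        Console.WriteLine(string.Join("|", WhitespaceTokens(input)));   // c++|developer
        Console.WriteLine(string.Join("|", AlphanumericTokens(input))); // c|developer
    }
}
```

If the index was built with punctuation-stripping tokenization, the term "c++" never made it into the index, so no amount of escaping at query time will match it. Also note that WhitespaceAnalyzer does not lowercase, so "C++" and "c++" would be different terms unless you add a lowercase filter on both sides.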

1
votes

In addition to choosing the right analyzer, you can use QueryParser.Escape(string s) to ensure all special characters are properly escaped.

Because this is a static function, you can use it even if you're using MultiFieldQueryParser.

For example, you can try something like this:

queryField = multiFieldQueryParser.Parse(QueryParser.Escape(inpKeywords));
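To see what escaping does to the input, here is a minimal sketch of the idea, not Lucene's actual implementation. EscapeQuery and the SpecialChars list are my own hypothetical names, and the character set is an assumption based on the classic query syntax; in real code you would just call QueryParser.Escape.

```csharp
using System;
using System.Text;

public static class EscapeSketch
{
    // Assumed set of characters that are special in the classic query syntax.
    private const string SpecialChars = "+-&|!(){}[]^\"~*?:\\";

    // Hypothetical helper: prefix every special character with a backslash.
    public static string EscapeQuery(string input)
    {
        var sb = new StringBuilder(input.Length * 2);
        foreach (char c in input)
        {
            if (SpecialChars.IndexOf(c) >= 0)
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(EscapeQuery("c++")); // c\+\+
    }
}
```

Escaping only stops the parser from reading + as an operator. The escaped text still goes through the analyzer afterwards, so if the analyzer drops + you can still end up with an empty query.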

0
votes

Try UTF-8 encoding your search queries.

You can enable this as described in this article.