MultiFieldQueryParser is removing dots from the acronym

Question

I am posting this question again because my earlier one was not answered.

I am working on a book search API using Lucene. A user can search for a book whose title or description field contains C.F.A... I am using StandardAnalyzer along with a list of stop words.

I am using MultiFieldQueryParser to parse the above string. But after parsing, it is removing the dots from the string. What am I missing here?

Thanks.

itsadok · Accepted Answer · 2009-03-19T07:25:31

As you mentioned, this is a dupe of this question. I suggest you at least add a link to it in your question. Also, I would urge you to create a user account, since right now it's not possible to look at your old question to get context.

The StandardAnalyzer specifically handles acronyms, and converts C.F.A. (for example) to cfa. This means you should be able to do the search, as long as you make sure you use the same analyzer for the indexing and for the query parsing.

I would suggest you run some more basic test cases to eliminate other factors. Try to use an ordinary QueryParser instead of a multi-field one.

Here's some code I wrote to play with the StandardAnalyzer:

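import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

// Two spellings of the acronym (with and without the trailing dot), then a plain word.
StringReader testReader = new StringReader("C.F.A. C.F.A word");
StandardAnalyzer analyzer = new StandardAnalyzer();
TokenStream tokenStream = analyzer.tokenStream("title", testReader);
// TokenStream.next() is the Lucene 2.x API; print the first three tokens.
System.out.println(tokenStream.next());
System.out.println(tokenStream.next());
System.out.println(tokenStream.next());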

The output for this, by the way, was:

(cfa,0,6,type=<ACRONYM>)
(c.f.a,7,12,type=<HOST>)
(word,13,17,type=<ALPHANUM>)

Note, for example, that if the acronym doesn't end with a dot then the analyzer assumes it's an internet host name, so searching for "C.F.A" will not match "C.F.A." in the text.
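To make the mismatch concrete without needing Lucene on the classpath, here is a rough sketch in plain Java that imitates only the behaviour seen in the output above. The class AcronymSketch, its normalize and analyze methods, and the sample sentence are hypothetical names made up for illustration; this is not how StandardAnalyzer is implemented, and it assumes an acronym is simply a run of single letters each followed by a dot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for the behaviour shown above; this is NOT Lucene code.
final class AcronymSketch {
    // Assumption: an acronym is one or more single letters, each followed by a dot,
    // so the trailing dot is required ("c.f.a." matches, "c.f.a" does not).
    private static final Pattern ACRONYM = Pattern.compile("(?:\\p{L}\\.)+");

    // Lowercases a token and strips the dots only if it looks like an acronym.
    static String normalize(String token) {
        String lower = token.toLowerCase();
        return ACRONYM.matcher(lower).matches() ? lower.replace(".", "") : lower;
    }

    // Splits on whitespace and normalizes each token.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(normalize(token));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // "Indexed" text contains the acronym with its trailing dot.
        List<String> indexed = analyze("Study guide for the C.F.A. exam");
        // Query with the trailing dot normalizes to the same term.
        System.out.println(indexed.contains(normalize("C.F.A."))); // true
        // Query without the trailing dot keeps its dots and finds nothing.
        System.out.println(indexed.contains(normalize("C.F.A")));  // false
    }
}
```

The point of the sketch is only that the indexed term and the query term must come out of the same normalization step: "C.F.A." on both sides gives cfa and matches, while a query typed as "C.F.A" stays c.f.a and does not.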
