3
votes

I'm searching a couple of fields, and it works fine unless one of the terms is "BE". In that case Lucene simply ignores the clause.

With the code below I get the correct results, and the "query" object prints as "+flag:bf +type:cgo". If I set either the flag or the type term to "BE", that part of the search is dropped. For example, with the queryString "flag:\"BE\" AND type:\"CGO\"" the query object prints as "+type:cgo" and I get many more hits. The same happens for "type": if I change "CGO" in that example to "BE", it is ignored.

I haven't tried every possible two-character combination, but I've tried many, and all work as expected except this one. I'm not using any stop terms.

Thanks, Gene

String queryString = "flag:\"BF\" AND type:\"CGO\"";
// "type" is the default field for clauses that don't name a field
QueryParser qp = new QueryParser(Version.LUCENE_30,
                  "type", new StandardAnalyzer(Version.LUCENE_30));

Query query = qp.parse(queryString);
IndexSearcher searcher = new IndexSearcher(reader.reopen());
// search with the parsed query (the variable is "query", not "q")
TopDocs td = searcher.search(query, 5000);
logger.info("Found " + td.totalHits + " hits using " + query.toString());

2 Answers

4
votes

By default, StandardAnalyzer lowercases each token and then removes a set of English stop words to exclude "noise" from the indexed terms. "BE" becomes "be", which is in that default stop set, so the clause ends up empty and is dropped from the parsed query.

Luckily, you have a few options.

The obvious one is to pass an empty set of stop words to the constructor of the StandardAnalyzer used.

However, judging by the names of your fields ("flag" and "type"), they don't look like they hold free text; more likely they hold coded values. With that in mind, you might find that KeywordAnalyzer, which treats the whole field value as a single unmodified token, is a better fit.
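As a rough sketch of both options, assuming Lucene 3.0 on the classpath and the same field names as in the question (the class name AnalyzerChoices is just a placeholder):

```java
import java.util.Collections;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

// Placeholder class name; only the analyzer construction matters here.
public class AnalyzerChoices {
    public static void main(String[] args) throws ParseException {
        String queryString = "flag:\"BE\" AND type:\"CGO\"";

        // Option 1: StandardAnalyzer with an empty stop set.
        // Tokens are still lowercased, but nothing is filtered out as a stop word.
        Analyzer noStops = new StandardAnalyzer(Version.LUCENE_30, Collections.emptySet());
        Query q1 = new QueryParser(Version.LUCENE_30, "type", noStops).parse(queryString);
        System.out.println(q1);

        // Option 2: KeywordAnalyzer emits the whole value as one token,
        // with no lowercasing and no stop-word removal.
        Analyzer keyword = new KeywordAnalyzer();
        Query q2 = new QueryParser(Version.LUCENE_30, "type", keyword).parse(queryString);
        System.out.println(q2);
    }
}
```

Whichever you pick, use the same analyzer when indexing. If the documents were indexed through the default StandardAnalyzer, "be" was dropped at index time as well, so you would need to re-index before the query can match. Also note that KeywordAnalyzer does not lowercase, so the case of the query term has to match what is in the index.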

Good luck,

1
votes

You are indeed using stopwords, although you might not be trying to:

QueryParser qp  = new QueryParser(Version.LUCENE_30, 
              "type", new StandardAnalyzer(Version.LUCENE_30));

StandardAnalyzer uses the standard English stopwords by default, which include "be".
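To see why the whole clause vanishes, here is a small stand-alone simulation in plain Java. It is not Lucene code: the class StopWordSketch, its methods, and the shortened stop list are all made up for illustration, and the real analyzer does more than split on whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation of "lowercase, then drop stop words"; not the Lucene API.
public class StopWordSketch {
    // Partial, illustrative stop list; Lucene's default English set is longer.
    public static final Set<String> STOP_WORDS = new HashSet<String>(
            Arrays.asList("a", "an", "and", "are", "as", "at", "be", "but", "by"));

    // Lowercase the input, split on whitespace, and drop stop words.
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<String>();
        for (String token : text.trim().split("\\s+")) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!lower.isEmpty() && !STOP_WORDS.contains(lower)) {
                terms.add(lower);
            }
        }
        return terms;
    }

    // Build a "+field:term" string; a clause whose text analyzes to nothing adds nothing.
    public static String buildQuery(Map<String, String> clauses) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> clause : clauses.entrySet()) {
            for (String term : analyze(clause.getValue())) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append('+').append(clause.getKey()).append(':').append(term);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> works = new LinkedHashMap<String, String>();
        works.put("flag", "BF");
        works.put("type", "CGO");
        System.out.println(buildQuery(works));   // +flag:bf +type:cgo

        Map<String, String> dropped = new LinkedHashMap<String, String>();
        dropped.put("flag", "BE");
        dropped.put("type", "CGO");
        System.out.println(buildQuery(dropped)); // +type:cgo
    }
}
```

The second query loses its flag clause because "BE" lowercases to "be", which is filtered out before any term is produced. That matches the "+type:cgo" output reported in the question.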