1
votes

I'm having a bit of trouble finding information on what's happening with my Lucene searches.

(Id:gloves* Search:gloves* SpellCheckerSource:gloves*) OR
(Id:gloves Search:gloves SpellCheckerSource:gloves) OR
(Id:glove* Search:glove* SpellCheckerSource:glove*) 

When I search for the above, I get the following rewritten query:

(() () ())
(Id:glove Search:glove SpellCheckerSource:glove) 
(() ConstantScore(Search:glove*) ConstantScore(SpellCheckerSource:glove*))

This is from LUKE (http://www.getopt.org/luke/); I have been running the query there to try to see what's going on.

Now what I want to be able to do is search for a term such as gloves*, which ends up being rewritten as (() () ()).

I don't understand why it gets translated like this. Is there an issue with my query or with my index?

LUKE tells me the structure explanation is as follows

lucene.BooleanQuery
  clauses=3, maxClauses=1024
  Clause 0: SHOULD
    lucene.BooleanQuery
      clauses=3, maxClauses=1024
      Clause 0: SHOULD
        lucene.BooleanQuery
          clauses=0, maxClauses=1024, coord=false
      Clause 1: SHOULD
        lucene.BooleanQuery
          clauses=0, maxClauses=1024, coord=false
      Clause 2: SHOULD
        lucene.BooleanQuery
          clauses=0, maxClauses=1024, coord=false
  Clause 1: SHOULD
    lucene.BooleanQuery
      clauses=3, maxClauses=1024
      Clause 0: SHOULD
        lucene.TermQuery
          Term: field='Id' text='glove'
      Clause 1: SHOULD
        lucene.TermQuery
          Term: field='Search' text='glove'
      Clause 2: SHOULD
        lucene.TermQuery
          Term: field='SpellCheckerSource' text='glove'
  Clause 2: SHOULD
    lucene.BooleanQuery
      clauses=3, maxClauses=1024
      Clause 0: SHOULD
        lucene.BooleanQuery
          clauses=0, maxClauses=1024, coord=false
      Clause 1: SHOULD
        lucene.ConstantScoreQuery, ConstantScore(Search:glove*)
          Filter: Search:glove*
      Clause 2: SHOULD
        lucene.ConstantScoreQuery, ConstantScore(SpellCheckerSource:glove*)
          Filter: SpellCheckerSource:glove*

This seems strange to me on multiple levels:

  1. Why do I get empty clauses after the rewrite?
  2. Why do I get a mix of TermQuery, ConstantScoreQuery and BooleanQuery?
  3. Where is the ConstantScoreQuery being generated?

It should be noted that everything works fine when I search for a term without the s (i.e. glove) or without a wildcard; only the combination of the two seems to break the query.


1 Answer

2
votes

This is probably happening because there are no terms in your index that match "gloves*".

When a MultiTermQuery is rewritten, it finds the Terms that are suitable, and creates primitive queries (such as TermQuery) on those terms. If no suitable terms are found, you'll see an empty query generated instead, like what you've shown.
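To make the rewrite step concrete, here is a rough sketch in plain Python. It is a toy model, not Lucene's actual implementation: the term list, `expand_prefix` and `rewrite_prefix_query` are all made up for illustration. It only shows the idea that a prefix is expanded against the terms actually present in the index, and that an empty expansion leaves an empty query.

```python
from bisect import bisect_left

def expand_prefix(sorted_terms, prefix):
    """Return every indexed term that starts with `prefix` (toy term enumeration)."""
    start = bisect_left(sorted_terms, prefix)  # first term >= prefix
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # terms are sorted, so no later term can match
        matches.append(term)
    return matches

def rewrite_prefix_query(field, sorted_terms, prefix):
    """Toy rewrite: one (field, term) clause per matching term, possibly none."""
    return [(field, term) for term in expand_prefix(sorted_terms, prefix)]

# Hypothetical term dictionary for the "Search" field, as it might look after stemming.
search_terms = sorted(["glove", "glovebox", "hat", "scarf"])

print(rewrite_prefix_query("Search", search_terms, "glove"))   # two clauses
print(rewrite_prefix_query("Search", search_terms, "gloves"))  # no clauses at all
```

In this model the second call returns an empty list, which corresponds to the empty `()` clauses shown in the question.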

A TermQuery is already a primitive query, and no rewriting is needed there. It doesn't have to enumerate terms or anything; it just runs as-is.

The other piece of this is analysis. Your query for gloves is getting analyzed to glove (EnglishAnalyzer perhaps?). MultiTermQueries (like wildcard, fuzzy, regex and prefix queries) are not analyzed by the QueryParser. Your prefix query is trying to find terms starting with "gloves", but the plural "s" has been stemmed away in the indexed terms, so it doesn't find any matches.
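The mismatch can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `toy_stem` is a crude stand-in for whatever stemming your analyzer does, and `parse_clause` only imitates the behaviour described above (plain terms are analyzed, prefix terms are passed through untouched). It is not how Lucene's QueryParser is written.

```python
def toy_stem(token):
    """Crude stand-in for a stemmer: lowercase and strip one trailing 's'."""
    token = token.lower()
    if token.endswith("s") and len(token) > 3:
        return token[:-1]
    return token

def index_terms(texts):
    """Analyze documents at index time; only stemmed terms are stored."""
    return sorted({toy_stem(word) for text in texts for word in text.split()})

def parse_clause(text):
    """Plain terms go through analysis; prefix terms (ending in *) do not."""
    if text.endswith("*"):
        return ("prefix", text[:-1])
    return ("term", toy_stem(text))

def search(terms, clause):
    """Return the indexed terms that the clause would match."""
    kind, value = clause
    if kind == "prefix":
        return [t for t in terms if t.startswith(value)]
    return [t for t in terms if t == value]

# Hypothetical documents; "gloves" is stored as "glove".
terms = index_terms(["Leather gloves", "glove box", "winter hats"])

print(search(terms, parse_clause("gloves")))   # analyzed to "glove", so it matches
print(search(terms, parse_clause("gloves*")))  # raw prefix "gloves" matches nothing
print(search(terms, parse_clause("glove*")))   # raw prefix "glove" matches
```

Under these assumptions, only the unanalyzed prefix `gloves*` comes back empty, which is the same pattern as in the rewritten query from the question.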