2
votes

Suppose I have 4 documents with a field text as follows:

  1. the red house is beautiful
  2. the house is little
  3. the red fish
  4. the red and yellow house is big

What kind of query should I use to retrieve the documents so that they are ranked as follows when I search for "red house":

  1. the red house is beautiful [matching: red house]
  2. the red and yellow house is big [matching: red x x house]
  3. the house is little [matching: house]
  4. the red fish [matching: red]

What I need is to give a high rank to the documents that match the whole phrase I searched for, and a lower score to the documents that contain only part of it. Note that the query string could also contain more than 2 terms.

It is like a PhraseQuery in which each term is optional, and in which the closer the terms are to each other, the higher the score.

I've tried combining a PhraseQuery with a TermQuery, but the result is not what I need.

How can I do this?

Thanks


2 Answers

1
votes

Try creating a BooleanQuery composed of TermQuery objects, combined with OR (BooleanClause.Occur.SHOULD). This will match documents where only one term appears, but should give a higher score to those where both appear.

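import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;

// Both clauses are optional (SHOULD): a document matches if either term occurs.
Query term1 = new TermQuery(new Term("text", "red"));
Query term2 = new TermQuery(new Term("text", "house"));
BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.add(term1, BooleanClause.Occur.SHOULD);
booleanQuery.add(term2, BooleanClause.Occur.SHOULD);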
0
votes

I think a PhraseQuery with a positive slop (setSlop), SHOULD-combined with a TermQuery for every term, should get you there. Maybe with a boost on the PhraseQuery.
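Something along these lines is what I have in mind. This is only a rough sketch, assuming the same older Lucene API (3.x/4.x) as the other answer, where BooleanQuery and PhraseQuery are mutable; newer releases use builder classes instead. The class ProximityQuerySketch, the method build and its parameters are made-up names for illustration, and the terms are assumed to be already analyzed (lowercased, etc.).

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class ProximityQuerySketch {

    // Hypothetical helper (not part of Lucene): builds
    // "sloppy phrase OR term1 OR term2 ..." from already-analyzed terms.
    public static BooleanQuery build(String field, int slop, float phraseBoost, String... terms) {
        BooleanQuery query = new BooleanQuery();

        // Sloppy phrase over all terms: it only matches documents that contain
        // every term within the allowed slop, and it is boosted so that those
        // documents are pushed above partial matches.
        PhraseQuery phrase = new PhraseQuery();
        for (String term : terms) {
            phrase.add(new Term(field, term));
        }
        phrase.setSlop(slop);
        phrase.setBoost(phraseBoost);
        query.add(phrase, BooleanClause.Occur.SHOULD);

        // One optional TermQuery per term, so documents that contain only
        // some of the terms are still returned, with a lower score.
        for (String term : terms) {
            query.add(new TermQuery(new Term(field, term)), BooleanClause.Occur.SHOULD);
        }
        return query;
    }
}
```

For the example in the question this would be called as ProximityQuerySketch.build("text", 3, 2.0f, "red", "house"). The slop has to be large enough to cover the words between "red" and "house" in "the red and yellow house is big", and the boost value is a guess that needs tuning. The relative order of documents matching only one term depends on the term statistics of the index, so "house" ranking above "red" is not guaranteed.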

"I've tried combining a PhraseQuery with a TermQuery, but the result is not what I need."

What do you get with this combination, and how is it not what you need?