1
votes

I am reading the Lucene in Action book and I do not understand the multi-term phrases part.

The following text is indexed:

the quick brown fox jumped over the lazy dog

You then add the terms quick jumped lazy to the PhraseQuery with a slop of 4. That results in a match, but I don't understand why. How is the number of moves calculated when there are more than two terms?

The same goes for the terms lazy jumped quick with a slop of 8.


1 Answer

2
votes

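The slop is effectively an edit distance, counted in single-position moves of a term. Pushing terms apart to make room for an intervening word adds 1 per position, and transposing two adjacent terms adds 2 (the first move puts the two terms at the same position, the second carries one past the other).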

You can go through the edits one at a time to illustrate:

  • quick jumped lazy distance:0
  • quick _ jumped lazy distance:1
  • quick _ _ jumped lazy distance:2
  • quick _ _ jumped _ lazy distance:3
  • quick _ _ jumped _ _ lazy distance:4

And for the second case:

  • lazy jumped quick distance:0
  • lazy/jumped quick distance:1
  • lazy/jumped/quick distance:2 (all three terms superimposed, in the same position)
  • quick lazy/jumped distance:3
  • quick jumped lazy distance:4
  • quick _ jumped lazy distance:5
  • quick _ _ jumped lazy distance:6
  • quick _ _ jumped _ lazy distance:7
  • quick _ _ jumped _ _ lazy distance:8
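
If you want to check these counts without stepping through every move, here is a rough sketch in plain Java. It is a simplified model, not Lucene's actual scorer: it assumes each phrase term occurs exactly once in the text, and `SlopSketch` and `requiredSlop` are hypothetical names, not part of the Lucene API. The idea is to subtract each term's offset within the phrase from its position in the text; the spread between the largest and smallest shifted values is the number of moves needed.

```java
import java.util.Arrays;
import java.util.List;

final class SlopSketch {

    // Hypothetical helper (not a Lucene method): returns the smallest slop
    // at which the phrase would match, or -1 if a term is missing.
    // Assumes every phrase term occurs only once in docTokens.
    static int requiredSlop(List<String> docTokens, List<String> phraseTerms) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int offset = 0; offset < phraseTerms.size(); offset++) {
            int position = docTokens.indexOf(phraseTerms.get(offset));
            if (position < 0) {
                return -1; // term not in the text, so no match at any slop
            }
            // Shift the text position by the term's offset inside the phrase.
            int shifted = position - offset;
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    public static void main(String[] args) {
        List<String> doc =
            Arrays.asList("the quick brown fox jumped over the lazy dog".split(" "));
        // quick=1, jumped=4, lazy=7 -> shifted 1, 3, 5 -> spread 4
        System.out.println(requiredSlop(doc, Arrays.asList("quick", "jumped", "lazy")));
        // lazy=7, jumped=4, quick=1 -> shifted 7, 3, -1 -> spread 8
        System.out.println(requiredSlop(doc, Arrays.asList("lazy", "jumped", "quick")));
    }
}
```

Both results agree with the step-by-step lists above: 4 for the in-order phrase and 8 for the reversed one. Repeated terms and overlapping candidate matches would need more bookkeeping than this sketch does.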