
Is there a way with Lucene 4.4 to determine exactly which terms satisfied a query? I need to highlight only the term occurrences that caused the document to be returned, not other occurrences of the same term elsewhere in the document. For example, given the document:

We are going to visit the White House today. I hear it is painted white.

and the phrase query "white house", I want to highlight these terms:

We are going to visit the <b>White</b> <b>House</b> today. I hear it is painted white.

I've been using PostingsHighlighter, but it will highlight the word "white" in the second sentence as well. I don't want that because the single term "white" does not satisfy the phrase query.

It looks like the only information that comes back from a search is the document IDs and scores. I don't really care about scores for relevancy ranking, because I'll be working with all of the documents returned. Is there something I could do with custom scoring that would preserve the information I need? Or is there a better approach that I'm missing?

1 Answer

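This appears to be the intended behavior of PostingsHighlighter (see this discussion). You might consider using Highlighter instead: when paired with a QueryScorer, it is position-aware for phrase queries and tries to mark only the terms that took part in producing the hit.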
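To make the difference between term-level and position-aware highlighting concrete, here is a minimal plain-Java sketch of the behavior the question asks for. It is not Lucene code and does not reflect how Lucene's Highlighter is implemented. The class name `PhraseHighlightSketch`, the `highlight` helper, and the simple `\w+` tokenization are all assumptions made for illustration. It marks a token only when it sits inside an exact run of consecutive tokens matching the phrase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: plain Java, not Lucene, and not Lucene's implementation.
class PhraseHighlightSketch {

    // Hypothetical helper: wraps in <b>...</b> only the tokens that belong to a
    // case-insensitive occurrence of the phrase at consecutive token positions.
    static String highlight(String text, String phrase) {
        String[] phraseTerms = phrase.toLowerCase(Locale.ROOT).trim().split("\\s+");

        // Tokenize the text, remembering each token's character offsets.
        List<int[]> offsets = new ArrayList<>();
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+").matcher(text);
        while (m.find()) {
            offsets.add(new int[] {m.start(), m.end()});
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }

        // Mark the tokens that take part in a full phrase match.
        boolean[] hit = new boolean[tokens.size()];
        for (int i = 0; i + phraseTerms.length <= tokens.size(); i++) {
            boolean match = true;
            for (int j = 0; j < phraseTerms.length; j++) {
                if (!tokens.get(i + j).equals(phraseTerms[j])) {
                    match = false;
                    break;
                }
            }
            if (match) {
                for (int j = 0; j < phraseTerms.length; j++) {
                    hit[i + j] = true;
                }
            }
        }

        // Rebuild the text, wrapping only the marked tokens.
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (int i = 0; i < tokens.size(); i++) {
            int[] o = offsets.get(i);
            out.append(text, pos, o[0]);
            if (hit[i]) {
                out.append("<b>").append(text, o[0], o[1]).append("</b>");
            } else {
                out.append(text, o[0], o[1]);
            }
            pos = o[1];
        }
        out.append(text.substring(pos));
        return out.toString();
    }

    public static void main(String[] args) {
        String doc = "We are going to visit the White House today. "
                + "I hear it is painted white.";
        System.out.println(highlight(doc, "white house"));
    }
}
```

Run on the document from the question, the standalone "white" in the second sentence is left unmarked because no "house" follows it. A term-level highlighter, by contrast, marks every occurrence of each query term regardless of position.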