How to return query term that scored a hit in index

Question

I am trying to return the original terms that caused hits in my Lucene index. For example, my search string is "The quick brown fox jumps over the lazy dog". The term 'dog' has hits in the index like 'dog leash' and 'walking the dog'. Likewise, 'fox' has hits like 'fox glove' and 'foxy loxi'.

So, I want to print out the original 'quick brown fox' string for the user, with the terms that have hits (dog and fox) highlighted. There are a couple of examples, like 'Get matched terms in query', which use the explain method, but the answers don't go that last inch. I'm thinking that Lucene won't do it easily and I will have to use regex.

Phil · Accepted Answer · 2018-06-17T03:52:43

I figured out one way to produce a string that is the original user text with the hit terms highlighted. First, the original user text is queried against the index in the usual way. Then the original user text and the results are passed to a 'reversed' query method. That is, the original user text is turned into an in-memory index, which is then queried with each of the original results. This is the opposite of what we did originally. As a result, the words that each result has in common with the user's string are found. This works in my index because all the results are 'strict' definitions.
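A minimal sketch of the 'reversed' query idea. In Lucene the in-memory index would typically be a MemoryIndex (org.apache.lucene.index.memory); here it is swapped for a plain Java term set so the example runs without Lucene on the classpath. The naive tokenizer (lowercase, split on anything that is not a letter or digit) is an assumption standing in for a real Analyzer, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: the user's text becomes a tiny in-memory "index" (a term set),
// and each search result is used as the "query" against it.
public class ReverseMatchSketch {

    // Naive tokenization in place of a Lucene Analyzer: lowercase, split on non-letters/digits.
    static Set<String> indexTerms(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // "Query" the user-text index with one result and return the words they share.
    static List<String> commonTerms(String userText, String result) {
        Set<String> index = indexTerms(userText);
        List<String> shared = new ArrayList<>();
        for (String term : indexTerms(result)) {
            if (index.contains(term)) {
                shared.add(term);
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        String userText = "The quick brown fox jumps over the lazy dog";
        for (String result : Arrays.asList("dog leash", "walking the dog", "fox glove", "foxy loxi")) {
            System.out.println(result + " -> " + commonTerms(userText, result));
        }
    }
}
```

Running main prints dog leash -> [dog], walking the dog -> [the, dog], fox glove -> [fox] and foxy loxi -> []. The word 'the' appears because this sketch does no stop-word filtering; an Analyzer configured with a stop-word set would drop it.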

A highlighter is used to insert delimiters, [ and ], around the found common words in the original results. The regex (?<=\[)(.*?)(?=\]) is then used to extract the individual found words from between the delimiters.
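The extraction step can be sketched with java.util.regex, assuming the highlighter was set up to wrap each match in square brackets (Lucene's highlighter module, for instance, has a SimpleHTMLFormatter that accepts custom pre and post tags). The class name is hypothetical; the pattern is the one from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that pulls the bracketed words out of highlighter output.
public class DelimitedTermExtractor {

    // The lookbehind/lookahead pair matches the text between '[' and ']' without
    // including the brackets; the lazy .*? stops at the first ']'.
    private static final Pattern DELIMITED = Pattern.compile("(?<=\\[)(.*?)(?=\\])");

    static List<String> extract(String highlighted) {
        List<String> found = new ArrayList<>();
        Matcher m = DELIMITED.matcher(highlighted);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extract("walking the [dog]"));
        System.out.println(extract("[fox] glove and a [dog] leash"));
    }
}
```

The two calls in main print [dog] and [fox, dog]. Because the words are collected per result, the same word can appear several times across results, which is why the next step removes duplicates.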

The individual found words and the original user text are passed to the following method, which removes duplicate terms and highlights the words in the user's original string:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// (method lives inside the enclosing class)
// remove found term duplicates and produce a single string with all the hits highlighted
private static void removeTermDuplicates(List<String> textResult, String searchText) {

    // to be the final modified string with all highlights
    String strOutput = searchText;

    // a LinkedHashSet drops duplicates while keeping the first-seen order
    Set<String> textSet = new LinkedHashSet<>(textResult);
    // replace the list contents with the de-duplicated found terms
    textResult.clear();
    textResult.addAll(textSet);

    // add html elements to found terms
    List<String> replacementWord = new ArrayList<>();
    for (String term : textResult) {
        replacementWord.add("<b>" + term + "</b>");
    }

    // put original term and the same term with highlights in a map
    Map<String, String> oldAndNewTerms = new LinkedHashMap<>();
    for (int i = 0; i < replacementWord.size(); ++i) {
        oldAndNewTerms.put(textResult.get(i), replacementWord.get(i));
    }

    // use the map to modify the original string
    for (Map.Entry<String, String> entry : oldAndNewTerms.entrySet()) {
        strOutput = strOutput.replace(entry.getKey(), entry.getValue());
    }

    System.out.println(strOutput);
}
```
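One caveat with the method above: String.replace matches substrings, so 'fox' would also be bolded inside 'foxy', and a short term such as 'b' could be replaced inside <b> tags that were inserted earlier. A minimal alternative sketch, assuming the found terms are single words: build one case-insensitive regex with word boundaries and replace everything in a single pass. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: highlights whole-word, case-insensitive matches in one pass.
public class WholeWordHighlighter {

    static String highlight(String searchText, List<String> foundTerms) {
        // Quote each distinct, non-empty term so regex metacharacters match literally.
        List<String> quoted = new ArrayList<>();
        for (String term : new LinkedHashSet<>(foundTerms)) {
            if (!term.isEmpty()) {
                quoted.add(Pattern.quote(term));
            }
        }
        if (quoted.isEmpty()) {
            return searchText;
        }
        // One alternation wrapped in word boundaries: "fox" no longer matches inside "foxy".
        Pattern terms = Pattern.compile("\\b(?:" + String.join("|", quoted) + ")\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        // A single replaceAll pass never rescans the <b> tags it inserts;
        // $0 re-inserts the matched text with its original capitalization.
        return terms.matcher(searchText).replaceAll("<b>$0</b>");
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog";
        // prints: The quick brown <b>fox</b> jumps over the lazy <b>dog</b>
        System.out.println(highlight(text, Arrays.asList("dog", "fox", "dog")));
        // prints: a foxy <b>fox</b>
        System.out.println(highlight("a foxy fox", Arrays.asList("fox")));
    }
}
```

Doing the de-duplication and the replacement in one regex removes the need for the intermediate replacement list and map; the trade-off is that multi-word terms or terms starting or ending with punctuation may not behave as expected with \b.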

Hope this helps someone in the future. Phil
