I have a Java (Lucene 4) based application that takes a set of keywords as a search query. A keyword may consist of more than one word, e.g. “memory”, “old house”, “European Union law”.

I need a way to get the list of keywords that matched in an indexed document, and if possible their positions in the document as well (including for multi-word keywords). I tried the Lucene highlighter package, but I need only the keywords themselves, without any surrounding text. It also returns multi-word keywords as separate fragments.

I would greatly appreciate any help.

1 Answer

There's a similar (possibly identical) question here: Get matched terms from Lucene query

Did you see this?

The solution suggested there is to break a complex query down into simpler sub-queries until you reach the individual TermQuery objects, and then call searcher.explain(query, docId) for each of them: if the explanation reports a match, you know that term matched the document.

I don't think it's very efficient, but it worked for me until I ran into SpanQueries. It might be enough for you.
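
To make the decomposition idea concrete, here is a minimal sketch in plain Java. It is only a model of the approach, not Lucene code: Node, Leaf, Composite and matchedTerms are hypothetical names that stand in for Query, TermQuery and BooleanQuery, and the predicate stands in for the searcher.explain(termQuery, docId) check. The toy "document" is just a set of terms.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class MatchedTerms {

    /** Hypothetical stand-in for a query node (not a Lucene class). */
    interface Node {}

    /** Leaf node, playing the role of a TermQuery. */
    static final class Leaf implements Node {
        final String term;
        Leaf(String term) { this.term = term; }
    }

    /** Composite node, playing the role of a BooleanQuery and its clauses. */
    static final class Composite implements Node {
        final List<Node> children;
        Composite(Node... children) { this.children = Arrays.asList(children); }
    }

    /**
     * Walks the query tree down to its leaves and keeps every leaf term the
     * predicate accepts. In a real application the predicate would be the
     * per-term explain check described in the answer (an assumption here).
     */
    static Set<String> matchedTerms(Node query, Predicate<String> matchesDoc) {
        Set<String> out = new LinkedHashSet<>(); // keeps query order, drops duplicates
        collect(query, matchesDoc, out);
        return out;
    }

    private static void collect(Node node, Predicate<String> matchesDoc, Set<String> out) {
        if (node instanceof Leaf) {
            String term = ((Leaf) node).term;
            if (matchesDoc.test(term)) {
                out.add(term);
            }
        } else if (node instanceof Composite) {
            for (Node child : ((Composite) node).children) {
                collect(child, matchesDoc, out);
            }
        }
    }

    public static void main(String[] args) {
        // Toy "document": the set of terms it contains.
        Set<String> doc = new HashSet<>(Arrays.asList("memory", "old", "house"));
        Node query = new Composite(
                new Leaf("memory"),
                new Composite(new Leaf("european"), new Leaf("house")));
        System.out.println(matchedTerms(query, doc::contains)); // prints [memory, house]
    }
}
```

Note that a walk like this reports individual terms only, so a multi-word keyword such as “old house” would come back as separate terms with no positions, which is probably related to the trouble I had with SpanQueries.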