
How do you get the matching fuzzy term and its offsets when using Lucene fuzzy search?

    IndexSearcher mem = ....(some standard code)

    QueryParser parser = new QueryParser(Version.LUCENE_30, CONTENT_FIELD, analyzer);

    TopDocs topDocs = mem.search(parser.parse("wuzzy~"), 1);
    // the ~ triggers the fuzzy search, as described in "Lucene in Action"

The fuzzy search works fine. If a document contains the term "fuzzy" or "luzzy", it is matched. How do I find out which term matched, and at what offsets?
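For context on why both terms match: Lucene's fuzzy queries are based on Levenshtein (edit) distance, and "fuzzy" and "luzzy" are each one substitution away from "wuzzy". Below is a minimal, self-contained sketch of that distance. The class name `EditDistanceSketch` is hypothetical, and this is neither Lucene's own implementation nor its similarity scoring.

```java
final class EditDistanceSketch {
    /** Classic dynamic-programming Levenshtein distance between two strings. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty string into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into an empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

Both `levenshtein("wuzzy", "fuzzy")` and `levenshtein("wuzzy", "luzzy")` come out as 1, which is why the query alone cannot tell me which of the two was actually in the document.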

I have made sure that every CONTENT_FIELD is added with term vectors stored, including positions and offsets.

Are you looking for something along these lines? lucene.apache.org/java/3_0_0/api/contrib-highlighter/index.html - Jared
No, I am not looking to highlight text; I need to do further text processing. Before that, I need to figure out which term actually matched ("fuzzy", "luzzy", etc.), since this is a fuzzy match. - user193116

There was no straightforward way of doing this. However, I reconsidered Jared's suggestion and was able to get a solution working.

I am documenting it here in case someone else has the same issue.

Create a class that implements org.apache.lucene.search.highlight.Formatter

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.lucene.search.highlight.Formatter;
    import org.apache.lucene.search.highlight.TokenGroup;

    public class HitPositionCollector implements Formatter {
        // MatchOffset is a simple DTO
        private final List<MatchOffset> matchList;

        public HitPositionCollector() {
            matchList = new ArrayList<MatchOffset>();
        }

        // This is where the start and end offsets, as well as the actual term, are captured
        public String highlightTerm(String originalText, TokenGroup tokenGroup) {
            // A total score of zero or less means this token did not match the query
            if (tokenGroup.getTotalScore() <= 0) {
                return originalText;
            }
            MatchOffset mo = new MatchOffset(tokenGroup.getToken(0).toString(),
                    tokenGroup.getStartOffset(), tokenGroup.getEndOffset());
            matchList.add(mo);
            return originalText;
        }

        /** @return the matchList */
        public List<MatchOffset> getMatchList() {
            return matchList;
        }
    }
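MatchOffset itself is not shown above. Here is a minimal sketch of what such a DTO might look like, assuming it only holds the term text and the start/end character offsets passed to its constructor. The field and getter names are assumptions, not the original class.

```java
/** Hypothetical shape of the MatchOffset DTO: one matched term and where it sits in the text. */
final class MatchOffset {
    private final String term;
    private final int startPos; // offset of the first character of the match
    private final int endPos;   // offset one past the last character of the match

    MatchOffset(String term, int startPos, int endPos) {
        this.term = term;
        this.startPos = startPos;
        this.endPos = endPos;
    }

    String getTerm() {
        return term;
    }

    int getStartPos() {
        return startPos;
    }

    int getEndPos() {
        return endPos;
    }

    @Override
    public String toString() {
        return term + "[" + startPos + "," + endPos + ")";
    }
}
```

Keeping the class immutable means the collected list can be handed to later processing steps without worrying about entries changing underneath it.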

Main Code

    public void testHitsWithHitPositionCollector() throws Exception {
        System.out.println(" .... testHitsWithHitPositionCollector");
        String fuzzyStr = "bro*";

        QueryParser parser = new QueryParser(Version.LUCENE_30, "f", analyzer);
        Query fzyQry = parser.parse(fuzzyStr);
        TopDocs hits = searcher.search(fzyQry, 10);

        QueryScorer scorer = new QueryScorer(fzyQry, "f");

        HitPositionCollector myFormatter = new HitPositionCollector();

        // Highlighter(Formatter formatter, Scorer fragmentScorer)
        Highlighter highlighter = new Highlighter(myFormatter, scorer);
        highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer));

        Analyzer analyzer2 = new SimpleAnalyzer();

        int loopIndex = 0;
        //for (ScoreDoc sd : hits.scoreDocs) {
            Document doc = searcher.doc(hits.scoreDocs[0].doc);
            String title = doc.get("f");

            // Assumed arguments after the reader: doc id, field name, stored document, analyzer
            TokenStream stream = TokenSources.getAnyTokenStream(searcher.getIndexReader(),
                    hits.scoreDocs[0].doc, "f", doc, analyzer2);

            // The formatter returns the text unchanged; it records matches as a side effect
            String fragment = highlighter.getBestFragment(stream, title);

            assertEquals("the quick brown fox jumps over the lazy dog", fragment);
            MatchOffset mo = myFormatter.getMatchList().get(loopIndex++);
        //}
    }
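Once the offsets have been collected, they can be used to pull the matched text back out of the stored field value for further processing. Here is a small sketch, assuming the offsets are character positions into the original text with an exclusive end, as Lucene token offsets are. The class and method names (`MatchedText`, `slice`, `sliceAll`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class MatchedText {
    /** Returns text[start, end), or null when the range does not fit the text. */
    static String slice(String text, int start, int end) {
        if (start < 0 || end > text.length() || start > end) {
            return null;
        }
        return text.substring(start, end);
    }

    /** Slices every {start, end} pair in order, skipping ranges that do not fit. */
    static List<String> sliceAll(String text, int[][] ranges) {
        List<String> out = new ArrayList<String>();
        for (int[] range : ranges) {
            String piece = slice(text, range[0], range[1]);
            if (piece != null) {
                out.add(piece);
            }
        }
        return out;
    }
}
```

For the test document above, `MatchedText.slice(title, 10, 15)` gives "brown", assuming the stored text is "the quick brown fox jumps over the lazy dog".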
