How do I use StandardAnalyzer with TermQuery?

Question

I'm trying to reproduce what Lucene's QueryParser does, but without the parser: run a string through StandardAnalyzer, tokenize it, and combine TermQuery objects in a BooleanQuery to produce a query. My problem is that StandardAnalyzer only gives me Token objects, not Term objects. I can convert a Token to a Term by extracting its string with Token.term(), but that method is 2.4.x-only, and it seems backwards because I have to specify the field a second time. What is the proper way to produce a TermQuery with StandardAnalyzer?

I'm using PyLucene, but I guess the answer is the same for Java etc. Here is the code I've come up with:

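from lucene import *

def term_match(self, phrase):
    # Build an OR query: one TermQuery per analyzed token.
    query = BooleanQuery()
    sa = StandardAnalyzer()
    # The field name is given to the analyzer here...
    for token in sa.tokenStream("contents", StringReader(phrase)):
        # ...and again when constructing each Term.
        term_query = TermQuery(Term("contents", token.term()))
        query.add(term_query, BooleanClause.Occur.SHOULD)
    return query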

RichieHindle · Accepted Answer · 2009-09-07T18:28:22

The established way to get the token text is with token.termText() - that API's been there forever.

And yes, you'll need to specify a field name to both the Analyzer and the Term; I think that's considered normal. 8-)
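One way to keep the repeated field name from drifting is to hold it in a single variable and hand it to both steps. The following is only a minimal sketch of that shape: `build_should_clauses`, `simple_tokenize`, and `simple_clause` are hypothetical names, and the two stand-in functions replace the Lucene calls so the example runs without PyLucene installed.

```python
def build_should_clauses(field, phrase, tokenize, make_clause):
    """Return one clause per token, using the same field name for both steps."""
    return [make_clause(field, text) for text in tokenize(field, phrase)]


# Hypothetical stand-in for the analyzer step. With PyLucene this would wrap
# the question's sa.tokenStream(field, StringReader(phrase)) loop.
def simple_tokenize(field, phrase):
    return phrase.lower().split()


# Hypothetical stand-in for the clause step. With PyLucene this would wrap
# the question's TermQuery(Term(field, text)).
def simple_clause(field, text):
    return (field, text)


clauses = build_should_clauses(
    "contents", "Hello Lucene World", simple_tokenize, simple_clause
)
print(clauses)
```

The field name appears once at the call site, and the helper passes it to the tokenizer and to each clause, which mirrors passing it to both the Analyzer and the Term.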
