I need a case-insensitive keyword analyzer for Lucene.Net.

I've come up with this:

using Lucene.Net.Analysis;

namespace LuceneTools
{
  public sealed class LowerCaseKeywordAnalyzer : Analyzer
  {
    // Treats the whole field value as a single token, then lower-cases it.
    public override TokenStream TokenStream(System.String fieldName, System.IO.TextReader reader)
    {
        return new LowerCaseFilter(new KeywordTokenizer(reader));
    }

    // Builds a brand-new tokenizer and filter on every call, exactly like TokenStream above.
    public override TokenStream ReusableTokenStream(System.String fieldName, System.IO.TextReader reader)
    {
        return new LowerCaseFilter(new KeywordTokenizer(reader));
    }
  }
}

Although the code above seems to work, I don't really understand what I should be doing differently in ReusableTokenStream. I suspect my version is wrong, but I'm not sure how or why, or what to do about it. Perhaps I shouldn't override it at all, but if I don't, what happens to code that calls it?

1 Answer

What you have is effectively equivalent to not implementing ReusableTokenStream at all. Here is the default implementation from the (Java) Lucene Analyzer source:

public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException {
    return tokenStream(fieldName, reader);
}

So your implementation isn't doing anything worse than the default implementation.

The purpose of ReusableTokenStream is to improve performance by not recreating the whole tokenizer/filter chain on every call. Implementations generally try to reset() the previously created stream with the new reader, and fall back on simply calling tokenStream if that fails. Your implementation builds a new chain every time, so it gets none of that benefit.
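To illustrate that reuse pattern, here is a minimal, self-contained Java sketch. It deliberately does not depend on Lucene: `KeywordSource`, `LowerCaseStage`, and `ReusableAnalyzerSketch` are hypothetical stand-ins for `KeywordTokenizer`, `LowerCaseFilter`, and an `Analyzer` subclass. The first call on a thread builds the chain and caches it. Later calls on the same thread only rebind the cached tokenizer to the new reader. The cache is per-thread because a stream holds state for the text it is currently tokenizing, so two threads must never share one.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;

public class ReusableAnalyzerSketch {

    // Hypothetical stand-in for KeywordTokenizer: emits the whole input as one token.
    static final class KeywordSource {
        private Reader input;

        KeywordSource(Reader input) { this.input = input; }

        // Rebinds to a new reader instead of allocating a new tokenizer.
        void reset(Reader newInput) { this.input = newInput; }

        // Reads everything remaining in the current reader as a single token.
        String next() {
            try {
                StringBuilder sb = new StringBuilder();
                char[] buf = new char[256];
                int n;
                while ((n = input.read(buf)) != -1) {
                    sb.append(buf, 0, n);
                }
                return sb.toString();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Hypothetical stand-in for LowerCaseFilter wrapping the source.
    static final class LowerCaseStage {
        final KeywordSource source;

        LowerCaseStage(KeywordSource source) { this.source = source; }

        String next() { return source.next().toLowerCase(Locale.ROOT); }
    }

    // Per-thread cache of the last chain this analyzer built.
    private final ThreadLocal<LowerCaseStage> previous = new ThreadLocal<>();

    // Counts how many chains tokenStream has built (not thread-safe; demo only).
    int chainsCreated = 0;

    // Always builds a fresh chain, like the question's TokenStream override.
    LowerCaseStage tokenStream(Reader reader) {
        chainsCreated++;
        return new LowerCaseStage(new KeywordSource(reader));
    }

    // Reuses this thread's cached chain; builds one only if none is cached yet.
    LowerCaseStage reusableTokenStream(Reader reader) {
        LowerCaseStage cached = previous.get();
        if (cached == null) {
            cached = tokenStream(reader);
            previous.set(cached);
        } else {
            cached.source.reset(reader);
        }
        return cached;
    }

    public static void main(String[] args) {
        ReusableAnalyzerSketch analyzer = new ReusableAnalyzerSketch();
        System.out.println(analyzer.reusableTokenStream(new StringReader("Hello World")).next());
        System.out.println(analyzer.reusableTokenStream(new StringReader("FOO Bar")).next());
        System.out.println("chains created: " + analyzer.chainsCreated);
    }
}
```

Running `main` should print `hello world`, `foo bar`, and `chains created: 1`, since the second call only resets the cached chain. In this sketch the fallback is the "nothing cached yet on this thread" case. In Lucene 2.9/3.x (Java), the base `Analyzer` provides protected `getPreviousTokenStream()`/`setPreviousTokenStream(Object)` for this kind of per-thread cache, and `Tokenizer.reset(Reader)` for rebinding input; Lucene.Net 2.9 has PascalCase counterparts, but check the exact members in the version you are using.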