
I don't want special characters in the index when I index the words of a string. I understand that StandardAnalyzer removes special characters, but it also skips stopwords and single characters, and I want both of those to be indexed.

E.g.: list of hotel management organisation (hmo) site

Indexed words: list, of, hotel, management, organisation, hmo, site

Is there a filter for this? How can I build a custom Analyzer for this purpose? Maybe a filter that replaces non-alphanumeric characters with ""?


1 Answer


StandardAnalyzer sounds like a good fit. Just construct it with an empty stopword set:

Analyzer analyzer = new StandardAnalyzer(CharArraySet.EMPTY_SET);

As for building your own analyzer, check the Analyzer docs. They include an example of what a custom analyzer should look like. If StandardAnalyzer is close to what you need, you could copy its createComponents method as a starting point.
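To make the target output concrete, here is a minimal plain-Java sketch (no Lucene involved) of the tokenization the question describes: split on anything that is not a letter or digit, lowercase, and keep stopwords and single characters. The class name TokenSketch and the method tokenize are hypothetical names made up for this illustration. Lucene's own tokenizer follows Unicode word-break rules, so its output may differ from this sketch on some inputs; treat it only as a reference for checking what an analyzer emits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration only; this is not Lucene code.
// TokenSketch and tokenize are hypothetical names for this example.
class TokenSketch {
    // Split on runs of characters that are neither letters nor digits,
    // then lowercase each piece. Nothing is filtered out afterwards, so
    // stopwords ("of") and single-character tokens are kept.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            // split() can yield an empty first element when the text
            // starts with a delimiter; skip it.
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [list, of, hotel, management, organisation, hmo, site]
        System.out.println(tokenize("list of hotel management organisation (hmo) site"));
    }
}
```

Comparing this list against the tokens produced by the analyzer you end up with is a quick way to confirm that stopwords and single characters survive.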