How to make Lucene index fields case-insensitive

Question

How can I make Lucene index field names case-insensitive? In other words, is there a way to lowercase the field names in a query but not the values?

I can't convert the entire query to lowercase, because that affects other queries that use whitespace analyzers.

Query.extractTerms() gives me the terms of a query, but it does not work if the input contains wildcards (e.g. *).

I need this because I have lowercased the field names in the index. For example:

If a field is indexed as "actor", I should get results for queries containing "Actor:abc" as well as "ACTOR:abc".

Any idea?

bibounde · Accepted Answer · 2014-04-16T09:22:37

One solution is to create your own Analyzer that includes a LowerCaseFilter, and to use that same analyzer both when indexing and when parsing queries.
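The idea behind this is that the same normalization runs on the indexed text and on the query text, so "Actor", "ACTOR" and "actor" all end up as the same term. The plain-Java sketch below is not Lucene code, and the class and method names are hypothetical. It only mimics, for a single token, the combined effect of lowercasing and accent folding. The accent stripping via java.text.Normalizer is an approximation of ASCIIFoldingFilter that covers accented Latin letters only.

```java
import java.text.Normalizer;
import java.util.Locale;

public class CaseInsensitiveMatchDemo {

    // Hypothetical stand-in for the tail of an analyzer chain (NOT Lucene code):
    // decompose accented letters, drop the combining marks (roughly what
    // ASCIIFoldingFilter does for accented Latin letters), then lowercase
    // (what LowerCaseFilter does).
    public static String normalize(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        String stripped = decomposed.replaceAll("\\p{M}", "");
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Applying the same normalization at index time and at query time
        // makes each of these pairs compare equal.
        System.out.println(normalize("Actor").equals(normalize("ACTOR")));
        System.out.println(normalize("\u00C9l\u00E8ve").equals(normalize("ELEVE"))); // "Élève"
    }
}
```

If the index and the query were normalized differently (for example, only the index is lowercased), the terms would no longer compare equal, which is why the analyzer has to be used on both sides.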

Here is an example of a custom French analyzer that is case-insensitive:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.fr.FrenchAnalyzer;
import org.apache.lucene.analysis.fr.FrenchLightStemFilter;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.util.ElisionFilter;
import org.apache.lucene.util.Version;

import java.io.Reader;

/**
 * Completes {@link org.apache.lucene.analysis.fr.FrenchAnalyzer} with accent management
 */
public class CustomFrenchAnalyzer extends Analyzer {

    /**
     * Lucene version
     */
    private final Version matchVersion;

    /**
     * Constructs a new analyzer
     * @param matchVersion compatibility version
     */
    public CustomFrenchAnalyzer(final Version matchVersion) {
        this.matchVersion = matchVersion;
    }

    @Override
    protected final TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
        final Tokenizer source = new StandardTokenizer(matchVersion, reader);
        TokenStream result = new StandardFilter(matchVersion, source);
        // Remove French elisions such as l', d', qu'
        result = new ElisionFilter(result, FrenchAnalyzer.DEFAULT_ARTICLES);
        // Lowercase before the StopFilter (as FrenchAnalyzer does) so capitalized stop words are removed too
        result = new LowerCaseFilter(matchVersion, result);
        result = new StopFilter(matchVersion, result, FrenchAnalyzer.getDefaultStopSet());
        // Strip accents so that accented and unaccented forms match
        result = new ASCIIFoldingFilter(result);
        result = new FrenchLightStemFilter(result);

        // The stream is already lowercased; a second LowerCaseFilter is not needed
        return new TokenStreamComponents(source, result);
    }
}
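Note that an analyzer only transforms the terms it produces, i.e. the field values; the field name in a query such as "ACTOR:abc" is passed through by the query parser unchanged. If the field names themselves must be matched case-insensitively, one naive option is to lowercase them in the raw query string before parsing, leaving the values (including wildcards) to whatever analyzer each query uses. The sketch below is plain Java, and the class and method names are hypothetical. It assumes field names are simple identifiers directly followed by ':'. It does not handle quoted phrases, escaped colons (\:) or a value that itself contains ':'.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldNameLowercaser {

    // Hypothetical rule: an identifier that starts a word (not preceded by a word
    // character or a backslash) and is immediately followed by ':' is a field name.
    private static final Pattern FIELD =
            Pattern.compile("(?<![\\w\\\\])([A-Za-z_][A-Za-z0-9_]*)(?=:)");

    /** Lowercases the field names in a query string and leaves the values untouched. */
    public static String lowercaseFieldNames(String query) {
        Matcher m = FIELD.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String lowered = m.group(1).toLowerCase(Locale.ROOT);
            // quoteReplacement guards against '$' or '\' being treated specially
            m.appendReplacement(out, Matcher.quoteReplacement(lowered));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(lowercaseFieldNames("Actor:abc"));
        System.out.println(lowercaseFieldNames("ACTOR:Abc* AND Title:Foo"));
    }
}
```

The rewritten string would then be passed to the query parser as usual, e.g. QueryParser.parse(...), so the values still go through the analyzer chosen for that query.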
