Lucene 3.5 does not support Chinese, Russian, Korean languages when searching

Question

I am using the Lucene 3.5 StandardAnalyzer for indexing and searching. It works for all languages other than Chinese, Japanese and Korean. I tried CJKAnalyzer and the Chinese analyzers, but it still does not work. The index is created correctly; we have verified this with the Luke tool. However, we are not able to search for words in the above languages, either with Luke or from code using the analyzers. Is there any solution for this?

伊拉克航空公司               

+name:伊拉克航空公司~0.9

This is the Lucene query generated by the analyzer for this Chinese word, but it does not return any results. Queries for other languages do return results.

Are you using any analyzer at query time? Show some examples of your index and query strings. — user156327
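To illustrate why query-time analysis matters here, below is a minimal stdlib-only sketch. It assumes the index was built with CJKAnalyzer, which splits a run of CJK characters into overlapping two-character tokens (bigrams), and that the `~0.9` suffix produced a fuzzy query whose term, as far as I know, Lucene's QueryParser does not pass through the analyzer. The class `CjkBigramSketch` and its `bigrams` method are hypothetical; they imitate only the bigram step and are not Lucene code.

```java
import java.util.ArrayList;
import java.util.List;

public class CjkBigramSketch {
    // Hypothetical helper imitating CJKAnalyzer's bigram step (not Lucene code):
    // every pair of adjacent characters becomes one token, so tokens overlap.
    static List<String> bigrams(String text) {
        int[] cps = text.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        if (cps.length == 1) {
            // A lone character has no neighbour to pair with, so keep it as is.
            tokens.add(text);
            return tokens;
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            tokens.add(new String(cps, i, 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> indexed = bigrams("伊拉克航空公司");
        System.out.println(indexed); // [伊拉, 拉克, 克航, 航空, 空公, 公司]

        // Assumption: the fuzzy query term is the raw, unanalyzed string.
        String fuzzyTerm = "伊拉克航空公司";
        System.out.println(indexed.contains(fuzzyTerm)); // false
    }
}
```

If both assumptions hold, the fuzzy term is the whole seven-character string, which never appears among the indexed bigrams. So dropping `~0.9` and letting the same analyzer process the query text is worth trying.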

Hash Jang · Accepted Answer · 2018-01-02T06:36:45

For Chinese, there are many useful third-party analyzers, such as:

mmseg4j
IK-analyzer
ansj_seg
imdict-chinese-analyzer

I recommend IK-analyzer. For example, add this to your Maven dependencies:

    <dependency>
        <groupId>com.janeluo</groupId>
        <artifactId>ikanalyzer</artifactId>
        <version>2012_u6</version>
    </dependency>

The example code:

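import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class LuceneFirst {
    public static void main(String[] args) throws IOException {
        // No-arg IKAnalyzer uses fine-grained mode: overlapping words are all emitted.
        Analyzer analyzer = new IKAnalyzer();
        // tokenStream(String, String) is not in Lucene 3.5; there, pass a java.io.StringReader.
        TokenStream tokenStream = analyzer.tokenStream("", "伊拉克航空公司");

        CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
        OffsetAttribute offsetAttribute = tokenStream.addAttribute(OffsetAttribute.class);
        tokenStream.reset(); // required before the first incrementToken()
        while (tokenStream.incrementToken()) {
            System.out.println("start→" + offsetAttribute.startOffset());
            System.out.println(charTermAttribute);
            System.out.println("end→" + offsetAttribute.endOffset());
        }
        tokenStream.end(); // finish the stream before closing it
        tokenStream.close();
    }
}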

The output is:

    start→0
    伊拉克
    end→3
    start→3
    航空公司
    end→7
    start→3
    航空
    end→5
    start→5
    公司
    end→7
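The overlapping tokens above (航空公司 together with 航空 and 公司) are, as I understand it, what IKAnalyzer's default fine-grained mode produces: it reports every dictionary word it finds instead of one best split. The toy stdlib sketch below shows that idea, assuming a hypothetical four-word dictionary. It scans each start position and reports dictionary words longest first; this is not IK's real algorithm, and `FineGrainedSegmentSketch` and `segment` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FineGrainedSegmentSketch {
    // Toy dictionary-based segmenter (NOT IK's algorithm): at every start position it
    // reports each dictionary word beginning there, longest first, as "word start-end".
    static List<String> segment(String text, Set<String> dict, int maxWordLen) {
        List<String> tokens = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            int longest = Math.min(maxWordLen, text.length() - start);
            for (int len = longest; len >= 1; len--) {
                String word = text.substring(start, start + len);
                if (dict.contains(word)) {
                    // Offsets are UTF-16 char positions, end exclusive.
                    tokens.add(word + " " + start + "-" + (start + len));
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary chosen only to cover the example text.
        Set<String> dict = new HashSet<>(Arrays.asList("伊拉克", "航空公司", "航空", "公司"));
        for (String token : segment("伊拉克航空公司", dict, 4)) {
            System.out.println(token);
        }
    }
}
```

For this input the sketch prints `伊拉克 0-3`, `航空公司 3-7`, `航空 3-5` and `公司 5-7`, the same tokens and offsets in the same order as the IKAnalyzer output. Because the index then contains 航空 and 公司 as well as 航空公司, a search for any of them can match.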

For Japanese:
