0
votes

I created a search index for Japanese words using a custom analyzer built on icu_tokenizer. The index was created successfully. I chose icu_tokenizer because it works better for Asian languages than the default Azure Search tokenizer.

Now, when I query for a string, e.g. 赤城, I get multiple results (131 in total) from the index. But when I run a wildcard search on the same word, e.g. 赤城* (adding * at the end of the word) or /赤城.*/ (a regex query), I get 0 results. The weird part is that * does work with a single Japanese character: 赤* returns the same number of results as 赤. But as soon as the term has more than one Japanese character, wildcard queries with * stop working and return 0 results. I am testing all of these queries in Search explorer in the Azure portal with queryType=full (Lucene query syntax).
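For reference, the three kinds of query described above can be written as REST request bodies. This is only a minimal sketch, assuming the Search Documents REST endpoint; the service name, index name and API version are hypothetical placeholders, and the code builds the payloads without sending them.

```python
import json

# Hypothetical placeholders, not taken from the question.
SERVICE = "my-service"
INDEX = "my-index"
API_VERSION = "2020-06-30"


def search_url(service: str = SERVICE, index: str = INDEX) -> str:
    """URL of the Search Documents endpoint for the given index."""
    return (
        f"https://{service}.search.windows.net/indexes/{index}"
        f"/docs/search?api-version={API_VERSION}"
    )


def full_lucene_query(search_text: str) -> dict:
    """Request body for a full Lucene syntax query that also returns the hit count."""
    return {"search": search_text, "queryType": "full", "count": True}


term = "赤城"
queries = {
    "exact": full_lucene_query(term),                # plain term query
    "prefix": full_lucene_query(term + "*"),         # trailing wildcard
    "regex": full_lucene_query("/" + term + ".*/"),  # regex form of the prefix
}

if __name__ == "__main__":
    print(search_url())
    for name, body in queries.items():
        # ensure_ascii=False keeps the Japanese characters readable
        print(name, json.dumps(body, ensure_ascii=False))
```

Each body would be sent as a POST to the URL above with an `api-key` header; comparing the `@odata.count` values in the responses reproduces the comparison made in Search explorer.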

In my application, search terms are normally used for prefix search, so we append * to the end of the search string to fetch results, but it looks like these Lucene wildcard queries just do not work with Japanese characters. Any idea how I can make these prefix queries (with a trailing * wildcard) work when the search string is in Japanese?

Any quick help will be much appreciated!!


2 Answers

1
votes

I tested with my installation now and I can confirm that wildcards only work with Japanese content when you use a Japanese analyzer.

In my example I set up one index with a property Body that does not have a specific analyzer defined. Then I set up another index where Body uses the ja.microsoft language analyzer. The content in both indexes is identical. I then searched for 自動車 (automobile) with a trailing wildcard.

自動車* returns multiple hits from the index that uses the Japanese analyzer. No hits are returned from the index without a specific analyzer defined.
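The two index definitions compared above might look roughly like the following. This is a sketch under assumptions: the index names and the `id` key field are hypothetical, and only the `Body` field and the `ja.microsoft` analyzer come from the test described here.

```python
import json
from typing import Optional


def index_definition(name: str, body_analyzer: Optional[str] = None) -> dict:
    """Build an index definition whose Body field optionally names an analyzer."""
    body_field = {"name": "Body", "type": "Edm.String", "searchable": True}
    if body_analyzer is not None:
        # Leaving "analyzer" out means the field uses the default analyzer.
        body_field["analyzer"] = body_analyzer
    return {
        "name": name,
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True},  # hypothetical key field
            body_field,
        ],
    }


# Hypothetical index names for the two variants.
default_index = index_definition("docs-default")
japanese_index = index_definition("docs-ja", body_analyzer="ja.microsoft")

if __name__ == "__main__":
    print(json.dumps(japanese_index, indent=2))
```

The analyzer of an existing field cannot be changed in place, so trying this on real data generally means creating a second index and reindexing the documents into it.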

0
votes

Sorry for the late reply. Have you tried using one of the Japanese language analyzers, for example ja.microsoft?

Also, if you want to use prefix search, you can try experimenting with the suggester feature which is designed to be efficient for this scenario.
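A rough sketch of what the suggester route could look like, assuming a `Body` field analyzed with ja.microsoft. The index name, the suggester name `sg` and the `id` key field are hypothetical; the code only builds the index definition and the Suggestions request body, it does not call the service.

```python
import json


def add_suggester(index_def: dict, name: str, source_fields: list) -> dict:
    """Return a copy of the index definition with one suggester attached."""
    updated = dict(index_def)
    updated["suggesters"] = [
        {
            "name": name,
            "searchMode": "analyzingInfixMatching",
            "sourceFields": list(source_fields),
        }
    ]
    return updated


def suggest_request(prefix: str, suggester_name: str, top: int = 5) -> dict:
    """Request body for the Suggestions endpoint (/docs/suggest)."""
    return {"search": prefix, "suggesterName": suggester_name, "top": top}


# Hypothetical index with a Body field using a Japanese language analyzer.
base_index = {
    "name": "docs-ja",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "Body", "type": "Edm.String", "searchable": True,
         "analyzer": "ja.microsoft"},
    ],
}

index_with_suggester = add_suggester(base_index, "sg", ["Body"])
request_body = suggest_request("赤城", "sg")

if __name__ == "__main__":
    print(json.dumps(request_body, ensure_ascii=False))
```

One caveat, as far as I know: suggester source fields need to use the default analyzer or a language analyzer, so a field that uses a custom analyzer may not be accepted as a source field.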