Azure Search: Wildcard queries do not work with Japanese/Chinese characters

Question

I used icu_tokenizer in a custom analyzer to create a search index for Japanese words, and the index was created successfully. I chose icu_tokenizer because it works better for Asian languages than the default Azure Search tokenizer.

Now, when I query for a string such as 赤城, I see multiple search results (131 in total) from the index. But when I run a wildcard search with the same word, such as 赤城* (adding * at the end of the word) or /赤城.*/ (a regex query), I see 0 search results. The weird part is that * seems to work with a single Japanese character: 赤* gives me the same number of search results as 赤. But as soon as I use more than one Japanese character, wildcard queries with * stop working and return 0 search results. I am testing all of these queries in Search explorer on the Azure portal with queryType=full (Lucene query syntax).

In my application, search terms are normally used for prefix search, so we append * to the end of the search string to fetch results, but it looks like these Lucene wildcard queries just do not work with Japanese characters. Any idea how I can make these prefix queries (with a wildcard * at the end of the search string) work when the search string is in Japanese?

Any quick help will be much appreciated!!

Dan Gøran Lunde · Accepted Answer · 2020-09-08T12:56:55

I tested with my installation now and I can confirm that wildcards only work with Japanese content when you use a Japanese analyzer.

In my example I set up one index using a property Body that does not have a specific analyzer defined. Then I set up another index where Body uses the ja.microsoft language analyzer. The content in both indexes is identical. I then searched for 自動車 (automobile) with a trailing wildcard.
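For anyone who wants to reproduce this comparison, here is a rough sketch of the two index definitions and the query body, written as the JSON payloads the Azure Search REST API accepts. The index names (docs-default, docs-ja), the id key field, and the API version are assumptions for illustration and are not taken from my actual setup; only the Body field and the ja.microsoft analyzer come from the test described above. The script only builds and prints the payloads; it does not call the service.

```python
import json

# Assumed REST API version; adjust to whatever your service supports.
API_VERSION = "2020-06-30"


def build_index(name, analyzer=None):
    """Build an index definition with a searchable Body field.

    The index name and the 'id' key field are hypothetical; only the
    Body field and its optional analyzer mirror the test in this answer.
    """
    body_field = {"name": "Body", "type": "Edm.String", "searchable": True}
    if analyzer is not None:
        # Language analyzers are set per field via the "analyzer" property.
        body_field["analyzer"] = analyzer
    return {
        "name": name,
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True},
            body_field,
        ],
    }


def build_prefix_query(term):
    """Build a search request body for a prefix (trailing wildcard) query."""
    return {
        "search": term + "*",
        "queryType": "full",  # full Lucene syntax, as used in the question
        "searchFields": "Body",
        "count": True,
    }


default_index = build_index("docs-default")
japanese_index = build_index("docs-ja", analyzer="ja.microsoft")
query = build_prefix_query("自動車")

print(json.dumps(japanese_index, ensure_ascii=False, indent=2))
print(json.dumps(query, ensure_ascii=False))
```

The first payload would go to the create-index endpoint and the second to the index's docs/search endpoint, with the same documents loaded into both indexes.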

自動車* returns multiple hits from my index using the Japanese analyzer. No hits are returned from the index without a specific analyzer defined.
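A likely explanation, which I have not verified against the service internals, is that wildcard and prefix queries bypass lexical analysis and are matched directly against the terms stored in the index. If the analyzer in use splits the text into single-character terms, no stored term starts with a two-character prefix. The toy model below illustrates this in plain Python; the token lists are hypothetical and are not real output from icu_tokenizer or ja.microsoft.

```python
def prefix_matches(indexed_terms, prefix):
    """Return the indexed terms that start with the given prefix.

    This models a prefix query as a plain string comparison against
    stored terms, with no analysis applied to the query text.
    """
    return [term for term in indexed_terms if term.startswith(prefix)]


# Hypothetical tokenizations of a document containing 赤城.
per_character_terms = ["赤", "城"]  # assumed: analyzer splits into single characters
whole_word_terms = ["赤城"]         # assumed: analyzer keeps the word intact

# A one-character prefix matches a single-character term.
print(prefix_matches(per_character_terms, "赤"))

# A two-character prefix matches nothing when terms are single characters.
print(prefix_matches(per_character_terms, "赤城"))

# The same prefix matches once the word is stored as one term.
print(prefix_matches(whole_word_terms, "赤城"))
```

Under these assumptions the behaviour matches what the question describes: 赤* finds results while 赤城* finds none, and an analyzer that stores whole words makes the prefix query work.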
