I am currently working with MS LUIS.ai.

My string/utterance contains both English and Chinese.

Here is the problem:

If the sentence is entirely in English, it works fine in LUIS. This is probably because an English sentence is composed of separate words, which are split by a space.

However, in Chinese (both Traditional and Simplified), a sentence is composed of words that are concatenated together and are difficult to split.

For example, in English I can write:

I love you so much: There are 5 words here. In LUIS I can select I love you and turn it into an entity. Later on, when more utterances like I love you go into LUIS, it can identify the related intent easily.

However, in Chinese if I write:

我很喜歡你: This has the same meaning as the English sentence above, but LUIS counts it as 1 word. If I want to extract the word 喜歡 (which means "love/like") as an entity, I cannot do so in LUIS.

Only if I put spaces around 喜歡, like this: 我很 喜歡 你, will I be able to select 喜歡 as a particular entity.

My Question:

Is there any method or trick I can use so that, when someone enters a joined string like the Chinese example above, LUIS identifies specific words as entities automatically, without any manual change?

Thank you very much in advance for all your help.

1 Answer

To perform machine learning, LUIS breaks an utterance into tokens based on the app's culture, and this tokenization cannot be suppressed. For English, LUIS tokenizes on every space or special character. For Chinese, it tokenizes at the character level and returns the entity in tokenized form, so each Chinese character is its own token. Also note that in the zh-cn culture, LUIS expects the Simplified Chinese character set instead of the Traditional character set. Hope this helps!
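
To make the difference concrete, here is a minimal Python sketch of the two tokenization styles described above. It is only an illustration and not LUIS's actual tokenizer; the function names (`tokenize_english`, `tokenize_chinese`, `find_token_span`) are hypothetical, and the exact rules LUIS applies may differ.

```python
import re
from typing import List, Optional, Tuple


def tokenize_english(utterance: str) -> List[str]:
    """Split on whitespace; each special character becomes its own token."""
    return re.findall(r"\w+|[^\w\s]", utterance)


def tokenize_chinese(utterance: str) -> List[str]:
    """Character-level tokenization: one token per non-space character."""
    return [ch for ch in utterance if not ch.isspace()]


def find_token_span(tokens: List[str], target: List[str]) -> Optional[Tuple[int, int]]:
    """Return (start, end) token indices of the first match of target, or None."""
    n = len(target)
    if n == 0:
        return None
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == target:
            return start, start + n - 1
    return None


english_tokens = tokenize_english("I love you so much")
chinese_tokens = tokenize_chinese("我很喜歡你")

print(english_tokens)
print(chinese_tokens)
# Under character-level tokenization, 喜歡 covers two consecutive tokens.
print(find_token_span(chinese_tokens, tokenize_chinese("喜歡")))
```

In this sketch the English utterance yields word tokens, while the Chinese utterance yields one token per character, so a word such as 喜歡 corresponds to a span of two adjacent tokens rather than a single token.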