MS LUIS: Number of Intents / Data Imbalance

Question

I see on the LUIS documentation page here that you strongly recommend treating data imbalance (i.e. differing numbers of utterances across intents) as a first priority. Our dashboard currently shows a mean of 19 utterances per intent, so my assumption is that I should bring every intent to about 20 utterances each.

Now my question: when I use active learning by adding endpoint utterances, each utterance is added to the intent we think it fits (Active Learning Documentation). How can I ensure that the number of utterances per intent stays roughly equal (e.g. around 20 in our example)? It seems to me that assigning endpoint utterances to intents will naturally create a data imbalance again.

Thanks a lot!

Best, Mark

Dana V · Accepted Answer · 2019-12-23T21:08:15

Once your initial model is satisfactory, the intents no longer need to have equal numbers of utterances. Active learning specifically tries to correct for cases the model has not seen before, so if your existing examples already cover all your cases, you don't need to actively correct the imbalance.
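If you still want to keep an eye on balance while the initial model is being built, a minimal sketch like the one below can flag intents whose utterance count drifts far from the mean. It does not call any LUIS API. The `(utterance, intent)` pair format, the intent names, and the `tolerance` threshold are all assumptions made for illustration, so adapt them to however you export your labeled data.

```python
from collections import Counter
from statistics import mean


def utterance_counts(labeled):
    """Count utterances per intent from (utterance, intent) pairs."""
    return Counter(intent for _, intent in labeled)


def imbalanced_intents(counts, tolerance=0.5):
    """Return intents whose count differs from the mean by more than
    tolerance * mean. The 0.5 default is an arbitrary illustrative value."""
    avg = mean(counts.values())
    return {
        intent: n
        for intent, n in counts.items()
        if abs(n - avg) > tolerance * avg
    }


# Hypothetical labeled data: (utterance text, intent name)
labeled = [
    ("book a flight to Paris", "BookFlight"),
    ("I need a plane ticket", "BookFlight"),
    ("cancel my booking", "Cancel"),
    ("hello", "Greeting"),
]

counts = utterance_counts(labeled)
print(dict(counts))
print(imbalanced_intents(counts))
```

Running a check like this after each batch of endpoint utterances shows whether the drift is large enough to matter. Per the answer above, it only needs attention while the initial model is not yet satisfactory.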
