Here are two other approaches you can take.
Treat off-topic as contamination.
When building a conversational system, it's important to know what your end users are actually saying, so collect questions from real users.
You will find that very few people combine a greeting with a question. I haven't compiled statistics across the projects I've worked on, but anecdotally I have rarely seen it happen.
Knowing this, you can try removing off-topic/chit-chat examples from your intents, as they do not reflect the domains you actually want to train on.
To handle the off-topic traffic that remains, you can create a more detailed second workspace containing the off-topic/chit-chat intents. If you do not get a good hit on the primary workspace, you can call out to the second one. You can improve this further by adding the chit-chat examples as counterexamples in the primary workspace.
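The two-workspace routing can be sketched as below. This is a minimal illustration, not the Watson SDK itself: `query_workspace` is a hypothetical helper you would implement with your client library, assumed to return the service's response as a dict containing an `intents` list sorted by descending confidence, and the 0.2 cutoff is an assumed threshold you would tune.

```python
# Hypothetical cutoff for "good hit" on the primary workspace; tune per project.
CONFIDENCE_THRESHOLD = 0.2

def route_message(text, primary_id, chitchat_id, query_workspace):
    """Try the primary (domain) workspace first; fall back to the
    chit-chat workspace when there is no confident hit.

    query_workspace is a hypothetical helper: (workspace_id, text) -> dict
    with an "intents" list sorted by descending confidence.
    """
    response = query_workspace(primary_id, text)
    intents = response.get("intents", [])
    if intents and intents[0]["confidence"] >= CONFIDENCE_THRESHOLD:
        # Confident hit on the domain workspace: use it.
        return response
    # No confident hit: treat the message as off-topic/chit-chat.
    return query_workspace(chitchat_id, text)
```

The design choice here is that the second workspace is only consulted on a miss, so the primary workspace's training stays clean of chit-chat.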
You can also mitigate this simply through the wording of your initial response to the user. For example, if your initial response is a hello, have the system also ask a question, or have it progress the conversation so that a hello from the user becomes redundant.
Detect possible compound intents.
At the moment, this is only easily possible at the application layer.
Setting alternate_intents to true will return the top 10 intents and their confidences.
Before going further: if the top intent's confidence is below 0.2, the workspace needs more training, so there is no need to proceed.
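That gate can be written as a small check. This is a sketch, assuming the response JSON has an `intents` list sorted by descending confidence (the shape returned when alternate_intents is true):

```python
def needs_more_training(response, threshold=0.2):
    """Return True when the top intent's confidence is below the
    threshold (or no intent was returned), meaning the workspace
    needs more training before compound-intent detection is useful."""
    intents = response.get("intents", [])
    return not intents or intents[0]["confidence"] < threshold
```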
If it is above 0.2, you can plot the confidences on a graph; when a compound intent is present, the top two intents will visibly stand apart from the rest.
To have your application detect this, you can use the k-means algorithm with k=2 to create two buckets: relevant and irrelevant.
If more than one intent lands in the relevant bucket, you can take action, such as ignoring the chit-chat/off-topic intent.
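The bucketing step can be sketched as a one-dimensional k-means with k=2. This is a plain-Python illustration rather than a library call, so the example stays self-contained; in practice you might reach for an off-the-shelf k-means implementation instead.

```python
def two_bucket_kmeans(confidences, iterations=20):
    """Split confidence scores into two buckets (k=2 k-means in 1-D):
    'relevant' scores near the high centroid, 'irrelevant' near the low one."""
    # Initialise the two centroids at the extremes of the data.
    lo, hi = min(confidences), max(confidences)
    for _ in range(iterations):
        # Assignment step: each score joins the nearer centroid.
        relevant = [c for c in confidences if abs(c - hi) <= abs(c - lo)]
        irrelevant = [c for c in confidences if abs(c - hi) > abs(c - lo)]
        # Update step: move each centroid to the mean of its bucket.
        if relevant:
            hi = sum(relevant) / len(relevant)
        if irrelevant:
            lo = sum(irrelevant) / len(irrelevant)
    return relevant, irrelevant
```

For example, with confidences like [0.81, 0.76, 0.05, 0.04, 0.03], two scores end up in the relevant bucket, which signals a possible compound intent.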
There are more details and sample code here.