3 votes

My understanding is that Amazon ASK still does not provide:

  • The raw user input
  • An option for a fallback intent
  • An API to dynamically add possible slot values, so that Alexa is better informed when selecting an intent.

Is this right or am I missing out on knowing about some critical capabilities?

Actions on Google w/ Dialogflow provides all three of these. These tools give devs the ability to check whether the identified intent is correct and, if not, to fix it.

I know there have been a lot of questions asked about this previously; here are just a few:

How to add slot values dynamically to alexa skill

Can Alexa skill handler receive full user input?

Amazon Alexa dynamic variables for intent

I have far more users on my Alexa skill than on my AoG app, simply because of Amazon's dominance in the market to date - but their experience falls short of the Google Assistant user experience because of these limitations. I've been waiting almost a year for new Alexa capabilities here, thinking that after Amazon's guidance not to use AMAZON.LITERAL there would be improvements coming to custom slots. To date, this old blog post still looks like the only guidance given. With Google, I dynamically pull in utterance options from a db that are custom for a given user following account linking. By having the user's raw input, I can correct the choice of intent if necessary.

If you've wanted these capabilities but have had to move forward without them, what tricks do you have to get accurate intent handling with Amazon when you don't know what the user will say?

EDIT 11/21/17: In September Amazon announced the Alexa Skill Management API (SMAPI), which does address the 3rd bullet above.


2 Answers

1 vote

Actually this would be better as a comment, but I write too little on Stack Overflow to be able to comment. I am with you on all of it. But Amazon's Alexa also has one very big advantage.

The intent schema seems to directly influence the voice-to-text recognition. By the way, can someone confirm whether this is correct?

On Google Home this does not seem to be the case, so matching unusual names is even more complicated than on Alexa, and it sometimes recognizes complete nonsense.

Not sure which I prefer currently.

My feeling is that for small apps Alexa is much better, because it matches the intent phrases better when it has fewer choices. But with large intent schemas it gets into real trouble, and in my tests some of the intents were not matched correctly at all.

Here Google Home and the Actions SDK probably win, because speech-to-text seems to happen first and then a string-pattern match against the intent schema is done. So this is probably more robust for larger schemas.

To give something like an answer to your questions:

You can try to add as many of the things that could be said as possible to a slot, and then match the value from the Alexa request against your database via Jaro-Winkler or some other string distance.
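
A minimal sketch of that matching step, assuming Python, with the standard library's difflib as a stand-in for Jaro-Winkler (a dedicated library such as jellyfish would give the real Jaro-Winkler similarity); the catalogue list and the threshold are made up for illustration:

    import difflib

    # Hypothetical catalogue of phrases pulled from your database for this user.
    CATALOG = ["order status", "open a support ticket", "cancel my subscription"]

    def best_match(slot_value, candidates, threshold=0.7):
        """Return the catalogue entry closest to what Alexa put in the slot,
        or None if nothing is similar enough."""
        best, best_score = None, 0.0
        for candidate in candidates:
            # difflib's ratio() is a rough stand-in for Jaro-Winkler similarity.
            score = difflib.SequenceMatcher(
                None, slot_value.lower(), candidate.lower()).ratio()
            if score > best_score:
                best, best_score = candidate, score
        return best if best_score >= threshold else None

    # e.g. Alexa heard "open support tickets" for the slot:
    print(best_match("open support tickets", CATALOG))  # -> "open a support ticket"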

What I tried for Alexa was to find phrases that are close to what the user might say, and I added these as phrases to fill a slot.

So a module on our web page was an intent in the schema, and then I asked the user to say what exactly should be done in that module (this was the slot-filling request). The answer was the slot-filling utterance.

For me that worked slightly better than the regular intent schema, but it requires more talking, so I don't like it so much.
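
To make that two-turn flow concrete, here is a rough sketch in plain Python (no SDK), under the assumption that each "module" has its own intent and the follow-up answer lands in a second intent with a slot; the names StartOrdersModuleIntent, ModuleActionIntent and Action are invented for illustration:

    def lambda_handler(event, context):
        request = event["request"]
        session_attrs = event.get("session", {}).get("attributes") or {}

        if request["type"] == "IntentRequest":
            intent = request["intent"]

            # Turn 1: the user opened a "module"; ask what should happen in it
            # (the slot-filling request described above).
            if intent["name"] == "StartOrdersModuleIntent":
                session_attrs["module"] = "orders"
                return build_response(
                    "What exactly should I do in the orders module?",
                    session_attrs, end_session=False)

            # Turn 2: the answer arrives as the slot-filling utterance.
            if intent["name"] == "ModuleActionIntent":
                action = intent["slots"]["Action"].get("value", "")
                module = session_attrs.get("module", "unknown")
                return build_response(
                    f"Okay, doing {action} in the {module} module.",
                    session_attrs, end_session=True)

        return build_response("Sorry, I didn't get that.", session_attrs,
                              end_session=False)

    def build_response(speech, session_attrs, end_session):
        # Plain Alexa response JSON; session attributes carry the chosen
        # module across turns.
        return {
            "version": "1.0",
            "sessionAttributes": session_attrs,
            "response": {
                "outputSpeech": {"type": "PlainText", "text": speech},
                "shouldEndSession": end_session,
            },
        }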

1 vote

Let me go straight to answering your 3 questions:

1) Alexa does provide the raw input via the slot type AMAZON.LITERAL, but it's now deprecated and you're advised to use AMAZON.SearchQuery for free-form capture (see the handler sketch after this list). However, if instead of using SearchQuery you define a custom slot type and provide samples (training data), the ASR will work better.

2) Alexa has supported AMAZON.FallbackIntent since, I believe, May 2018. The way it works is that an out-of-domain model is automatically generated for your skill, and requests that don't match any of your intents are routed to the fallback intent. It works well.

3) Dynamically adding slot type values is not feasible, since when you provide samples you're really providing training data for a model that will then be able to process similar values beyond the ones you defined. If you noticed, when you provide a voice interaction model schema you then have to build the model (in this step the training data provided in the samples is used to create it). As an example, when you define a custom slot type "Car" and provide the samples "Toyota", "Jeep", "Chevrolet" and "Honda", the system will also route to the same intent if the user says "Ford".
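
To make points 1 and 2 concrete, here is a minimal handler sketch, assuming the ASK SDK for Python; the custom intent SearchIntent and its AMAZON.SearchQuery slot named "query" are invented for illustration:

    from ask_sdk_core.skill_builder import SkillBuilder
    from ask_sdk_core.dispatch_components import AbstractRequestHandler
    from ask_sdk_core.utils import is_intent_name


    class SearchIntentHandler(AbstractRequestHandler):
        """Reads the free-form text captured by an AMAZON.SearchQuery slot."""

        def can_handle(self, handler_input):
            return is_intent_name("SearchIntent")(handler_input)

        def handle(self, handler_input):
            slots = handler_input.request_envelope.request.intent.slots
            query = slots["query"].value or ""  # close to the raw user utterance
            return handler_input.response_builder.speak(
                "You said: " + query).response


    class FallbackIntentHandler(AbstractRequestHandler):
        """Catches out-of-domain requests routed to AMAZON.FallbackIntent."""

        def can_handle(self, handler_input):
            return is_intent_name("AMAZON.FallbackIntent")(handler_input)

        def handle(self, handler_input):
            speech = "Sorry, I can't help with that. Try asking about your orders."
            return handler_input.response_builder.speak(speech).ask(speech).response


    sb = SkillBuilder()
    sb.add_request_handler(SearchIntentHandler())
    sb.add_request_handler(FallbackIntentHandler())
    lambda_handler = sb.lambda_handler()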

Note: SMAPI does allow you to get and update the interaction model, so technically you could download the model via the API, modify it with new training data, upload it again and rebuild the model. This is kind of awkward, though.
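
As a rough sketch of that round trip (assuming a model file previously downloaded via SMAPI or the ASK CLI, and reusing the "Car" slot type from point 3; the file name is made up, and the upload/rebuild step itself is left to SMAPI):

    import json

    # A model previously downloaded via SMAPI or the ASK CLI (assumed file name).
    with open("interaction_model_en_US.json") as f:
        model = json.load(f)

    new_values = ["Ford", "Tesla"]  # e.g. pulled from your database

    # Custom slot types live under interactionModel.languageModel.types,
    # each with a list of value entries.
    for slot_type in model["interactionModel"]["languageModel"]["types"]:
        if slot_type["name"] == "Car":
            existing = {v["name"]["value"] for v in slot_type["values"]}
            for value in new_values:
                if value not in existing:
                    slot_type["values"].append({"name": {"value": value}})

    with open("interaction_model_en_US.json", "w") as f:
        json.dump(model, f, indent=2)

    # The updated file would then be uploaded and the model rebuilt via SMAPI
    # (or the ASK CLI), which is the rebuild step the note above refers to.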