It is a maxim in the field of speech recognition (speech to text) that the more restrictive the vocabulary, the greater the accuracy. Conversely, the larger the vocabulary, the lower the accuracy.
A system like VoiceXML (used mostly for automated telephone menus and prompts) has a very strict vocabulary, because every phrase it accepts must be declared in a grammar, and it generally performs well in the domains it has been tailored for.
A system like Watson Speech to Text is completely open, but makes up for its lower accuracy by returning several alternative interpretations of the audio, each with a confidence level. In short, it offloads much of the NLP work to you.
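To make "offloads the NLP work to you" concrete, here is a minimal sketch of what your own code ends up doing with an open recognizer's output. It assumes a simplified response shape, a plain list of `(transcript, confidence)` pairs; this is not the actual Watson response format. The intent names, keywords, and `min_confidence` threshold are hypothetical and chosen only for illustration.

```python
from typing import Optional

# Hypothetical intents: each maps to a set of keywords that must all appear
# in a transcript for that intent to match.
INTENTS = {
    "GetWeather": {"weather"},
    "SetTimer": {"set", "timer"},
}

def pick_intent(alternatives: list[tuple[str, float]],
                min_confidence: float = 0.5) -> Optional[tuple[str, str]]:
    """Return (intent, transcript) for the most confident usable alternative."""
    # Try candidates from most to least confident.
    for transcript, confidence in sorted(alternatives, key=lambda a: a[1], reverse=True):
        if confidence < min_confidence:
            break  # every remaining candidate is even less confident
        words = set(transcript.lower().split())
        for intent, keywords in INTENTS.items():
            if keywords <= words:  # all keywords present
                return intent, transcript
    return None  # nothing usable; the caller has to re-prompt the user

# Hypothetical recognizer output: the top guess is wrong, the runner-up is right.
alternatives = [
    ("set a time her for ten minutes", 0.62),
    ("set a timer for ten minutes", 0.58),
    ("sat a timer four ten minutes", 0.21),
]
print(pick_intent(alternatives))  # ('SetTimer', 'set a timer for ten minutes')
```

The point of the example is that the recognizer's highest-confidence guess is not necessarily the one you want, and deciding which alternative "means" something is entirely your job.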
Amazon have, very deliberately, chosen a middle road for Alexa. Their intent model allows more flexibility than VoiceXML, but is not as liberal as a dictation system. The result gives you reasonably flexible options and reasonably good quality.
As a consequence of that design, Alexa uses an interaction model in which you have to declare, in advance, everything it can recognize. If you do so, you get consistent, good-quality recognition. As others have said, there are ways to "trick" it into supporting a "generic slot". However, by doing so you step outside its design, and consistency and quality suffer.
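The following toy sketch illustrates why declaring everything up front helps. It is not how Alexa works internally: the dict only mimics the idea of intents, sample utterances, and slot values, and the matcher (Python's `difflib.get_close_matches`) stands in for a recognizer that only has to choose among known phrases. All names and values are hypothetical.

```python
import difflib
from typing import Optional

# Hypothetical, simplified model: every utterance is declared in advance,
# with {Color} as a slot placeholder whose legal values are also declared.
SLOT_VALUES = {"Color": ["red", "green", "blue"]}
INTENTS = {
    "ChooseColor": ["my favorite color is {Color}", "i like {Color}"],
    "StopIntent": ["stop", "cancel"],
}

def expand(samples: list[str]) -> list[str]:
    """Expand each sample utterance into every concrete phrase it allows."""
    phrases = []
    for sample in samples:
        if "{Color}" in sample:
            phrases.extend(sample.replace("{Color}", v) for v in SLOT_VALUES["Color"])
        else:
            phrases.append(sample)
    return phrases

# The closed vocabulary: concrete phrase -> intent name.
PHRASES = {p: intent for intent, samples in INTENTS.items() for p in expand(samples)}

def recognize(heard: str, cutoff: float = 0.8) -> Optional[tuple[str, str]]:
    """Snap a noisy transcript to the closest declared phrase, or reject it."""
    match = difflib.get_close_matches(heard.lower(), PHRASES.keys(), n=1, cutoff=cutoff)
    if not match:
        return None  # not declared in advance, so it cannot be recognized
    return PHRASES[match[0]], match[0]

print(recognize("my favourite colour is blew"))
print(recognize("what is the weather in paris"))
```

Because the set of candidate phrases is closed, a sloppy transcript like "my favourite colour is blew" still lands on the declared "my favorite color is blue", while anything undeclared is rejected outright. A catch-all "generic slot" effectively reopens that set, which is why it gives up the consistency described above.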