I'm working with Microsoft.Speech.Recognition and need to use quite large grammars for a recognition task. So I create and later modify a grammar as a SrgsDocument and then construct a Grammar object from that. At that point, I load the grammar into the engine to prepare for recognition using the SpeechRecognitionEngine.LoadGrammar method.

In other words I have something like:

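using Microsoft.Speech.Recognition;
using Microsoft.Speech.Recognition.SrgsGrammar;

SpeechRecognitionEngine sre = new SpeechRecognitionEngine();
SrgsDocument gramDoc = new SrgsDocument();
//...modify the SrgsDocument (add rules, etc.)
Grammar gram = new Grammar(gramDoc);
sre.LoadGrammar(gram); // the timeout occurs during this call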

At this point, while the grammar is loading, I sometimes get the following error after a few minutes (not always, and not obviously as a function of grammar size): "A task could not complete because the SR engine had timed out."

If I catch the exception and try to load the same grammar into the same engine again, sometimes it loads successfully (though very slowly), and sometimes it gives the same error again.

What's causing this? Why would it sometimes time out and sometimes work with the same grammar/engine?

And is there something I can do to make the grammar load faster, period?

Any ideas would be really appreciated.

2 Answers

4
votes

What's causing this?

Grammar is too large.

Why would it sometimes time out and sometimes work with the same grammar/engine?

Sometimes it is smaller.

And is there something I can do to make the grammar load faster, period?

I presume you compile the grammar with Grammar Tools.

You can use a smaller grammar. In fact, you shouldn't use very large grammars at all: they degrade recognition accuracy, because a hand-constructed grammar usually fails to capture all the dependencies in the language.

If your language contains many choices or complex sentences, it's better to simplify the grammar by making it more flexible. Instead of organizing a tree of choices, you can split it into chunks and present them as separate choices. For example, if you consider a grammar like

<result> = <day> <month> <year> <digit> |
            <year> <month> <digit> |
            <digit> <year> <month> 

that tries to capture different orderings, it's better to give more flexibility

<result> = ( <day> | <month> | <year> | <digit> )*

but simplify dependencies.
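
As a rough illustration, the flexible rule above might be built with the SrgsDocument API along these lines. This is only a sketch: the class name, the word lists, and the `maxTokens` parameter are made up for the example, and the repeat is bounded to 1..maxTokens rather than being a true `*`, on the assumption that you don't want the root rule to match an empty utterance.

```csharp
using Microsoft.Speech.Recognition;
using Microsoft.Speech.Recognition.SrgsGrammar;

static class FlexibleDateGrammar
{
    // Builds <result> = ( <day> | <month> | <year> | <digit> ), repeated 1..maxTokens times.
    public static Grammar Build(int maxTokens)
    {
        // Placeholder word lists (assumption); a real grammar would list every form.
        var day = new SrgsRule("day", new SrgsOneOf("first", "second", "third"));
        var month = new SrgsRule("month", new SrgsOneOf("January", "February", "March"));
        var year = new SrgsRule("year", new SrgsOneOf("twenty ten", "twenty eleven"));
        var digit = new SrgsRule("digit", new SrgsOneOf("one", "two", "three"));

        // One flat choice between the chunks instead of a tree of fixed orderings.
        var anyToken = new SrgsOneOf(
            new SrgsItem(new SrgsRuleRef(day)),
            new SrgsItem(new SrgsRuleRef(month)),
            new SrgsItem(new SrgsRuleRef(year)),
            new SrgsItem(new SrgsRuleRef(digit)));

        // Repeat the choice between 1 and maxTokens times.
        var result = new SrgsRule("result", new SrgsItem(1, maxTokens, anyToken));
        result.Scope = SrgsRuleScope.Public;

        var doc = new SrgsDocument();
        doc.Rules.Add(day, month, year, digit, result);
        doc.Root = result;
        return new Grammar(doc);
    }
}
```

Something like `sre.LoadGrammar(FlexibleDateGrammar.Build(4));` would then load it. The grammar accepts orderings nobody would actually say, which is the trade-off being suggested here.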

A good alternative is a statistical language model in ARPA format. Once you have collected sample prompts, you can build an ARPA model from them, which will give you much better results than a hand-constructed grammar.

1
votes

A couple of other possibilities:

  1. Split the grammar into multiple independent portions & load each portion into its own Grammar object. (Especially handy when you have static subtrees).
  2. Merge subtrees using rules & rulerefs.
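
A minimal sketch of the first option, assuming the grammar has already been split into two SrgsDocument objects, each with its own root rule. The class name, method names, and the "static"/"dynamic" grammar names are invented for the example; whether this avoids the timeout in your case is something you would have to measure.

```csharp
using Microsoft.Speech.Recognition;
using Microsoft.Speech.Recognition.SrgsGrammar;

static class SplitGrammarLoader
{
    // Loads the rarely changing part and the frequently changing part as
    // separate Grammar objects. Returns the dynamic one so it can be swapped later.
    public static Grammar LoadAll(SpeechRecognitionEngine sre,
                                  SrgsDocument staticDoc,
                                  SrgsDocument dynamicDoc)
    {
        var staticGrammar = new Grammar(staticDoc) { Name = "static" };
        var dynamicGrammar = new Grammar(dynamicDoc) { Name = "dynamic" };

        sre.LoadGrammar(staticGrammar);
        sre.LoadGrammar(dynamicGrammar);
        return dynamicGrammar;
    }

    // Replaces only the dynamic portion; the static grammar stays loaded.
    public static Grammar ReplaceDynamic(SpeechRecognitionEngine sre,
                                         Grammar oldDynamic,
                                         SrgsDocument newDynamicDoc)
    {
        sre.UnloadGrammar(oldDynamic);

        var replacement = new Grammar(newDynamicDoc) { Name = "dynamic" };
        sre.LoadGrammar(replacement);
        return replacement;
    }
}
```

The idea is that after a modification only the small dynamic document has to be rebuilt and reloaded, instead of the whole thing.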

Unfortunately, I don't believe that Microsoft.Speech.Recognition supports SrgsSubset items, which are very handy for building dynamic grammars.

To amplify @NikolayShmyrev's answer: it's often better to simplify the grammar and trust the user not to say awkward phrases that are admissible but unlikely. You can always reject those phrases during the interpretation phase anyway.
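
For instance, rejection at interpretation time could be hooked into the SpeechRecognized event roughly like this. It's a sketch only: `ResultFilter`, `isPlausible`, and `onAccepted` are hypothetical names, and the actual plausibility check is up to your application.

```csharp
using System;
using Microsoft.Speech.Recognition;

static class ResultFilter
{
    // isPlausible and onAccepted are supplied by the application (hypothetical helpers).
    public static void Attach(SpeechRecognitionEngine sre,
                              Func<string, bool> isPlausible,
                              Action<string> onAccepted)
    {
        sre.SpeechRecognized += (sender, e) =>
        {
            string text = e.Result.Text;

            // The loose grammar admits this phrase; the application decides
            // whether it actually makes sense before acting on it.
            if (isPlausible(text))
            {
                onAccepted(text);
            }
            // Otherwise the phrase is silently dropped (or you could re-prompt).
        };
    }
}
```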