I'm trying to use the SpeechRecognizer with a custom Grammar to handle the following pattern:
"Can you open {item}?" where {item} uses DictationGrammar.
I'm using the speech engine built into Vista and .NET 4.0.
I would like to be able to get the confidences for the SemanticValues returned. See example below.
If I simply use "recognizer.LoadGrammar( new DictationGrammar() )", I can browse through e.Result.Alternates and view the confidence value of each alternate. That works if DictationGrammar is at the top level.
Made up example:
- Can you open Firefox? .95
- Can you open Fairfax? .93
- Can you open file fax? .72
- Can you pen Firefox? .85
- Can you pin Fairfax? .63
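For reference, the top-level case looks roughly like the sketch below. It assumes the shared SpeechRecognizer from System.Speech.Recognition with the Windows speech UI listening. The MaxAlternates value of 5 and the class name are placeholders for illustration, not my exact code.

```csharp
using System;
using System.Speech.Recognition;

class TopLevelDictationSketch
{
    static void Main()
    {
        // Shared desktop recognizer; assumes Windows Speech Recognition is running.
        using (var recognizer = new SpeechRecognizer())
        {
            // Ask for several alternates per result (5 is an arbitrary choice).
            recognizer.MaxAlternates = 5;
            recognizer.LoadGrammar(new DictationGrammar());

            recognizer.SpeechRecognized += (sender, e) =>
            {
                // Each alternate is a RecognizedPhrase with its own Confidence.
                foreach (RecognizedPhrase alt in e.Result.Alternates)
                {
                    Console.WriteLine("{0} {1:F2}", alt.Text, alt.Confidence);
                }
            };

            Console.WriteLine("Speak, then press Enter to quit.");
            Console.ReadLine();
        }
    }
}
```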
But if I build a grammar that looks for "Can you open {semanticValue Key='item' GrammarBuilder=new DictationGrammar()}?", then I get this:
- Can you open Firefox? .91 - Semantics = {GrammarBuilder.Name = "can you open"}
- Can you open Fairfax? .91 - Semantics = {GrammarBuilder.Name = "can you open"}
- Can you open file fax? .91 - Semantics = {GrammarBuilder.Name = "can you open"}
- Can you pen Firefox? .85 - Semantics = null
- Can you pin Fairfax? .63 - Semantics = null
The .91 shows me how confident the engine is that the phrase matched the pattern "Can you open {item}?", but it doesn't distinguish any further between the alternates.
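In case the construction matters, the pattern grammar is built along these lines. This is a minimal sketch using GrammarBuilder.AppendDictation and SemanticResultKey. The class and method names are made up for illustration, and the trailing question mark of the pattern is not modeled.

```csharp
using System.Speech.Recognition;

static class OpenItemGrammar
{
    public static Grammar Build()
    {
        // Free-form dictation that fills the {item} slot.
        var dictation = new GrammarBuilder();
        dictation.AppendDictation();

        // Fixed prefix followed by the dictation, tagged with the semantic key "item".
        var phrase = new GrammarBuilder("Can you open");
        phrase.Append(new SemanticResultKey("item", dictation));

        return new Grammar(phrase) { Name = "can you open" };
    }
}
```

The result of Build() is then passed to recognizer.LoadGrammar(...) in place of the bare DictationGrammar.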
However, if I then look at e.Result.Alternates.Semantics.Where( s => s.Key == "item" ), and view their Confidence, I get this:
- Firefox 1.0
- Fairfax 1.0
- file fax 1.0
Which doesn't help me much.
What I really want is something like this when I view the Confidence of the matching SemanticValues:
- Firefox .95
- Fairfax .93
- file fax .72
It seems like it should work that way...
Am I doing something wrong? Is there even a way to do that within the Speech framework?
I'm hoping there's some inbuilt mechanism so that I can do it the "right" way.
As for another approach that will probably work...
- Use the SemanticValue approach to match on the pattern
- For anything that matches on that pattern, extract the raw Audio for {item} (use RecognitionResult.Words and RecognitionResult.GetAudioForWordRange)
- Run the raw audio for {item} through a SpeechRecognizer with the DictationGrammar to get the Confidence
... but that's more processing than I really want to do.
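A rough sketch of that fallback, to make the cost concrete. It assumes an in-process SpeechRecognitionEngine for the second pass (it can take a wave stream as input) and assumes the caller already knows which word indexes cover {item}. With the fixed prefix "Can you open", that would start at index 3. ItemRescorer, its parameters, and the MaxAlternates value are hypothetical, and I haven't verified that the second pass gives useful confidences on such a short clip.

```csharp
using System;
using System.IO;
using System.Speech.Recognition;

static class ItemRescorer
{
    // firstItemWord / lastItemWord: indexes into result.Words that delimit {item}
    // (hypothetical parameters; the caller has to work these out).
    public static void PrintItemAlternates(
        RecognitionResult result, int firstItemWord, int lastItemWord)
    {
        // Cut out only the audio that was recognized as {item}.
        RecognizedAudio itemAudio = result.GetAudioForWordRange(
            result.Words[firstItemWord], result.Words[lastItemWord]);

        using (var stream = new MemoryStream())
        using (var engine = new SpeechRecognitionEngine())
        {
            // Write the clip as WAVE data and rewind so the engine reads from the start.
            itemAudio.WriteToWaveStream(stream);
            stream.Position = 0;

            engine.MaxAlternates = 5; // arbitrary choice
            engine.LoadGrammar(new DictationGrammar());
            engine.SetInputToWaveStream(stream);

            // Synchronous single recognition; returns null if nothing was recognized.
            RecognitionResult itemResult = engine.Recognize();
            if (itemResult == null)
            {
                return;
            }

            foreach (RecognizedPhrase alt in itemResult.Alternates)
            {
                Console.WriteLine("{0} {1:F2}", alt.Text, alt.Confidence);
            }
        }
    }
}
```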