
I am using the Microsoft Speech SDK to implement software that uses voice recognition.

I feed the recognition engine a fairly ordinary grammar. When I start the engine and say something valid, it recognizes what I say, but the returned Result object has a Confidence value of -1.

In addition, every SemanticValue object contained in the result also has a confidence of -1.

I cannot find any explanation of such a value on the related MSDN pages; they only say that typical confidence values are between 0 and 1.

What does a value of -1 mean? Does it have something to do with the grammar?

Edit: additional information:

  • I am using the System.Speech classes to interact with the voice recognition engine.
  • The recognition engine is Microsoft English Recognizer v5.1.
  • I am running the program on XP, so the Speech SDK is also 5.1.
  • The input is a microphone: I found no way to feed this recognition engine a file, although that would have helped me a lot.
You might want to clarify some things. Are you using SAPI, System.Speech, or Microsoft.Speech? What OS version are you running? What recognizer version are you running? Are you using a shared or an inproc recognizer? How was your grammar created? Are you using microphone input or a WAV file? I don't know what the problem is, but more information may help you get some answers. – Michael Levy


In SAPI, SREngineConfidence is an attempt to pass the phrase confidence from the vendor-specific speech engine to the engine-independent SAPI client. It has some interesting behavior, described in the "Microsoft Speech SDK Version 5.1 SR Engine Vendor Porting Guide".

http://msdn.microsoft.com/en-us/library/ee431799(v=VS.85).aspx#_Toc503606917 says:

It is possible for confidence score information to be included in recognition results. On each phrase element there are two confidence fields that the engine can set. These have both a Confidence (three-level) field and an SREngineConfidence (floating-point) field. If the engine does not explicitly set any of these values, SAPI will try and produce reasonable default values for them. It will produce the Confidence values by averaging the levels for each of the words in the phrase or property, and it will set the SREngineConfidence values to -1.0.

and later says:

If this field is not being used, the engine sets this confidence to -1.0.
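To make the porting-guide behavior concrete, here is a minimal C# sketch of the defaults SAPI fills in when an engine sets neither field. The ConfidenceLevel enum mirrors, as I understand it, the values of SAPI's SPCONFIDENCE (SP_LOW_CONFIDENCE = -1, SP_NORMAL_CONFIDENCE = 0, SP_HIGH_CONFIDENCE = 1). PhraseConfidence and SapiDefaults.FillDefaults are hypothetical names, and rounding the average to the nearest level is my assumption, since the guide only says "averaging".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Mirrors the three-level values of SAPI's SPCONFIDENCE enum (assumed).
public enum ConfidenceLevel { Low = -1, Normal = 0, High = 1 }

// Hypothetical model of the two confidence fields on a phrase element.
public sealed class PhraseConfidence
{
    public ConfidenceLevel Level { get; }     // three-level "Confidence" field
    public float EngineConfidence { get; }    // SREngineConfidence; -1.0 means "not set"

    public PhraseConfidence(ConfidenceLevel level, float engineConfidence)
    {
        Level = level;
        EngineConfidence = engineConfidence;
    }
}

public static class SapiDefaults
{
    // Sketch of the defaults described in the porting guide: average the word
    // levels for the three-level field, and use -1.0 for the floating-point field.
    // Rounding to the nearest level (midpoints away from zero) is an assumption.
    public static PhraseConfidence FillDefaults(IReadOnlyList<ConfidenceLevel> wordLevels)
    {
        if (wordLevels.Count == 0)
            throw new ArgumentException("A phrase needs at least one word.", nameof(wordLevels));

        double average = wordLevels.Average(w => (int)w);
        var phraseLevel = (ConfidenceLevel)(int)Math.Round(average, MidpointRounding.AwayFromZero);
        return new PhraseConfidence(phraseLevel, -1.0f);
    }
}
```

The point of the sketch is that the -1.0 is not a score at all: it is the value SAPI writes when the engine supplies nothing, while the three-level field still carries information derived from the words.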

One other resource that may give you some insight is http://gotspeech.net/forums/thread/3613.aspx. One post says:

In principle, the SREngineConfidence score is a value between 0.0 and 1.0 {higher value meaning higher confidence}. But older versions of the SR engines like 5.1 don't honor this contract precisely, and I don't think the value can really be used with those engines. Only the Hi, Medium, and Low scores in the other Confidence field are usable.

If I remember rightly, you need a more recent version of the SR engine, like the versions that ship with Microsoft Office 2003 or Vista to get a meaningful number in the SREngineConfidence field.

Edits:

I believe System.Speech.Recognition is really a .NET wrapper around SAPI (see http://msdn.microsoft.com/en-us/magazine/cc163663.aspx). I suspect that the comments quoted above about confidence values of -1 still apply when you use System.Speech, and that the -1 you are seeing is the same issue.
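If that is the case, client code should not read -1 as "very low confidence". A minimal sketch in plain C#, assuming the input is the float from RecognitionResult.Confidence or SemanticValue.Confidence: a negative value is reported as "not reported" instead of being compared against a threshold. ConfidenceGuard, Decision, and the 0.6 threshold used below are hypothetical choices for illustration, not part of System.Speech.

```csharp
using System;

// Hypothetical helper for interpreting a Confidence value from System.Speech.
public static class ConfidenceGuard
{
    public enum Decision { Accept, Reject, NotReported }

    // A negative value (the -1.0 sentinel) means the engine left the field unset,
    // so it is reported separately rather than treated as a poor score.
    public static Decision Evaluate(float confidence, float threshold)
    {
        if (confidence < 0f)
            return Decision.NotReported;

        return confidence >= threshold ? Decision.Accept : Decision.Reject;
    }
}
```

For example, ConfidenceGuard.Evaluate(result.Confidence, 0.6f) returns NotReported for the -1 described in the question, so the caller can keep the match as unscored instead of rejecting it.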

My understanding is that XP did not include a recognizer; some versions of Microsoft Office shipped with one. So I'm not sure which recognizer engine you are really running. Do you have Office 2003 installed, or a third-party engine such as Dragon?

You say you have recognizer 5.1 installed. The GotSpeech.NET link above says:

But older versions of the SR engines like 5.1 don't honor this contract precisely, and I don't think the value can really be used with those engines.

I would suggest trying the following:

One more thing: here is a short sample that recognizes from a WAV file, which also covers your question about file input:

    using System.Speech.Recognition;

    // CreatePizzaGrammar() is a helper (not shown) that uses GrammarBuilder to create a pizza ordering grammar
    using (SpeechRecognitionEngine myRecognizer = new SpeechRecognitionEngine())
    {
        myRecognizer.LoadGrammar(CreatePizzaGrammar());
        myRecognizer.SetInputToWaveFile("LargeCheese.wav");    // recording of ordering a pizza
        RecognitionResult result = myRecognizer.Recognize();  // null if nothing in the file matched the grammar
        if (result != null)
        {
            string s = result.Text;
            float confidence = result.Confidence;             // may be -1 with a 5.1 engine
        }
    }