
I’m using the classes in System.Speech.Recognition to develop an application that uses a very small grammar, which consists of only a few sentences. The user says one of these sentences, and the app should identify which one the user has said. However, if the user says something different, which is not one of these sentences, the app should identify nothing.

While experimenting with the SpeechRecognitionEngine class, I noticed a problem: when the user says only the beginning of a sentence and then continues with other words, the recognition engine still identifies it as one of the predefined sentences. For example, let's say the grammar has only two sentences:

  1. “The dog eats its food”.
  2. “The cat sits on the sofa”.

If the user says “The dog is sleeping”, the recognition engine identifies it as “The dog eats its food”. I want the engine to recognize that this is neither of the two sentences above, that is, to recognize “nothing”.

I’ve tried adding a DictationGrammar, as suggested here. However, after this, the app had trouble identifying the predefined sentences. The user says “The dog eats its food”, but the recognition engine identifies something else, like “The dog is rude”.

This doesn’t surprise me: when I use the speech recognition software that comes with Windows (which, of course, uses System.Speech), I get very poor results when dictating, even after training it (I use Windows 7).

Any suggestions?

Update:

As NineBerry pointed out, checking the confidence level of the result (RecognitionResult.Confidence) is very helpful. When the user says the predefined sentence "The dog eats its food", I get a higher confidence level than when they say "The dog is sleeping" (~0.9 vs ~0.7, respectively).
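
The threshold check described above can be sketched as follows. This is a minimal, language-agnostic illustration in Python, not System.Speech code: the `accept` helper and the 0.8 cutoff are assumptions, with the cutoff picked only because it falls between the ~0.9 and ~0.7 values reported above. In the actual app the score would come from RecognitionResult.Confidence.

```python
# Hypothetical cutoff, chosen to sit between the two confidence levels
# reported above (~0.9 for a match, ~0.7 for a non-match).
CONFIDENCE_THRESHOLD = 0.8


def accept(text, confidence, threshold=CONFIDENCE_THRESHOLD):
    """Return the recognized text if its confidence clears the threshold, else None."""
    return text if confidence >= threshold else None


# A result at ~0.9 is kept; the same text reported at ~0.7 is treated as "nothing".
print(accept("The dog eats its food", 0.9))
print(accept("The dog eats its food", 0.7))
```

A single cutoff like this only separates cases whose confidence values actually differ.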

However, if only the last word is wrong, as in "The dog eats its leg", I get the same confidence level as for the predefined "The dog eats its food". So I still have a problem.

Have you looked at the "Confidence" property of the speech recognition result to see whether you can use a threshold on that to determine a hit? - NineBerry
@NineBerry - You gave me excellent advice. See my update. - Bohoo

1 Answer


If you want to verify the presence of a keyword in speech, speech recognition is not really a good solution, because it cannot reliably filter out other speech. It is very hard to recognize a small grammar in the presence of other speech. There are dedicated keyword spotting algorithms designed for this purpose. Such algorithms let you configure a threshold for the keyword to balance false alarms against missed detections.
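
To illustrate the trade-off that the threshold controls, here is a minimal sketch in Python. It is not tied to any particular keyword spotting library: `count_errors` is a hypothetical helper, and the scores in `SAMPLES` are made up purely for illustration.

```python
def count_errors(scored_utterances, threshold):
    """Count false alarms and misses for one detection threshold.

    scored_utterances: list of (score, contains_keyword) pairs, where score is
    the detector's output and contains_keyword is the ground truth.
    """
    false_alarms = sum(
        1 for score, is_keyword in scored_utterances
        if score >= threshold and not is_keyword
    )
    misses = sum(
        1 for score, is_keyword in scored_utterances
        if score < threshold and is_keyword
    )
    return false_alarms, misses


# Made-up detector scores, for illustration only (not measured data).
SAMPLES = [
    (0.95, True), (0.85, True), (0.60, True),
    (0.90, False), (0.70, False), (0.30, False),
]

for threshold in (0.5, 0.8, 0.93):
    print(threshold, count_errors(SAMPLES, threshold))
# prints:
# 0.5 (2, 0)
# 0.8 (1, 1)
# 0.93 (0, 2)
```

Raising the threshold trades false alarms for misses, so the right value depends on which kind of error the application tolerates better.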

See for example the corresponding part of CMUSphinx documentation.

An example of a keyword spotting algorithm is the 'Ok Google' keyphrase that Google uses on Android. Please note that it is a static keyphrase and not a grammar, simply because even Google cannot implement grammar spotting reliably.

Once the keyword is recognized, you can switch to grammar recognition and perform the user's task.
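
The two-stage flow can be sketched as a small state machine. This is only an illustration in Python under stated assumptions: `handle_utterance`, the state names, and the sample phrases are hypothetical, and plain string matching stands in for the real keyword spotter and grammar recognizer, which would work on audio.

```python
def handle_utterance(state, utterance, keyphrase, commands):
    """Advance the two-stage flow by one utterance.

    state: "spotting" (waiting for the keyphrase) or "grammar" (expecting a command).
    commands: dict mapping grammar sentences to actions.
    Returns (new_state, action); action is None when nothing should be done.
    """
    text = utterance.lower()
    if state == "spotting":
        # Stage 1: ignore everything until the keyphrase is heard.
        if keyphrase.lower() in text:
            return "grammar", None
        return "spotting", None
    # Stage 2: match against the small grammar, then go back to spotting.
    return "spotting", commands.get(text)


# Hypothetical keyphrase and grammar, for illustration only.
commands = {"the dog eats its food": "dog_action"}
state = "spotting"
for spoken in ["The dog eats its food", "Ok Google", "The dog eats its food"]:
    state, action = handle_utterance(state, spoken, "ok google", commands)
    print(repr(spoken), "->", state, action)
```

In this sketch a grammar sentence spoken before the keyphrase is ignored, and an unrecognized command after the keyphrase returns the flow to spotting without an action.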