6
votes

I am currently testing the SpeechRecognitionEngine by loading a pretty simple rule from an XML file. In fact it is a simple choice between ("decrypt the email", "remove encryption") or ("encrypt the email", "add encryption").

I have trained my Windows 7 PC and additionally added the words encrypt and decrypt, as I realize they are very similar. The recognizer already has trouble telling these two apart.

The issue I am having is that it recognizes things too often. I have set the confidence threshold to 0.93 because, with my voice in a quiet room, saying the exact words sometimes only reaches 0.93. But if I turn on the radio, the announcer's voice or a song can make the recognizer think it has heard the words "decrypt the email" with over 0.93 confidence.

Maybe Lady Gaga is backmasking Applause to secretly decrypt emails :-)

Can anyone help me work out how to make this recognizer workable?

In fact the recognizer is also picking up keyboard noise as "decrypt the email". I don't understand how this is possible.

Following up on my editing buddy's point: there are at least two managed namespaces for MS Speech, Microsoft.Speech and System.Speech. It is important for this question to know that it is System.Speech.

1
This is all rather normal. You didn't say anything about the microphone you used; it can be critical. - Hans Passant
I am using the mic from the Polycom cx100 polycom.com/products-services/products-for-microsoft/…. I trained the desktop engine and also did dictation on notepad of the words and my accuracy improved, but now it recognizes text when I am just typing. - darbid
Switch to a headset microphone. Speakerphones are notorious for picking up extraneous noise. - Eric Brown
ok noted. This is a cool device but I realize that whilst hands free is good for talking on the phone or communicator it might not be so good for speech recognition. - darbid
@darbid - One of the fun things about SR is that engine confidence != accuracy. I.e., the engine can be very confident about a reco, but it will still be wrong. Conversely, the engine can have very low confidence in a reco, and it will still be correct. In practice, I never use the confidence values (aside from it being high enough to pass the rejection threshold). - Eric Brown

1 Answer

13
votes

If the only thing the System.Speech recognizer is listening for is "encrypt the email", then the recognizer will generate lots of false positives. (Particularly in a noisy environment.) If you add a DictationGrammar (particularly a pronunciation grammar) in parallel, the DictationGrammar will pick up the noise, and you can check the (e.g.) name of the grammar in the event handler to discard the bogus recognitions.

A (subset) example:

    using System.Globalization;
    using System.Speech.Recognition;

    static void Main(string[] args)
    {
        // Command grammar: the only phrases the application should act on.
        Choices gb = new Choices();
        gb.Add("encrypt the document");
        gb.Add("decrypt the document");
        Grammar commands = new Grammar(gb);
        commands.Name = "commands";

        // Pronunciation dictation grammar, loaded in parallel to absorb noise.
        DictationGrammar dg = new DictationGrammar("grammar:dictation#pronunciation");
        dg.Name = "Random";

        using (SpeechRecognitionEngine recoEngine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            recoEngine.SetInputToDefaultAudioDevice();
            recoEngine.LoadGrammar(commands);
            recoEngine.LoadGrammar(dg);
            recoEngine.RecognizeCompleted += recoEngine_RecognizeCompleted;
            // With no argument this performs a single recognition operation.
            recoEngine.RecognizeAsync();

            System.Console.ReadKey(true);
            recoEngine.RecognizeAsyncStop();
        }
    }

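    static void recoEngine_RecognizeCompleted(object sender, RecognizeCompletedEventArgs e)
    {
        // Result can be null when nothing was recognized, so guard before using it.
        // Anything matched by the "Random" dictation grammar is discarded as noise.
        if (e.Result != null && e.Result.Grammar.Name != "Random")
        {
            System.Console.WriteLine(e.Result.Text);
        }
    }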
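
The discard decision can also be pulled out of the event handler so it is easy to check on its own. The following is only a minimal sketch under a few assumptions: `RecognitionFilter`, `ShouldAccept`, and the `minConfidence` parameter are hypothetical names introduced here, not part of System.Speech, and the noise grammar is assumed to be named "Random" as in the example above. The confidence check is kept as a secondary rejection threshold only, in line with the comment that confidence is not the same as accuracy.

```csharp
// Hypothetical helper (not part of System.Speech): decides whether a
// recognition result should be acted on or discarded as noise.
public static class RecognitionFilter
{
    // Assumed name of the dictation grammar that absorbs background noise.
    public const string NoiseGrammarName = "Random";

    public static bool ShouldAccept(string grammarName, float confidence, float minConfidence)
    {
        // Anything matched by the dictation grammar is treated as noise.
        if (grammarName == NoiseGrammarName)
        {
            return false;
        }

        // Secondary filter: reject command matches below the chosen threshold.
        return confidence >= minConfidence;
    }
}
```

Inside a recognition event handler this might be called as `RecognitionFilter.ShouldAccept(e.Result.Grammar.Name, e.Result.Confidence, 0.9f)`, where the `0.9f` threshold is an arbitrary placeholder to be tuned for the microphone and room.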