First off I must admit that I'm fairly new to C#. I am using C# speech recognition for writing an application that will interface between a human and a robot. An example of a dialogue is as follows:

Human: Ok Robot, Drill.

Robot: Where?

Human shows where.

Robot: I am ready to drill.

Human: Ok Robot, start.

My approach is to have two speech recognizers. The first one is for higher-level commands such as "Drill", "Cut Square", "Cut Rectangle", "Play Game", etc. The second one is for the start/stop commands of each higher-level task.

This is my current code:

using System.IO;
using System.Speech.Recognition;
using System.Speech.Synthesis;

namespace SpeechTest
{
    class RobotSpeech
    {
        public SpeechRecognitionEngine MainRec = new SpeechRecognitionEngine();
        public SpeechRecognitionEngine SideRec = new SpeechRecognitionEngine();
        public SpeechSynthesizer Synth = new SpeechSynthesizer();
        private Grammar mainGrammar;
        private Grammar sideGrammar;
        private const string MainCorpusFile = @"../../../MainRobotCommands.txt";
        private const string SideCorpusFile = @"../../../SideRobotCommands.txt";

        public RobotSpeech()
        {
            Synth.SelectVoice("Microsoft Server Speech Text to Speech Voice (en-US, ZiraPro)");
            MainRec.SetInputToDefaultAudioDevice();
            SideRec.SetInputToDefaultAudioDevice();
            BuildGrammar('M');
            BuildGrammar('S');
        }

        private void BuildGrammar(char w)
        {
            var gBuilder = new GrammarBuilder("Ok Robot");
            switch (w)
            {
                case 'M':
                    gBuilder.Append(new Choices(File.ReadAllLines(MainCorpusFile)));
                    mainGrammar = new Grammar(gBuilder) { Name = "Main Robot Speech Recognizer" };
                    break;
                case 'S':
                    gBuilder.Append(new Choices(File.ReadAllLines(SideCorpusFile)));
                    sideGrammar = new Grammar(gBuilder) { Name = "Side Robot Speech Recognizer" };
                    break;
            }
        }

        public void Say(string msg)
        {
            Synth.Speak(msg);
        }

        public void MainSpeechOn()
        {
            Say("Speech recognition enabled");
            MainRec.LoadGrammarAsync(mainGrammar);
            MainRec.RecognizeAsync(RecognizeMode.Multiple);
        }

        public void SideSpeechOn()
        {
            SideRec.LoadGrammarAsync(sideGrammar);
            SideRec.RecognizeAsync();
        }

        public void MainSpeechOff()
        {
            Say("Speech recognition disabled");
            MainRec.UnloadAllGrammars();
            MainRec.RecognizeAsyncStop();
        }

        public void SideSpeechOff()
        {
            SideRec.UnloadAllGrammars();
            SideRec.RecognizeAsyncStop();
        }
    }
}

In my main program I have the speech recognized event as follows:

private RobotSpeech voiceIntr;
voiceIntr.MainRec.SpeechRecognized += MainSpeechRecognized;
private void MainSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
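            // The grammar prefix is "Ok Robot", so match that prefix and
            // strip its 9 characters ("Ok Robot ") to get the command.
            if (!e.Result.Text.StartsWith("Ok Robot ")) return;
            switch (e.Result.Text.Substring(9))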
            {
                case "Go Home":
                    voiceIntr.Say("Going to home position.");
                    UR10MoveRobot.GoHome();
                    break;
                case "Say Hello":
                    voiceIntr.Say("Hello. My name is Bishop.");
                    break;
                case "Drill":
                    voiceIntr.Say("Show me where you want me to drill.");
                    // Actual code will observe the gestured points and
                    // return the number of points observed
                    var msg = "I am ready to drill those " + new Random().Next(2, 5) + " holes.";
                    voiceIntr.Say(msg);
                    voiceIntr.SideSpeechOn();
                    voiceIntr.SideSpeechOff();
                    break;
                case "Cut Outlet":
                    voiceIntr.Say("Show me where you want me to cut the outlet.");
                    // Launch gesture recognition to get point for cutting outlet
                    break;
                case "Stop Program":
                    voiceIntr.Say("Exiting Application");
                    Thread.Sleep(2200);
                    Application.Exit();
                    break;
            }
        }

The problem I am having is this: when one of the MainRec events is triggered, I end up in one of the cases here. At that point I only want to listen for "Ok Robot Start" and nothing else, which is what SideRec provides. If I subscribe to that event here, control goes to another event handler with its own switch statement, and I wouldn't know how to get back to the main thread from there.

Also, after telling the human that the robot is ready for drilling, I would like it to block until it receives an answer from the user, for which I need a synchronous speech recognizer. However, after a particular task I want to switch off the recognizer, which I can't do if it's synchronous.

Here are the files for the grammars:

MainRobotCommands.txt:

Go Home

Say Hello

Stop Program

Drill

Start Drilling

Cut Outlet

Cut Shape

Play Tic-Tac-Toe

Ready To Play

You First

SideRobotCommands.txt:

Start

Stop

The speech recognition is only one part of a bigger application, hence it has to be async unless I deliberately want it to block. I am sure there is a better way to design this code, but I'm not sure my knowledge of C# is enough for that. Any help is greatly appreciated!

Thanks.

Can you elaborate a bit on why you want 2 recognizers? Or why not do all of it with just one? - gkapellmann

1 Answer

My approach is having two speech recognizers.

There is no need for 2 recognizers. You can have just 1 recognizer and load/unload grammars as you need them.

private void BuildGrammar(char w)

Using a switch statement and calling the same function twice with different arguments is not a straightforward programming style. Just create the two grammars one after the other.
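As a rough sketch of what that could look like, the two grammars can be built by one small helper and loaded into a single recognizer. The names `RobotGrammars`, `BuildCommandGrammar` and `LoadInto` are made up for illustration, and disabling the side grammar through `Grammar.Enabled` is just one possible way to keep it inactive until it is needed (loading and unloading it would work too). Blank lines in the corpus files are skipped, on the assumption that an empty phrase is not a valid choice.

```csharp
using System.IO;
using System.Linq;
using System.Speech.Recognition;

static class RobotGrammars
{
    // Hypothetical helper: builds a grammar for "Ok Robot <command>".
    public static Grammar BuildCommandGrammar(string name, string[] commands)
    {
        var builder = new GrammarBuilder("Ok Robot");
        builder.Append(new Choices(commands));
        return new Grammar(builder) { Name = name };
    }

    // Reads one command per line, ignoring blank lines.
    private static string[] ReadCommands(string path)
    {
        return File.ReadAllLines(path)
                   .Where(line => !string.IsNullOrWhiteSpace(line))
                   .ToArray();
    }

    // Creates both grammars sequentially and loads them into ONE recognizer.
    // Only the main grammar starts out enabled (an assumption of this sketch).
    public static void LoadInto(SpeechRecognitionEngine rec,
                                string mainFile, string sideFile)
    {
        Grammar main = BuildCommandGrammar("Main", ReadCommands(mainFile));
        Grammar side = BuildCommandGrammar("Side", ReadCommands(sideFile));
        side.Enabled = false;

        rec.LoadGrammar(main);
        rec.LoadGrammar(side);
    }
}
```

Inside the recognition handler, `e.Result.Grammar.Name` tells you which of the two grammars matched.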

The problem I am having is, when one of the MainRec events gets triggered, I am in one of the cases here. Now I only want to listen for "Ok Robot Start" and nothing else which is given by the SideRec. If I subscribe to that event here this will go to another event handler with a switch case there from which I wouldn't know how to get back to the main thread.

If you have one recognizer it's enough to have a single handler and do all work there.

Also after telling the human that the robot is ready for drilling, I would like it to block until it receives an answer from the user for which I need to use a synchronous speech recognizer. However, after a particular task I want to switch off the recognizer which I can't do if it's synchronous.

It is not easy to mix async and sync styles in the same design; you need to pick one or the other. For event-based software there is actually no need to mix: you can work in a strictly async paradigm without waiting. Just start the new drilling action inside the MainSpeechRecognized event handler when "Ok Robot Drill" is recognized. You can also synthesize audio in async mode without waiting for the result and continue processing in the SpeakCompleted handler.
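For the synthesis part, something along these lines might work. It is only a sketch: `AsyncSpeaker` and its `Say(msg, whenDone)` method are hypothetical names, and it assumes only one prompt is in flight at a time. `SpeakAsync` and the `SpeakCompleted` event are the `SpeechSynthesizer` members it relies on.

```csharp
using System;
using System.Speech.Synthesis;

class AsyncSpeaker
{
    private readonly SpeechSynthesizer synth;
    private Action pending;   // continuation for the prompt being spoken

    public AsyncSpeaker(SpeechSynthesizer synthesizer)
    {
        synth = synthesizer;
        synth.SpeakCompleted += OnSpeakCompleted;
    }

    // Starts speaking and returns immediately; whenDone runs once the
    // prompt has finished (assumes a single outstanding prompt).
    public void Say(string msg, Action whenDone)
    {
        pending = whenDone;
        synth.SpeakAsync(msg);
    }

    private void OnSpeakCompleted(object sender, SpeakCompletedEventArgs e)
    {
        Action next = pending;
        pending = null;
        next?.Invoke();
    }
}
```

For example, `speaker.Say("I am ready to drill.", () => { /* enable the start/stop grammar here */ });` would only begin listening for "Ok Robot Start" after the robot has finished talking, without blocking the handler.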

To track the state of your software you can create a state variable and check it in event handlers to understand in what state you are and choose next action.
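A minimal sketch of such a state variable, kept free of any speech API so it can be tried on its own. The states, the `RobotDialogue` class and the reply strings are invented for illustration; the idea is that the single SpeechRecognized handler strips the "Ok Robot" prefix and passes the remaining command text to `Handle`.

```csharp
using System;

enum RobotState { Idle, AwaitingStart, Drilling }

class RobotDialogue
{
    public RobotState State { get; private set; } = RobotState.Idle;

    // Chooses the next action from the current state and the recognized
    // command, and returns the text the robot should say in reply.
    public string Handle(string command)
    {
        switch (State)
        {
            case RobotState.Idle:
                if (command == "Drill")
                {
                    State = RobotState.AwaitingStart;
                    return "I am ready to drill.";
                }
                return "Unknown command.";

            case RobotState.AwaitingStart:
                // Only start/stop are meaningful while waiting for confirmation.
                if (command == "Start") { State = RobotState.Drilling; return "Drilling."; }
                if (command == "Stop") { State = RobotState.Idle; return "Cancelled."; }
                return "Say start or stop.";

            case RobotState.Drilling:
                if (command == "Stop") { State = RobotState.Idle; return "Stopped."; }
                return "Busy drilling.";

            default:
                throw new InvalidOperationException("Unexpected state: " + State);
        }
    }
}
```

Because the handler returns right after updating the state, nothing ever blocks: "Ok Robot Start" is simply ignored unless the previous command put the dialogue into `AwaitingStart`.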

This programming paradigm is called "event-driven programming". You can read a lot about it online; check the Wikipedia page and start with this tutorial.