30
votes

I am developing a prototype speech to text captioning application for a University project. I am going to be using gesture recognition within my project late on, so I thought it would be a good idea to use the Kinect as the microphone source, rather than using an additional microphone. The idea of my application is to recognize spontaneous speeches such as long and complex sentences (I understand it won’t that the speech dictation will not be perfect however). I have seen many Kinect speech samples where it makes a reference to Microsoft.Speech, but not System.Speech. As I need to train the speech engine and load a DictationGrammar into the Speech Recognition Engine, Microsoft.Speech is the only option for me.

I have managed to get it working while using the Kinect as the direct microphone audio source, but since I am loading the Kinect for the video preview and gesture recognition, I am unable to access it as a direct microphone.

This is code accessing the microphone directly without loading the Kinect hardware for gesture, etc, and works perfectly:

private void InitializeSpeech()
{
    var speechRecognitionEngine = new SpeechRecognitionEngine();
    speechRecognitionEngine.SetInputToDefaultAudioDevice();
    speechRecognitionEngine.LoadGrammar(new DictationGrammar());
    speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
    speechRecognitionEngine.SpeechRecognized += (s, args) => MessageBox.Show(args.Result.Text);
}

And this is where I need to access the access source via the Kinect once it has been loaded, which isn't doing anything at all. This I want to be doing:

using (var audioSource = new KinectAudioSource())
{
    audioSource.FeatureMode = true;
    audioSource.AutomaticGainControl = false;
    audioSource.SystemMode = SystemMode.OptibeamArrayOnly;

    var recognizerInfo = GetKinectRecognizer();
    var speechRecognitionEngine = new SpeechRecognitionEngine(recognizerInfo.Id);

    speechRecognitionEngine.LoadGrammar(new DictationGrammar());
    speechRecognitionEngine.SpeechRecognized += (s, args) => MessageBox.Show(args.Result.Text);

    using (var s = audioSource.Start())
    {
        speechRecognitionEngine.SetInputToAudioStream(s, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
        speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
    }
}

So the question is, is it even possible to use System.Speech instead of Microsoft.Speech with the current Kinect SDK, and what am I doing wrong in the 2nd code sample?

GetKinectRecognizer Method

private static RecognizerInfo GetKinectRecognizer()
{
    Func<RecognizerInfo, bool> matchingFunc = r =>
    {
        string value;
        r.AdditionalInfo.TryGetValue("Kinect", out value);
        return "True".Equals(value, StringComparison.InvariantCultureIgnoreCase) && "en-US".Equals(r.Culture.Name, StringComparison.InvariantCultureIgnoreCase);
    };

    return SpeechRecognitionEngine.InstalledRecognizers().Where(matchingFunc).FirstOrDefault();
}
2
Windows recognises Kinect as a microphone input, so all speech libraries should be working fine. Have you been able to run the audio/speech samples provided with the Kinect SDK to verify the device is working OK? The above code looks fine to me, but could you post the GetKinectRecognizer method you are calling too?LewisBenge
Hi. Apologise for the late reply. Please refer to the edit above to see the GetKinectRecognizer method I am using, which is basically the one from the Kinect samples.Daniel Clark
@LewisBenge, did you see Dan Clark's reply?Liam Dawson

2 Answers

3
votes

From my own experimentation, I can tell you that you can in fact use both libraries simultaneously.

Try this code instead of your current code (make sure that you add a reference to System.Speech, obviously):

using (var audioSource = new KinectAudioSource())
{
    audioSource.FeatureMode = true;
    audioSource.AutomaticGainControl = false;
    audioSource.SystemMode = SystemMode.OptibeamArrayOnly;

    System.Speech.Recognition.RecognizerInfo ri = GetKinectRecognizer();
    var speechRecognitionEngine = new SpeechRecognitionEngine(ri.Id);

    speechRecognitionEngine.LoadGrammar(new DictationGrammar());
    speechRecognitionEngine.SpeechRecognized += (s, args) => MessageBox.Show(args.Result.Text);

    using (var s = audioSource.Start())
    {
        speechRecognitionEngine.SetInputToAudioStream(s, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
        speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
    }
}

Good Luck!!!

0
votes

Try this code with a reference to System.Speech.

using (var audioSource = new KinectAudioSource())
{
    audioSource.FeatureMode = true;
    audioSource.AutomaticGainControl = false;
    audioSource.SystemMode = SystemMode.OptibeamArrayOnly;

    System.Speech.Recognition.RecognizerInfo ri = GetKinectRecognizer();
    var speechRecognitionEngine = new SpeechRecognitionEngine(ri.Id);

    speechRecognitionEngine.LoadGrammar(new DictationGrammar());
    speechRecognitionEngine.SpeechRecognized += (s, args) => MessageBox.Show(args.Result.Text);

    using (var s = audioSource.Start())
    {
        speechRecognitionEngine.SetInputToAudioStream(s, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
        speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
    }
}