I am developing a prototype speech to text captioning application for a University project. I am going to be using gesture recognition within my project late on, so I thought it would be a good idea to use the Kinect as the microphone source, rather than using an additional microphone. The idea of my application is to recognize spontaneous speeches such as long and complex sentences (I understand it won’t that the speech dictation will not be perfect however). I have seen many Kinect speech samples where it makes a reference to Microsoft.Speech, but not System.Speech. As I need to train the speech engine and load a DictationGrammar into the Speech Recognition Engine, Microsoft.Speech is the only option for me.
I have managed to get it working while using the Kinect as the direct microphone audio source, but since I am loading the Kinect for the video preview and gesture recognition, I am unable to access it as a direct microphone.
This is code accessing the microphone directly without loading the Kinect hardware for gesture, etc, and works perfectly:
private void InitializeSpeech()
{
var speechRecognitionEngine = new SpeechRecognitionEngine();
speechRecognitionEngine.SetInputToDefaultAudioDevice();
speechRecognitionEngine.LoadGrammar(new DictationGrammar());
speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
speechRecognitionEngine.SpeechRecognized += (s, args) => MessageBox.Show(args.Result.Text);
}
And this is where I need to access the access source via the Kinect once it has been loaded, which isn't doing anything at all. This I want to be doing:
using (var audioSource = new KinectAudioSource())
{
audioSource.FeatureMode = true;
audioSource.AutomaticGainControl = false;
audioSource.SystemMode = SystemMode.OptibeamArrayOnly;
var recognizerInfo = GetKinectRecognizer();
var speechRecognitionEngine = new SpeechRecognitionEngine(recognizerInfo.Id);
speechRecognitionEngine.LoadGrammar(new DictationGrammar());
speechRecognitionEngine.SpeechRecognized += (s, args) => MessageBox.Show(args.Result.Text);
using (var s = audioSource.Start())
{
speechRecognitionEngine.SetInputToAudioStream(s, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
}
}
So the question is, is it even possible to use System.Speech instead of Microsoft.Speech with the current Kinect SDK, and what am I doing wrong in the 2nd code sample?
GetKinectRecognizer Method
private static RecognizerInfo GetKinectRecognizer()
{
Func<RecognizerInfo, bool> matchingFunc = r =>
{
string value;
r.AdditionalInfo.TryGetValue("Kinect", out value);
return "True".Equals(value, StringComparison.InvariantCultureIgnoreCase) && "en-US".Equals(r.Culture.Name, StringComparison.InvariantCultureIgnoreCase);
};
return SpeechRecognitionEngine.InstalledRecognizers().Where(matchingFunc).FirstOrDefault();
}