4
votes

I am writing an application which should receive audio and send it to the Bing Speech Recognition API to get text. I used the Service Library and it works with a WAV file. So I wrote my own stream class to receive audio from a microphone or the network (RTP) and send it to the recognition API. When I add a WAV header in front of the audio stream, it works for a few seconds.

Debugging shows that the recognition API reads from the stream faster than it is filled by the audio source (16 kHz sample rate, 16 bit, mono).

So my question is: is there a way to use the recognition API with a real-time (continuous) audio stream?

I know there is an example with a microphone client, but it works with the microphone only, and I need it for different sources.

3
Do you just want to send audio in real time and get back results as someone speaks? Or do you want to send an arbitrarily long stream of audio? Maybe if you link to the microphone example your question will be clearer. – John Wiseman
I want to send audio in real time to get partial results during speaking, in principle like the microphone sample in the sample folder, but for different sources (e.g. RTP). But I hope I have found a solution (I have to do some more tests). If it works, I will create an answer with the description. – H.G. Sandhagen

3 Answers

1
votes

If you want to use sources other than a microphone, you can use the DataRecognitionClient class, obtained by calling SpeechRecognitionServiceFactory's CreateDataClient method. Once you have the client object, you can take audio from any source (microphone, network, a file, etc.) and send it to be processed with the client's SendAudio method. Each time you receive an audio buffer, you make a new call to SendAudio.

While you are in the process of sending audio with SendAudio, you will receive partial recognition results in real time (or close to it) via the client's OnPartialResponseReceived event.

When you are done sending audio, you signal to the client that you are ready for the final recognition result by calling EndAudio. You should then receive an OnResponseReceived event from the client containing the final recognition hypotheses.
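The flow above can be sketched roughly as follows. This is a hedged sketch from memory of the Project Oxford client library, not code from the answer: the exact namespace, factory overload, and event argument members may differ by SDK version, and subscriptionKey and packetBytes are placeholder names.

```csharp
using System;
using Microsoft.ProjectOxford.SpeechRecognition;

// Create a data client (mode, language, and key parameters are illustrative;
// check the overloads in your SDK version).
var client = SpeechRecognitionServiceFactory.CreateDataClient(
    SpeechRecognitionMode.LongDictation, "en-US", subscriptionKey);

// Partial hypotheses arrive while audio is still being sent.
client.OnPartialResponseReceived += (sender, e) =>
    Console.WriteLine("Partial: " + e.PartialResult);

// The final result arrives after EndAudio().
client.OnResponseReceived += (sender, e) => {
    foreach (var result in e.PhraseResponse.Results)
        Console.WriteLine("Final: " + result.DisplayText);
};

// Push buffers as they arrive, e.g. inside an RTP receive loop.
// For raw PCM, some SDK versions expect the stream to start with a WAV header.
// client.SendAudio(packetBytes, packetBytes.Length);

// Signal end of stream to trigger the final recognition result.
client.EndAudio();
```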

2
votes

I found a solution to my problem. I wrote a class AudioStream, derived from Stream, which buffers the input and makes the Read method wait when it is called while the buffer is empty. This prevents the recognizer from stopping, because the Read method always returns a value > 0. Here is the important part of the code of this class:

public class AudioStream : Stream {
    private AutoResetEvent _waitEvent = new AutoResetEvent(false);

    internal void AddData(byte[] buffer, int count) {
        _buffer.Add(buffer, count);
        // Enable Read
        _waitEvent.Set();
    }

    public override int Read(byte[] buffer, int offset, int count) {
        int readCount = 0;
        if (_buffer.Empty) {
            // Wait for input
            _waitEvent.WaitOne();
        }
        ......
        // Fill buffer from _buffer and set readCount

        _waitEvent.Reset();
        return readCount;
    }

    protected override void Dispose(bool disposing) {
        // Make sure that there is no waiting Read
        // Clear the buffer, dispose the wait event, etc.
    }

    ......
}

Because audio data is received continuously, the Read method will not "hang" for longer than a few milliseconds (e.g. RTP packets are received every 20 ms).

0
votes

Adding some supporting information on this topic: the stream implementation has to support concurrent read/write operations, and it has to block when it has no data.
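One way to get both properties is to back the stream with a thread-safe producer/consumer buffer such as BlockingCollection. This is a sketch under those assumptions, not the implementation from the answer above; the class and method names are illustrative:

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// AddData may be called from the network/microphone thread while the
// recognizer blocks inside Read on another thread.
public class BlockingAudioStream : Stream {
    private readonly BlockingCollection<byte[]> _chunks = new BlockingCollection<byte[]>();
    private byte[] _current;  // chunk currently being drained
    private int _pos;         // read offset inside _current

    public void AddData(byte[] buffer, int count) {
        var chunk = new byte[count];
        Buffer.BlockCopy(buffer, 0, chunk, 0, count);
        _chunks.Add(chunk);   // thread-safe; wakes up a blocked Read
    }

    public override int Read(byte[] buffer, int offset, int count) {
        if (_current == null || _pos == _current.Length) {
            // Blocks until data arrives; returns false only after Complete().
            if (!_chunks.TryTake(out _current, Timeout.Infinite))
                return 0;     // end of stream
            _pos = 0;
        }
        int n = Math.Min(count, _current.Length - _pos);
        Buffer.BlockCopy(_current, _pos, buffer, offset, n);
        _pos += n;
        return n;
    }

    // Call when no more audio will arrive, so Read can return 0.
    public void Complete() => _chunks.CompleteAdding();

    // Required Stream plumbing for a forward-only readable stream.
    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

BlockingCollection handles the locking and the wake-up internally, so the stream needs no manual AutoResetEvent handling, and a Read that arrives while the buffer is empty simply blocks until the next packet is added.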