I have a bot built with the Microsoft Bot Framework that uses Skype as a channel. When a user talks to the bot by sending an audio message from one of the Skype mobile apps (Android or iOS), I want to get the audio from the attachments and send it to the Bing Speech API to convert it to text.
I'm having some issues doing this. I believe the main problem is that I have to send a WAV file to the Bing Speech API. I read the demo in the Bot Builder repository, which contains the following code:
```csharp
var audioAttachment = activity.Attachments?.FirstOrDefault(a => a.ContentType.Equals("audio/wav"));
if (audioAttachment != null)
{
    using (var client = new HttpClient())
    {
        var stream = await client.GetStreamAsync(audioAttachment.ContentUrl);
        var text = await this.speechService.GetTextFromAudioAsync(stream);
        message = ProcessText(activity.Text, text);
    }
}
```
However, when I send audio through the Skype mobile app (I'm testing on Android), the attachment's ContentType is not "audio/wav"; it is just "audio".
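Because the demo compares ContentType against "audio/wav" exactly, it never matches an attachment whose type is just "audio". The following is a minimal sketch of a more lenient filter. It uses a hypothetical `AttachmentInfo` record as a stand-in for the Bot Framework `Attachment` type, and only its `ContentType` and `ContentUrl` properties are assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Bot Framework Attachment type:
// only ContentType and ContentUrl are modelled here.
public record AttachmentInfo(string ContentType, string ContentUrl);

public static class AudioAttachments
{
    // Returns the first attachment whose MIME type is exactly "audio"
    // or starts with "audio/", instead of requiring "audio/wav".
    public static AttachmentInfo? FindAudio(IEnumerable<AttachmentInfo>? attachments) =>
        attachments?.FirstOrDefault(a =>
            a.ContentType != null &&
            (a.ContentType.Equals("audio", StringComparison.OrdinalIgnoreCase) ||
             a.ContentType.StartsWith("audio/", StringComparison.OrdinalIgnoreCase)));
}
```

This filter only locates the attachment. It does not change the format of the bytes behind `ContentUrl`, so the downloaded content may still not be WAV.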
When I fetch the audio file from the Bot State Manager API using Postman (the URL looks like this: https://smba.trafficmanager.net/apis/v3/attachments/0-eus-d1-0000000000000/views/original), the response has a content type of "application/octet-stream", so I can't tell whether it's an MP3, a WAV, or something else.
The few readable lines I can see in Postman look like this:
```
ftypmp42isommp42pmoovlmvhd�_ ��_ ���@ymeta!hdlrmdta+keysmdtacom.android.version%ilstdata7.1.1�trak\tkhd�_ ��_ ��@mdia mdhd�_ ��_ ��D��,hdlrsounSoundHandle�minfsmhd$dinfdrefurl �stbl[stsdKmp4a�D'esds@ww0stts��-�stsz
```
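The dump above begins with an `ftyp` box whose major brand is `mp42`. That is the signature of an MP4 (ISO base media file format) container. The `mp4a` sample entry further down suggests MPEG-4 audio, typically AAC, rather than PCM. The following is a minimal sketch of sniffing the container from the first bytes of a download, because the HTTP content type ("application/octet-stream") does not say. The helper name `AudioSniffer.Detect` is hypothetical, and only a few common signatures are checked.

```csharp
using System;
using System.Text;

public static class AudioSniffer
{
    // Hypothetical helper: guesses the audio container from the file's leading bytes.
    public static string Detect(byte[] header)
    {
        // WAV: "RIFF" <4-byte size> "WAVE"
        if (header.Length >= 12 &&
            Ascii(header, 0, 4) == "RIFF" && Ascii(header, 8, 4) == "WAVE")
            return "wav";

        // MP4/M4A/3GP: 4-byte box size, then "ftyp", then the 4-character major brand.
        if (header.Length >= 12 && Ascii(header, 4, 4) == "ftyp")
            return "mp4 (brand " + Ascii(header, 8, 4) + ")";

        // MP3 with an ID3v2 tag (untagged MP3 streams are not covered here).
        if (header.Length >= 3 && Ascii(header, 0, 3) == "ID3")
            return "mp3";

        // Ogg container (e.g. Ogg Opus or Vorbis).
        if (header.Length >= 4 && Ascii(header, 0, 4) == "OggS")
            return "ogg";

        return "unknown";
    }

    private static string Ascii(byte[] bytes, int offset, int count) =>
        Encoding.ASCII.GetString(bytes, offset, count);
}
```

If this reports an MP4 container, the audio would presumably need to be decoded and re-encoded as WAV before the speech service can use it. That decoding step requires a separate audio library and is not shown.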
I read this content into a Stream using the ReadAsStreamAsync method and pass that stream to the Bing Speech API at the following endpoint:
However, this is what I get back:
```json
{"RecognitionStatus":"InitialSilenceTimeout","Offset":11000000,"Duration":0}
```
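To read this response in code, here is a minimal sketch using `System.Text.Json`. It assumes that `Offset` and `Duration` are expressed in 100-nanosecond units, which is also the length of one `TimeSpan` tick. The helper name `SpeechResult.Parse` is hypothetical.

```csharp
using System;
using System.Text.Json;

public static class SpeechResult
{
    // Hypothetical helper: extracts the status and timing fields from the
    // simple-format JSON response. Offset and Duration are assumed to be in
    // 100-ns units, so they map directly onto TimeSpan ticks.
    public static (string Status, TimeSpan Offset, TimeSpan Duration) Parse(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;
        string status = root.GetProperty("RecognitionStatus").GetString() ?? "";
        TimeSpan offset = TimeSpan.FromTicks(root.GetProperty("Offset").GetInt64());
        TimeSpan duration = TimeSpan.FromTicks(root.GetProperty("Duration").GetInt64());
        return (status, offset, duration);
    }
}
```

Under that unit assumption, the response above means the service consumed 1.1 s (11,000,000 × 100 ns) of input without detecting any speech and then gave up. That is consistent with the payload not being interpreted as the audio format the endpoint expects.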
The recording contains clearly audible speech, yet the service doesn't detect it. As I said, I believe the problem is the file format. What file format does Skype use for these audio messages, and how can I use this file to call the Bing Speech API?
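Whatever the Skype attachment turns out to contain, it still has to reach the speech service as WAV. Suppose it has first been decoded to raw 16-bit PCM samples; that decoding step is not shown and would need a separate audio library. The following minimal sketch wraps those samples in a canonical 44-byte RIFF/WAVE header. The helper name `WavWriter.WrapPcm16` is hypothetical. The 16 kHz, 16-bit, mono defaults are an assumption about what the Bing Speech endpoint accepts, so check the API documentation for the exact format it requires.

```csharp
using System;
using System.IO;
using System.Text;

public static class WavWriter
{
    // Hypothetical helper: wraps raw 16-bit PCM samples in a canonical
    // 44-byte RIFF/WAVE header. BinaryWriter always writes little-endian,
    // which is the byte order WAV requires.
    public static byte[] WrapPcm16(short[] samples, int sampleRate = 16000, short channels = 1)
    {
        const short bitsPerSample = 16;
        short blockAlign = (short)(channels * bitsPerSample / 8);
        int byteRate = sampleRate * blockAlign;
        int dataSize = samples.Length * sizeof(short);

        using var ms = new MemoryStream(44 + dataSize);
        using (var w = new BinaryWriter(ms, Encoding.ASCII, leaveOpen: true))
        {
            // RIFF chunk descriptor
            w.Write(Encoding.ASCII.GetBytes("RIFF"));
            w.Write(36 + dataSize);              // bytes remaining after this field
            w.Write(Encoding.ASCII.GetBytes("WAVE"));

            // "fmt " sub-chunk: describes the PCM layout
            w.Write(Encoding.ASCII.GetBytes("fmt "));
            w.Write(16);                         // fmt sub-chunk size for PCM
            w.Write((short)1);                   // audio format 1 = uncompressed PCM
            w.Write(channels);
            w.Write(sampleRate);
            w.Write(byteRate);
            w.Write(blockAlign);
            w.Write(bitsPerSample);

            // "data" sub-chunk: the samples themselves
            w.Write(Encoding.ASCII.GetBytes("data"));
            w.Write(dataSize);
            foreach (short s in samples) w.Write(s);
        }
        return ms.ToArray();
    }
}
```

The resulting byte array could then be sent as the request body in place of the original attachment stream.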