0
votes

I have a bot built with the Microsoft Bot Framework that uses Skype as a channel. When a user sends the bot an audio message from one of the mobile apps (Android or iOS), I want to get the audio from the attachments and send it to the Bing Speech API to convert it to text.

I'm having trouble doing this. I believe the main problem is that I have to send a WAV file to the Bing Speech API. I read the demo in the Bot Builder repository, which contains the following code:

var audioAttachment = activity.Attachments?.FirstOrDefault(a => a.ContentType.Equals("audio/wav"));
if (audioAttachment != null)
{
    using (var client = new HttpClient())
    {
        var stream = await client.GetStreamAsync(audioAttachment.ContentUrl);
        var text = await this.speechService.GetTextFromAudioAsync(stream);
        message = ProcessText(activity.Text, text);
    }
}

However, when I send an audio message through the Skype mobile app (I'm testing with Android), the attachment's ContentType is not "audio/wav"; it comes through as just "audio".

When I fetch the audio file from the Bot State Manager API using Postman (the URL looks like this: https://smba.trafficmanager.net/apis/v3/attachments/0-eus-d1-0000000000000/views/original), the response has the content type "application/octet-stream", so I can't tell whether it is an MP3, a WAV, or something else.

The first few lines I can see in Postman look something like this:

ftypmp42isommp42pmoovlmvhd�_ ��_ ���@ymeta!hdlrmdta+keysmdtacom.android.version%ilstdata7.1.1�trak\tkhd�_ ��_ ��@mdia mdhd�_ ��_ ��D��,hdlrsounSoundHandle�minfsmhd$dinfdrefurl �stbl[stsdKmp4a�D'esds@ww0stts��-�stsz

I download this content to a Stream using the ReadAsStreamAsync method and pass that stream to the Bing Speech API at the following endpoint:

https://speech.platform.bing.com/speech/recognition/interactive/cognitiveservices/v1?language=pt-BR&format=detailed

However, this is what I get back:

{"RecognitionStatus":"InitialSilenceTimeout","Offset":11000000,"Duration":0}

The recording contains audible speech, yet the service doesn't detect it. As I said, I believe the problem is the file type. What file type does Skype use, and how can I use this file to call the Bing Speech API?


2 Answers

1
votes

What is the file type used by Skype, and how can I use this file to call the Bing Speech API?

You're right, the problem is the file type. The Bing Speech API currently supports only WAV/PCM input, so if your audio is in any other format you'll need to convert it to PCM first.
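As for which format Skype sends: the dump in the question contains the markers ftyp, mp42 and mp4a, which is what an MP4 container carrying AAC audio typically looks like, so the Android client is most likely not sending WAV at all. A minimal Node.js sketch for checking the first bytes of a downloaded attachment follows. `sniffAudioContainer` is a hypothetical helper name, not part of any SDK, and it only recognizes a handful of common signatures.

```javascript
// Guess the audio container from the first bytes of a downloaded attachment.
// Hypothetical helper for illustration; it covers only a few common signatures.
function sniffAudioContainer(buf) {
  if (!Buffer.isBuffer(buf) || buf.length < 12) return 'unknown';
  const ascii = (start, end) => buf.toString('latin1', start, end);

  // WAV: "RIFF", a 4-byte size, then "WAVE"
  if (ascii(0, 4) === 'RIFF' && ascii(8, 12) === 'WAVE') return 'wav';
  // MP4 / M4A: a 4-byte box size, then "ftyp" followed by the brand (e.g. "mp42")
  if (ascii(4, 8) === 'ftyp') return 'mp4';
  // Ogg (Vorbis or Opus): "OggS"
  if (ascii(0, 4) === 'OggS') return 'ogg';
  // MP3 with an ID3v2 tag; MP3 files without a tag are not detected here
  if (ascii(0, 3) === 'ID3') return 'mp3';

  return 'unknown';
}
```

If this returns anything other than 'wav', the audio needs to be transcoded before it is sent to the speech service.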

To detect whether the user's attachment is an audio file, you can, for example, modify your code like this:

var audioAttachment = activity.Attachments?.FirstOrDefault(a => a.ContentType.Contains("audio"));

The real problem then is converting it to a .wav file. In C#, you can try the NAudio package.
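If shelling out to a tool is acceptable, the conversion can also be done with the ffmpeg command line instead of NAudio. The sketch below is Node.js rather than C#, assumes an `ffmpeg` binary is available on the PATH, and uses hypothetical helper names (`buildWavArgs`, `convertToWav`). The 16 kHz, mono, 16-bit PCM target is a common choice for speech recognition; check the service documentation for the exact requirements.

```javascript
const { spawn } = require('child_process');

// Build ffmpeg arguments that transcode any supported input
// to a 16 kHz, mono, 16-bit PCM WAV file.
function buildWavArgs(inputPath, outputPath) {
  return [
    '-y',                 // overwrite the output file if it exists
    '-i', inputPath,      // input file (e.g. the downloaded attachment)
    '-ac', '1',           // one audio channel (mono)
    '-ar', '16000',       // 16 kHz sample rate
    '-c:a', 'pcm_s16le',  // 16-bit little-endian PCM
    '-f', 'wav',          // WAV container
    outputPath,
  ];
}

// Run ffmpeg and resolve with the output path once the conversion succeeds.
// Assumption: `ffmpeg` is installed and on the PATH.
function convertToWav(inputPath, outputPath) {
  return new Promise((resolve, reject) => {
    // stdio is ignored so ffmpeg's log output cannot fill an unread pipe.
    const proc = spawn('ffmpeg', buildWavArgs(inputPath, outputPath), { stdio: 'ignore' });
    proc.on('error', reject); // e.g. ffmpeg not found
    proc.on('close', (code) => {
      if (code === 0) resolve(outputPath);
      else reject(new Error(`ffmpeg exited with code ${code}`));
    });
  });
}
```

Usage would be along the lines of `convertToWav('voice.m4a', 'voice.wav').then(sendToSpeechApi)`, where `sendToSpeechApi` stands in for whatever code uploads the file.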

-1
votes

The snippet below may help with converting to the WAV format that Bing requires. This answer may be too late for you, but it might help future readers. I needed to convert MP3 files to WAV for use with Bing Speech (speech to text), so I wrote the small piece of code below. It pipes ffmpeg's output directly into the Bing client through a stream, so no intermediate file on disk is needed.

    // fluent-ffmpeg drives the ffmpeg binary shipped by @ffmpeg-installer/ffmpeg.
    const ffmpeg = require('fluent-ffmpeg');
    const ffmpegPath = require('@ffmpeg-installer/ffmpeg').path;
    ffmpeg.setFfmpegPath(ffmpegPath);
    const stream = require('stream');

    var bing = require('bingspeech-api-client/lib/client');
    const bingSpeechkey = ''; // your Bing Speech subscription key

    var bingClient = new bing.BingSpeechClient(bingSpeechkey);

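    // Returns a writable stream; whatever is written to it is sent to Bing.
    function bingUploadFromStream() {
        const pass = new stream.PassThrough();

        console.log('Bing upload');
        bingClient.recognizeStream(pass)
            .then(response => console.log(response.results[0].name))
            .catch(err => console.log('Recognition failed: ' + err.message));

        return pass;
    }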


    function speechToText(input) {
        ffmpeg(input)
            .format('wav')
            .on('progress', (progress) => {
                console.log('Processing: ' + progress.targetSize + ' KB converted');
            })
            .on('error', (err) => {
                console.log('An error occurred: ' + err.message);
            })
            .on('end', () => {
                console.log('Processing finished !');
            })
            .output(bingUploadFromStream())
            .run();
    }