0
votes

I've set this one out in as much detail as possible; hopefully someone out there has some experience with this kind of set-up.

Front-end: ASP.NET MVC Razor website.

  • .Net Framework 4.6.1

Back-end: Bot-framework Web API (RESTful).

  • .Net Framework 4.6

Back-Back-end: I use various Azure-hosted Cognitive Services, but in this case it's just the Bing Speech API.

Relevant SDKs:

  • Microsoft.Bing.Speech (Version: 2.0.2)
    • Bond.Core.CSharp (Version: 8.0.0) ~ dependency
    • Bond.CSharp (Version: 8.0.0) ~ dependency
    • Bond.Runtime.CSharp (Version: 8.0.0) ~ dependency

I'm using getUserMedia in the website to record the user's microphone on request from some JavaScript code; this creates a blob URL.

I then pass the blob URL as the ContentUrl within an Attachment to an Activity.

When this hits the Bot-framework I do some basic validation (nothing related to this problem), and then pass to a custom Dialog<T>.

This is where I'm struggling to get the Bing Speech API to do what I want.

I use this method from within the Dialog<T>:

public async Task Run(string audioFile, string locale, Uri serviceUrl)
{
    // create the preferences object
    var preferences = new Preferences(locale, serviceUrl, new CognitiveServicesAuthorizationProvider(subscriptionKey));

    using (var speechClient = new SpeechClient(preferences))
    {
        speechClient.SubscribeToPartialResult(this.OnPartialResult);
        speechClient.SubscribeToRecognitionResult(this.OnRecognitionResult);

        using (WebClient webClient = new WebClient())
        {
            using (Stream stream = webClient.OpenRead(audioFile))
            {
                var deviceMetadata = new DeviceMetadata(DeviceType.Near, DeviceFamily.Desktop, NetworkType.Ethernet, OsName.Windows, "1607", "Dell", "T3600");
                var applicationMetadata = new ApplicationMetadata("SampleApp", "1.0.0");
                var requestMetadata = new RequestMetadata(Guid.NewGuid(), deviceMetadata, applicationMetadata, "SampleAppService");

                try
                {
                    await speechClient.RecognizeAsync(new SpeechInput(stream, requestMetadata), this.cts.Token).ConfigureAwait(false);
                }
                catch (Exception genEx)
                {
                    // Was just using this try/catch for debugging reasons
                }
            }
        }
    }
}

I'm using the WebClient to get the Stream, rather than the FileStream that this method uses in the Microsoft sample code, because FileStream can't read from URLs.

The Current Problems:

When this line is hit:

await speechClient.RecognizeAsync(new SpeechInput(stream, requestMetadata), this.cts.Token).ConfigureAwait(false);

it throws an assembly-load error for Bond.IO.dll.

Fusion Log:

I'm debugging locally with the Microsoft Bot Framework Emulator, which is why you'll see the local file paths.

=== Pre-bind state information ===
LOG: DisplayName = Bond.IO, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35
 (Fully-specified)
LOG: Appbase = file:///[project folder]
LOG: Initial PrivatePath = \bin
Calling assembly : Microsoft.Bing.Speech, Version=2.0.2.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35.
===
LOG: This bind starts in default load context.
LOG: Using application configuration file:\web.config
LOG: Using host configuration file: \aspnet.config
LOG: Using machine configuration file from \machine.config.
LOG: Post-policy reference: Bond.IO, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35
LOG: Attempting download of new URL file:///C:/Users/[USER]/AppData/Local/Temp/Temporary ASP.NET Files/vs/0f4bb63f/ca796715/Bond.IO.DLL.
LOG: Attempting download of new URL file:///C:/Users/[USER]/AppData/Local/Temp/Temporary ASP.NET Files/vs/0f4bb63f/ca796715/Bond.IO/Bond.IO.DLL.
LOG: Attempting download of new URL file:///C:/[USER]/[PROJECT PATH]/bin/Bond.IO.DLL.
WRN: Comparing the assembly name resulted in the mismatch: Major Version
ERR: Failed to complete setup of assembly (hr = 0x80131040). Probing terminated.

The weird thing is that if I roll back the Bing Speech package to 2.0.1 and manually install the older versions of the Bond.IO packages (version 4.0.1, which is what the sample project uses), it doesn't throw this error; it throws other errors instead.

What I'm REALLY asking:

If I just want to send a .wav audio file to my API and then use the transcription function of the Bing Speech API to convert the speech to text, what is the best way to do this? Am I at least going in the right direction?

Bonus points if your answer ties in with how I'm already doing it.


2 Answers

3
votes

I'm using the WebClient to get the Stream, rather than the FileStream that this method uses in the Microsoft sample code, because FileStream can't read from URLs.

Not all Streams have the same capabilities. FileStream is a read/write, random-access (seekable) stream. The stream returned by WebClient.OpenRead is a forward-only, read-only network stream that cannot seek.

So buffer the .wav to a MemoryStream before passing it to the API.

    using (Stream stream = webClient.OpenRead(audioFile))
    using (var ms = new MemoryStream())
    {
        // Buffer the whole download so the API receives a seekable stream.
        stream.CopyTo(ms);
        ms.Position = 0; // rewind before handing the buffer to the recognizer
        var deviceMetadata = new DeviceMetadata(DeviceType.Near, DeviceFamily.Desktop, NetworkType.Ethernet, OsName.Windows, "1607", "Dell", "T3600");
        var applicationMetadata = new ApplicationMetadata("SampleApp", "1.0.0");
        var requestMetadata = new RequestMetadata(Guid.NewGuid(), deviceMetadata, applicationMetadata, "SampleAppService");

        try
        {
            await speechClient.RecognizeAsync(new SpeechInput(ms, requestMetadata), this.cts.Token).ConfigureAwait(false);
        }
        catch (Exception genEx)
        {
            // Was just using this try/catch for debugging reasons
        }
    }
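
If the buffering step is needed in more than one place, it could be pulled into a small helper. This is only a sketch using System.IO; AudioBuffering and BufferToMemory are hypothetical names of my own, not part of the Bing Speech SDK, and it assumes the audio is small enough to hold in memory.

```csharp
using System;
using System.IO;

public static class AudioBuffering
{
    // Copies the remaining bytes of any readable stream into a MemoryStream
    // and rewinds it, so the caller gets a seekable stream positioned at 0.
    public static MemoryStream BufferToMemory(Stream source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        var buffer = new MemoryStream();
        source.CopyTo(buffer);   // reads from the source's current position to its end
        buffer.Position = 0;     // rewind so the consumer starts at the first byte
        return buffer;
    }
}
```

Inside the method above this would be used as `using (var ms = AudioBuffering.BufferToMemory(stream)) { ... }`, with the caller still responsible for disposing both streams.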
0
votes

Although David's answer was definitely a good catch (I was most certainly mixing up streams), the actual cause of the problem listed above is, annoyingly, limited support for the Microsoft.Bing.Speech API.

The people working on the Bond.IO project on GitHub introduced a breaking change between the lower versions and the two latest versions currently listed on NuGet (7.0.1 and 8.0.0).

This was an intentional breaking change between 5.x and 6.x to enable people outside of Microsoft to build and use strong-named signed Bond assemblies.


Breaking change: Bond assemblies are now strong-name signed with the bond.snk key in the repository instead of with a Microsoft key. This allows anyone to produce compatible assemblies, not just Microsoft. Official distribution of Bond will continue to be Authenticode signed with a Microsoft certificate. Issue #414


The new public key for assemblies is now [Truncated a public key example]

Breaking change: Bond assemblies now have assembly and file versions that correspond to their NuGet package version. Strong name identities will now change release-over-release in line with the NuGet package versions. Issue #325 [1]

This seemed to mean that upgrading the Microsoft.Bing.Speech API to its latest versions, 2.0.1 and 2.0.2 (bear in mind these are the only two available on NuGet), could only install Bond.IO 7.0.1 or above. However, they still contained an internal requirement on version 1.0.0.0 of Bond.IO (or, more explicitly, any build before 7.0.1).
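
One way to check a mismatch like this for yourself is to list the assembly references compiled into a DLL and compare them with the name shown in the fusion log. The following is a minimal sketch using System.Reflection only; ReferenceInspector and DescribeReferences are hypothetical names of my own, not part of either package.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class ReferenceInspector
{
    // Returns the full identity (name, version, culture, public key token)
    // of every referenced assembly whose simple name starts with namePrefix.
    public static string[] DescribeReferences(Assembly assembly, string namePrefix)
    {
        if (assembly == null) throw new ArgumentNullException(nameof(assembly));

        return assembly.GetReferencedAssemblies()
            .Where(r => r.Name.StartsWith(namePrefix ?? string.Empty, StringComparison.OrdinalIgnoreCase))
            .Select(r => r.FullName)
            .ToArray();
    }
}
```

On .NET Framework, something like `ReferenceInspector.DescribeReferences(Assembly.ReflectionOnlyLoadFrom(@"bin\Microsoft.Bing.Speech.dll"), "Bond")` should list the Bond identities the Speech assembly was compiled against. The path is an assumption about where your build output lives.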

It's also worth highlighting that if you manually install the packages from the Microsoft sample project, which target older versions of both the Microsoft.Bing.Speech assembly and the Bond.IO assembly (version 4.2.1), the above code works without issue. [2]

There are also comments on one of the Microsoft Docs pages by one of the contributors saying that the Microsoft.Bing.Speech assembly is on its way to being deprecated (it would have been nice if they had marked it as such, am I right?). [3]

To conclude, the closest answer to my problem above is: unless you want to use outdated assemblies with no ongoing support, don't bother using the Microsoft.Bing.Speech NuGet package. They recommend using the Speech SDK instead (although be prepared for an uphill battle if using it in a Bot Framework Web API, as it also has a couple of internal errors of its own). [4]

I've spent the last few days working on this so I'm pretty confident that this is the current state of that library.


[1] Please see this issue against the Bond.IO GitHub

[2] Comment on a similar question supporting this.

[3] Look under closed comments at the bottom of this page; the response by 'Zhouwangzw' suggests using the latest Speech SDK.

[3] Found the GitHub issue that linked to the docs here

[4] Current breaking error in a Web API using the Speech SDK.