I figured I'd set this one out in as much detail as possible; hopefully someone out there has experience with this kind of set-up.
Front-end: ASP.NET MVC Razor website.
- .NET Framework 4.6.1
Back-end: Bot Framework Web API (RESTful).
- .NET Framework 4.6
Back-back-end: I use various Azure-hosted Cognitive Services, but in this case it's just the Bing Speech API.
Relevant SDKs:
- Microsoft.Bing.Speech (Version: 2.0.2)
- Bond.Core.CSharp (Version: 8.0.0) ~ dependency
- Bond.CSharp (Version: 8.0.0) ~ dependency
- Bond.Runtime.CSharp (Version: 8.0.0) ~ dependency
I'm using getUserMedia in the website to record the user's microphone on request from some JavaScript code; this creates a blob URL.
I then pass the blob URL as the ContentUrl within an Attachment to an Activity.
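For context, the client-side step looks roughly like the sketch below. It is a simplified illustration, not my exact code: recordClip, buildAudioActivity, the use of MediaRecorder and the attachment name "recording" are assumptions made for the example.

```javascript
// Hypothetical helper: records `durationMs` of microphone audio in the browser
// and resolves with a blob URL plus the MIME type the recorder produced.
async function recordClip(durationMs) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream); // assumption: MediaRecorder is available
  const chunks = [];
  recorder.ondataavailable = (event) => chunks.push(event.data);

  const stopped = new Promise((resolve) => { recorder.onstop = resolve; });
  recorder.start();
  setTimeout(() => recorder.stop(), durationMs);
  await stopped;

  // Release the microphone once recording has finished.
  stream.getTracks().forEach((track) => track.stop());

  const blob = new Blob(chunks, { type: recorder.mimeType });
  return { blobUrl: URL.createObjectURL(blob), contentType: blob.type };
}

// Hypothetical helper: wraps the blob URL in a message Activity with one Attachment.
function buildAudioActivity(blobUrl, contentType) {
  return {
    type: "message",
    attachments: [
      { contentType: contentType, contentUrl: blobUrl, name: "recording" }
    ]
  };
}
```

In the page this would be called along the lines of recordClip(3000).then(r => buildAudioActivity(r.blobUrl, r.contentType)), and the resulting object is what gets posted to the bot.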
When this hits the Bot Framework I do some basic validation (nothing related to this problem) and then pass it to a custom Dialog<T>.
This is where I'm struggling to get the Bing Speech API to do what I want.
I use this method from within the Dialog<T>:
    public async Task Run(string audioFile, string locale, Uri serviceUrl)
    {
        // create the preferences object
        var preferences = new Preferences(locale, serviceUrl, new CognitiveServicesAuthorizationProvider(subscriptionKey));
        using (var speechClient = new SpeechClient(preferences))
        {
            speechClient.SubscribeToPartialResult(this.OnPartialResult);
            speechClient.SubscribeToRecognitionResult(this.OnRecognitionResult);
            using (WebClient webClient = new WebClient())
            {
                using (Stream stream = webClient.OpenRead(audioFile))
                {
                    var deviceMetadata = new DeviceMetadata(DeviceType.Near, DeviceFamily.Desktop, NetworkType.Ethernet, OsName.Windows, "1607", "Dell", "T3600");
                    var applicationMetadata = new ApplicationMetadata("SampleApp", "1.0.0");
                    var requestMetadata = new RequestMetadata(Guid.NewGuid(), deviceMetadata, applicationMetadata, "SampleAppService");
                    try
                    {
                        await speechClient.RecognizeAsync(new SpeechInput(stream, requestMetadata), this.cts.Token).ConfigureAwait(false);
                    }
                    catch (Exception genEx)
                    {
                        // Was just using this try/catch for debugging reasons
                    }
                }
            }
        }
    }
I'm using the WebClient to get the Stream, rather than the FileStream that this method uses in the Microsoft sample code, because FileStream won't stream from URLs.
The Current Problems:
When this line is hit:
await speechClient.RecognizeAsync(new SpeechInput(stream, requestMetadata), this.cts.Token).ConfigureAwait(false);
It throws an error about Bond.IO.dll.
Fusion Log:
I'm debugging locally with the Microsoft Bot Framework Emulator, which is why you'll see the local file paths.
=== Pre-bind state information ===
LOG: DisplayName = Bond.IO, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35
(Fully-specified)
LOG: Appbase = file:///[project folder]
LOG: Initial PrivatePath = \bin
Calling assembly : Microsoft.Bing.Speech, Version=2.0.2.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35.
===
LOG: This bind starts in default load context.
LOG: Using application configuration file:\web.config
LOG: Using host configuration file: \aspnet.config
LOG: Using machine configuration file from \machine.config.
LOG: Post-policy reference: Bond.IO, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35
LOG: Attempting download of new URL file:///C:/Users/[USER]/AppData/Local/Temp/Temporary ASP.NET Files/vs/0f4bb63f/ca796715/Bond.IO.DLL.
LOG: Attempting download of new URL file:///C:/Users/[USER]/AppData/Local/Temp/Temporary ASP.NET Files/vs/0f4bb63f/ca796715/Bond.IO/Bond.IO.DLL.
LOG: Attempting download of new URL file:///C:/[USER]/[PROJECT PATH]/bin/Bond.IO.DLL.
WRN: Comparing the assembly name resulted in the mismatch: Major Version
ERR: Failed to complete setup of assembly (hr = 0x80131040). Probing terminated.
The weird thing is that if I roll back the Bing Speech SDK to 2.0.1 and manually insert the older versions of the Bond.IO packages (version 4.0.1), which is what's installed in the sample project, it doesn't throw this error; it throws other errors instead.
What I'm REALLY asking:
If I just want to send a .wav audio file to my API and then use the transcription function of the Bing Speech API to convert the speech to text, what is the best way to do this? Am I at least going in the right direction?
Bonus points if your answer ties in with how I'm already doing it.