I developed a C# application that tries to recognize phrases in a PC-to-landline phone call. It uses the Skype ActiveX control to redirect the call audio to a TCP/IP port, and feeds that audio to Microsoft's speech recognition engines.
On the phone side, a machine plays a recording, so the voice is very clear. But neither System.Speech nor Microsoft.Speech can recognize anything useful.
I have loaded the grammar with the expected choices, but nothing is recognized. When I speak into it myself, it only works if I say one word, wait until it is recognized, and then say the next word.
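For reference, a minimal sketch of this kind of setup (not the original code) could look like the following, using System.Speech. Several details are assumptions: the port number 8888, the example phrases, that Skype connects as a client to a listener on that port (e.g. after an `ALTER CALL <id> SET_OUTPUT PORT="8888"` command), and that the stream carries raw 16 kHz, 16-bit, mono PCM. Microsoft.Speech exposes largely the same classes under `Microsoft.Speech.Recognition` and `Microsoft.Speech.AudioFormat`.

```csharp
using System;
using System.Globalization;
using System.Net;
using System.Net.Sockets;
using System.Speech.AudioFormat;
using System.Speech.Recognition;

class PhoneCallRecognizer
{
    static void Main()
    {
        // Hypothetical port that Skype is assumed to send the call audio to.
        const int port = 8888;

        var listener = new TcpListener(IPAddress.Loopback, port);
        listener.Start();
        Console.WriteLine("Waiting for Skype to connect on port " + port + "...");

        using (TcpClient client = listener.AcceptTcpClient())
        using (NetworkStream audio = client.GetStream())
        using (var recognizer = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            // Each expected phrase is one alternative, so a whole phrase can be
            // matched in a single recognition. The phrases are hypothetical examples.
            var phrases = new Choices("thank you for calling", "press one for sales", "please hold");
            var builder = new GrammarBuilder(phrases) { Culture = recognizer.RecognizerInfo.Culture };
            recognizer.LoadGrammar(new Grammar(builder));

            // Assumption: the incoming stream is raw 16 kHz, 16-bit, mono PCM.
            recognizer.SetInputToAudioStream(audio,
                new SpeechAudioFormatInfo(16000, AudioBitsPerSample.Sixteen, AudioChannel.Mono));

            recognizer.SpeechRecognized += (sender, e) =>
                Console.WriteLine("{0} (confidence {1:F2})", e.Result.Text, e.Result.Confidence);

            // RecognizeMode.Multiple keeps recognizing after each result
            // instead of stopping after the first one.
            recognizer.RecognizeAsync(RecognizeMode.Multiple);

            Console.WriteLine("Press Enter to stop.");
            Console.ReadLine();
            recognizer.RecognizeAsyncCancel();
        }

        listener.Stop();
    }
}
```

If the assumed audio format does not match what actually arrives on the socket, the engine will typically hear noise and recognize little or nothing, so the `SpeechAudioFormatInfo` arguments are worth checking first.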
The question is: how can I improve this? Or do you know of an ASR engine that does better on live conversations?