1
votes

I developed an application in C# that tries to recognize phrases in a PC-to-landline phone call, using the Skype ActiveX control and Microsoft's speech recognition engines, with the call audio redirected through a TCP/IP port.

On the other end of the call, a machine plays a recording, so the voice is very clear. But neither System.Speech nor Microsoft.Speech recognizes anything useful.

The grammar is loaded with the expected choices, but nothing is recognized. If I speak myself, I have to say a word, wait until it is recognized, and then say the next word; only that way does it work.

The question is: how can I improve this? Or do you know an ASR engine that does better on live conversations?

1
Speech recognition is nearly impossible to solve. Apple's Siri seems to be the only one so far that does an acceptable job under the best circumstances. - MrFox
Yes, I know, I'm losing lots of time on dead-end research. Now I'm thinking that, since what I'm recognizing is a recording, maybe I could compare the audio stream against the parts of the original recording I want to catch. I'll try an approximate match, but any advice will be highly welcome. - Gabriel
The advice would be: stop wasting your time with this and do something where you might actually get some result. Unless you are very smart, have tons of time and know exactly what you are doing, in which case you would not ask here. - MrFox
@MrFox I was expecting that comment, haha. I know what I'm doing; I asked here because what I don't have is, precisely, the time, and I'm trying to reach a quick solution. - Gabriel
I came back just to say I achieved it, and it works really nicely even with tiny recordings like the spoken digits. Thanks for your condescension. - Gabriel
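
The idea mentioned in the comments, comparing the incoming audio against known parts of the original recording, could be sketched roughly as below. This is not the code Gabriel used (he did not post it). It is a minimal illustration in Python that assumes both signals are already decoded to lists of samples at the same sample rate. The function name `find_template` and the `threshold` value are made up for this example. Normalized cross-correlation is used so that the match does not depend on volume.

```python
import math

def find_template(stream, template, threshold=0.9):
    """Slide `template` over `stream` and return (offset, score) for the
    best normalized cross-correlation match, or None if no window
    reaches `threshold`. Hypothetical helper, for illustration only."""
    n = len(template)
    if n == 0 or len(stream) < n:
        return None
    # Center and measure the template once.
    t_mean = sum(template) / n
    t_centered = [x - t_mean for x in template]
    t_norm = math.sqrt(sum(x * x for x in t_centered))
    if t_norm == 0:
        return None  # a flat template cannot be matched
    best_offset, best_score = None, -1.0
    for offset in range(len(stream) - n + 1):
        window = stream[offset:offset + n]
        w_mean = sum(window) / n
        w_centered = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(x * x for x in w_centered))
        if w_norm == 0:
            continue  # skip silent (flat) windows
        # Score is in [-1, 1]; 1 means same shape regardless of gain.
        score = sum(a * b for a, b in zip(w_centered, t_centered)) / (w_norm * t_norm)
        if score > best_score:
            best_offset, best_score = offset, score
    if best_offset is None or best_score < threshold:
        return None
    return best_offset, best_score

# Synthetic demo: the "recorded phrase" appears at half volume after
# 120 samples of silence.
template = [math.sin(0.3 * i) * (i % 7) for i in range(50)]
stream = [0.0] * 120 + [0.5 * x for x in template] + [0.0] * 80
match = find_template(stream, template)
```

Real telephone audio is band-limited and compressed, so a lower threshold would probably be needed, and this brute-force loop is too slow for long streams. An FFT-based correlation or a comparison of spectral features would be the usual next step.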

1 Answer

3
votes

The most straightforward way is to use tools specifically designed for the task instead of a hand-made Skype/ActiveX solution.

There is dedicated software for connecting telephone calls to something actionable. Some examples:

Asterisk

Freeswitch

All such systems provide speech recognition and interactive voice response (IVR) functionality through the MRCP protocol. The easiest way to set up the recognition is to use the CMUSphinx toolkit.

You can read more about CMUSphinx integration into IVR systems here or here

If you prefer to start very quickly, there are SaaS solutions that let you build a telephony application with a few clicks, for example Voxeo.