2
votes

I am looking for a speech recognition API that returns interim results while the user is speaking, similar to what Google does on its homepage (https://www.google.com). The API must support French. My goal is to build a web application that works like Google voice search. These are the options I have considered so far:

  • The Google Speech API is not recommended for professional development, since it changes often and is not completely documented.
  • IBM Watson doesn't support French.
  • The AT&T Speech API doesn't return interim results.
  • CMU Sphinx returns very poor results (see the live demo here: http://syl22-00.github.io/pocketsphinx.js/live-demo.html).
  • Nuance products don't seem to be designed for web applications. (If you know how to use them in one, I am interested!)

2 Answers

2
votes

Microsoft's Project Oxford Speech Recognition API, used by Cortana and Skype Translator, meets both of your criteria: it supports French (and 6 other languages) and returns partial/interim/online hypotheses as you stream audio to it.

(As an aside, the usual cause of terrible accuracy in online recognition with Pocketsphinx is bad CMN (cepstral mean normalization). When you give Pocketsphinx a complete piece of audio, it computes the CMN over the entire utterance, but when you stream audio to it, it does not compute the CMN by default. One solution is to give it a complete utterance, retrieve the CMN values it computed, and then use those values for the streaming audio. Note that the CMN is different for each audio channel and environment, and that the Python interface to Pocketsphinx doesn't expose the CMN data. I have a patch if this is a route you'd like to investigate.)
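To make the calibrate-then-stream idea concrete, here is a minimal conceptual sketch in plain Python. It is not Pocketsphinx code and does not use its API: the function names `cepstral_mean` and `apply_cmn` are made up for illustration, and it assumes you already have cepstral feature frames (lists of floats) from some front end.

```python
from typing import List, Sequence

Frame = Sequence[float]


def cepstral_mean(frames: Sequence[Frame]) -> List[float]:
    """Average each cepstral coefficient over a complete calibration utterance."""
    if not frames:
        raise ValueError("need at least one frame to estimate the mean")
    dim = len(frames[0])
    totals = [0.0] * dim
    for frame in frames:
        for i, value in enumerate(frame):
            totals[i] += value
    return [total / len(frames) for total in totals]


def apply_cmn(frame: Frame, mean: Sequence[float]) -> List[float]:
    """Normalize one streamed frame by subtracting the stored mean."""
    return [value - m for value, m in zip(frame, mean)]


# Hypothetical calibration data: two 2-dimensional frames from a full utterance.
calibration_frames = [[1.0, 2.0], [3.0, 6.0]]
stored_mean = cepstral_mean(calibration_frames)

# Later, each frame arriving from the live stream reuses the stored mean.
normalized = apply_cmn([5.0, 5.0], stored_mean)
```

Because the mean depends on the channel and environment, the calibration utterance would need to be recorded with the same microphone and in the same setting as the live stream.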

1
votes

Many speech-to-text applications use speech recognition technology developed by Nuance Communications. The SDK best suited to a web application is their Server SDK, which supports converting streaming audio into text. It supports French in addition to English and German. To use it, you would likely need to send the audio input to your server in an AJAX request, process it there, and then read the recognized text from the response to that request (the XMLHttpRequest response).
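The server side of that round trip could look roughly like the sketch below, written with Python's standard `http.server` module. Everything recognizer-related is an assumption: `recognize_stub` is a hypothetical placeholder and not a Nuance SDK call, and the JSON response shape is just one possible choice. The browser would POST raw audio bytes and read the `text` field from the response.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def recognize_stub(audio: bytes) -> str:
    """Hypothetical placeholder; a real server would call its recognizer here."""
    return f"<{len(audio)} bytes of audio received>"


def build_response(audio: bytes) -> bytes:
    """Encode the recognized text as the JSON body sent back to the browser."""
    return json.dumps({"text": recognize_stub(audio)}).encode("utf-8")


class TranscribeHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read exactly the number of audio bytes the client declared.
        length = int(self.headers.get("Content-Length", 0))
        body = build_response(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8000) -> HTTPServer:
    """Create (but do not start) a local server that uses the handler above."""
    return HTTPServer(("127.0.0.1", port), TranscribeHandler)
```

Calling `make_server().serve_forever()` would start it locally. Note that one request per utterance only returns a final result; to approximate interim results with this design, the client would have to send short audio chunks in successive requests.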