I am using pocketsphinx for speech recognition with a Spanish acoustic model and a JSGF grammar, with decent results so far.
However, I'm getting erroneous recognition results with audio files that, at least to my ear, seem perfectly intelligible (little background noise, sampling rate and bit depth matching the acoustic model's parameters, etc.).
Also, the files that are not recognized correctly do not seem to differ much from the ones that are (in fact, they sound pretty much the same to me).
So I'm guessing there is something in the audio that makes it harder to recognize, perhaps noise at certain frequencies or other artifacts that need to be filtered out (background noise, "pop" sounds in the speech, frequencies outside the band of the human voice, etc.)?
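To make concrete the kind of preprocessing I have in mind, here is a minimal sketch of a first-order high-pass filter that attenuates low-frequency rumble and DC offset before the samples reach the recognizer. This is plain Python, not part of the pocketsphinx API; the function name `high_pass`, the 100 Hz cutoff and the 16 kHz sample rate are assumptions chosen only for illustration, and I have not verified that this actually improves recognition.

```python
import math

def high_pass(samples, sample_rate, cutoff_hz=100.0):
    """First-order (RC) high-pass filter.

    Attenuates content below cutoff_hz (e.g. DC offset, low rumble)
    and passes higher frequencies mostly unchanged.
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # filter time constant
    dt = 1.0 / sample_rate                  # time between samples
    alpha = rc / (rc + dt)                  # smoothing coefficient in (0, 1)

    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        # y[n] = alpha * (y[n-1] + x[n] - x[n-1])
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out

# Hypothetical usage: a 1 kHz tone riding on a constant (DC) offset,
# sampled at an assumed 16 kHz.
rate = 16000
signal = [500.0 + 1000.0 * math.sin(2 * math.pi * 1000 * n / rate)
          for n in range(rate)]
filtered = high_pass(signal, rate)
```

After filtering, the constant offset should be gone while the 1 kHz tone remains close to its original amplitude. A real pipeline would read the samples from the WAV file and write them back in the format the acoustic model expects.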
In short, do you know whether pocketsphinx already does any of this? If not, do you know of any best-practice filter or transformation that can be applied to an audio file to improve speech recognition results?
Thanks!