I understand that Watson Speech to Text is tuned mainly for conversational speech with one or two speakers. I also understand that it handles FLAC better than WAV and OGG.
I would like to know how I could improve recognition accuracy, acoustically speaking.
For example, does increasing the volume help? Would applying a compression filter or noise reduction help?
What kind of audio pre-processing helps with this service?
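
To make the "increasing volume" part concrete, here is a minimal sketch of the kind of step I have in mind: peak normalization of 16-bit PCM samples. The function name peak_normalize and the target_peak value are my own placeholders, not anything from the Watson API, and I do not know whether this actually improves recognition.

```python
def peak_normalize(samples, target_peak=0.9, full_scale=32767):
    """Scale 16-bit PCM samples so the loudest one sits at target_peak of full scale.

    Hypothetical helper for illustration only; target_peak=0.9 is an assumed value.
    """
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = target_peak * full_scale / peak
    # Clamp to the valid signed 16-bit range in case rounding pushes a value over.
    return [max(-32768, min(32767, round(s * gain))) for s in samples]


if __name__ == "__main__":
    quiet = [0, 1000, -2000, 500]
    print(peak_normalize(quiet))
```

The samples would come from decoding the audio file before sending it to the service; compression or noise reduction would be additional steps before or after this one.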