I am using the Google Cloud Speech-to-Text API to convert audio to text.

It works fine for .raw files.

For .wav files, however, it fails with an error like this:

Google::Gax::RetryError Exception: GaxError Exception occurred in retry method that was not classified as transient, caused by 3:Must use single channel (mono) audio, but WAV header indicates 2 channels.

I am using the Ruby client library for the Speech-to-Text API.

The test.wav file is already saved in assets.

I set the encoding to MULAW and omitted the sampling rate.

Can someone help me with this?

1 Answer

There is a beta feature that lets you specify the number of audio channels when transcribing multi-channel audio [1].
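Before picking an option, it can help to confirm what the WAV header actually declares, since that is what the error message is complaining about. The following is a minimal sketch that walks the RIFF chunks and reads the "fmt " chunk using only core Ruby; the method name `wav_format` is made up for illustration and is not part of any Google library.

```ruby
# Reads the "fmt " chunk of a RIFF/WAVE stream and returns basic format info.
# `io` is any readable, seekable IO (for example File.open(path, "rb")).
def wav_format(io)
  riff, _size, wave = io.read(12).to_s.unpack("a4Va4")
  raise ArgumentError, "not a RIFF/WAVE stream" unless riff == "RIFF" && wave == "WAVE"

  # Walk the chunks until the "fmt " chunk is found.
  while (header = io.read(8)) && header.bytesize == 8
    id, size = header.unpack("a4V")
    if id == "fmt "
      # Little-endian: 16-bit format code, 16-bit channel count, 32-bit sample rate.
      format_code, channels, sample_rate = io.read(size).to_s.unpack("vvV")
      return { format_code: format_code, channels: channels, sample_rate: sample_rate }
    end
    # Skip this chunk; chunk bodies are padded to an even number of bytes.
    io.seek(size + (size % 2), IO::SEEK_CUR)
  end
  raise ArgumentError, "no fmt chunk found"
end

# Example usage (the path is the asker's file name, adjust as needed):
#   info = File.open("test.wav", "rb") { |f| wav_format(f) }
#   puts info[:channels]
```

A format code of 1 means PCM and 7 means mu-law, so the same call also shows whether the MULAW setting matches the file.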

If you are sending POST requests, you can specify:

'audioChannelCount': 2,

'enableSeparateRecognitionPerChannel': true

The second option is especially useful when each speaker is recorded on a separate channel (as in phone calls). You can read more about this in the documentation linked below, which also provides sample code for the Java and Python client libraries.
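As a rough sketch of what such a POST body could look like when built from Ruby, the helper below assembles the JSON with the two fields from this answer. The method name `recognize_request_body` and the "en-US" language code are assumptions for illustration, and the encoding and sample rate are left out on the assumption that the service reads them from the WAV header.

```ruby
require "json"

# Builds a request body hash with the multi-channel fields set.
# `audio_bytes` is the raw content of the audio file.
def recognize_request_body(audio_bytes, channel_count: 2, language_code: "en-US")
  {
    config: {
      languageCode: language_code, # assumed value, change to match the recording
      audioChannelCount: channel_count,
      enableSeparateRecognitionPerChannel: true
    },
    # pack("m0") produces Base64 without line breaks.
    audio: { content: [audio_bytes].pack("m0") }
  }
end

# Example usage:
#   body = JSON.generate(recognize_request_body(File.binread("test.wav")))
```

The resulting string would be sent as the body of the recognize request; the endpoint and authentication are not shown here.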

If you would rather send a single channel, or one audio file per channel, I suggest checking the sox tool [2], which lets you programmatically mix the channels down to one, or use its remix effect to extract one audio file per channel.
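If you want to drive sox from Ruby, something along these lines might work. It is only a sketch: the helper names are hypothetical, it assumes sox is installed and on your PATH, and it relies on sox's `-c 1` output option for the mix-down and the `remix` effect for picking a single channel.

```ruby
# Builds the argv for mixing every channel of input_path down to one channel.
def sox_mixdown_command(input_path, output_path)
  ["sox", input_path, "-c", "1", output_path]
end

# Builds the argv for extracting a single channel (1-based) into its own file.
def sox_channel_command(input_path, output_path, channel)
  unless channel.is_a?(Integer) && channel >= 1
    raise ArgumentError, "channel numbers start at 1"
  end
  ["sox", input_path, output_path, "remix", channel.to_s]
end

# Runs a command without going through a shell, so paths with spaces are safe.
def run_command(argv)
  system(*argv) or raise "command failed: #{argv.join(' ')}"
end

# Example usage (file names are placeholders):
#   run_command(sox_mixdown_command("test.wav", "test_mono.wav"))
#   run_command(sox_channel_command("test.wav", "test_left.wav", 1))
#   run_command(sox_channel_command("test.wav", "test_right.wav", 2))
```

The mono file can then be sent to the API with the same settings that already work for single-channel audio.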


[1] https://cloud.google.com/speech-to-text/docs/multi-channel

[2] http://sox.sourceforge.net/sox.html