I am trying to transcribe a WAV audio file using the Google Speech-to-Text API. Transcription works for most files, but one WAV file always fails with:

Unhandled error { Error: 3 INVALID_ARGUMENT: WAV header indicates an unsupported format.

I have referred to https://cloud.google.com/speech-to-text/docs/encoding, which says:

Note: Speech-to-Text supports WAV files with LINEAR16 or MULAW encoded audio.

I tried both encodings, yet the request still failed.

I checked the details of the WAV file with the soxi command:

>> soxi org\ hearing.WAV
Input File     : 'org hearing.WAV'
Channels       : 1
Sample Rate    : 22050
Precision      : 13-bit
Duration       : 00:14:59.99 = 19844721 samples ~ 67499.1 CDDA sectors
File Size      : 9.99M
Bit Rate       : 88.8k
Sample Encoding: 4-bit IMA ADPCM
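
The same information can be read straight from the WAV header, which is useful when soxi is not available. The following is a minimal Python sketch, assuming a standard RIFF/WAVE layout; the function name read_wav_format and the "supported" rule (16-bit PCM or mu-law, following the note quoted above) are illustrative assumptions, not part of any Google library.

```python
import struct

# Format tags from the RIFF/WAVE registry that matter for this question.
WAVE_FORMATS = {
    0x0001: "PCM",
    0x0006: "A-law",
    0x0007: "mu-law",
    0x0011: "IMA ADPCM",
}


def read_wav_format(data: bytes) -> dict:
    """Parse the 'fmt ' chunk from the leading bytes of a RIFF/WAVE file."""
    if data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", data, pos + 8
            )
            return {
                "format_tag": tag,
                "encoding": WAVE_FORMATS.get(tag, "unknown"),
                "channels": channels,
                "sample_rate": rate,
                "bits_per_sample": bits,
                # Assumption: only 16-bit PCM (LINEAR16) and mu-law count
                # as supported, per the documentation note quoted above.
                "supported": (tag == 0x0001 and bits == 16) or tag == 0x0007,
            }
        pos += 8 + size + (size & 1)  # chunks are padded to an even size
    raise ValueError("no 'fmt ' chunk found")


# Typical use (the path is only an example):
#   with open("org hearing.WAV", "rb") as f:
#       print(read_wav_format(f.read(4096)))
```

For a file like the one above, the format tag would be 0x0011 (IMA ADPCM) with 4 bits per sample, so the check reports it as unsupported.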

Is the "4-bit IMA ADPCM" encoding supported? If so, which of the supported encodings does it correspond to? https://cloud.google.com/speech-to-text/docs/encoding#audio-encodings

If the source file's codec really is not supported, is there a way to convert it to a supported FLAC/WAV encoding with some GCP function and then extract the text, without the user converting it manually? I am building this for admin staff, who need an extraction function that works without any technical steps.

1 Answer

Use enums.RecognitionConfig.AudioEncoding.LINEAR16, which works well for WAV files. LINEAR16 means uncompressed 16-bit PCM, though, and your file is 4-bit IMA ADPCM, so it has to be converted first. I also ran into problems when the bit depth was not 16, so set it to 16. Your sample rate is 22050 Hz; I resampled to 16000 Hz, which the API reference describes as the optimal rate (it lists 8000-48000 Hz as valid). With those changes it should work. If you are using sox this is easy to do. Below is the sox command to change the sample rate and bit depth.

sox audio.wav -r 16000 -c 1 -b 16 audio_1.wav bandreject 200 3k

The trailing "bandreject 200 3k" applies a band-reject filter (not a band-pass filter); you can remove that part if you do not need it.
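
To spare users the manual step, the conversion can be wrapped in a small helper that runs before the file is sent for recognition. This is only a sketch, assuming sox is installed and on the PATH wherever the upload is processed; the names build_sox_command, linear16_path and convert_for_speech are made up for illustration, and the band-reject filter is left out.

```python
import subprocess
from pathlib import Path
from typing import List


def build_sox_command(src: str, dst: str, rate: int = 16000) -> List[str]:
    """Return the sox argv that rewrites src as mono 16-bit signed PCM."""
    return [
        "sox", src,
        "-r", str(rate),          # output sample rate in Hz
        "-c", "1",                # mono
        "-b", "16",               # 16 bits per sample
        "-e", "signed-integer",   # plain PCM, i.e. LINEAR16
        dst,
    ]


def linear16_path(src: str) -> str:
    """Hypothetical naming scheme: 'x.WAV' becomes 'x_linear16.wav'."""
    p = Path(src)
    return str(p.with_name(p.stem + "_linear16.wav"))


def convert_for_speech(src: str, rate: int = 16000) -> str:
    """Convert src with sox and return the path of the converted file."""
    dst = linear16_path(src)
    # check=True raises CalledProcessError if sox exits with an error.
    subprocess.run(build_sox_command(src, dst, rate), check=True)
    return dst
```

Calling convert_for_speech("org hearing.WAV") would write "org hearing_linear16.wav" next to the original; that file can then be submitted with LINEAR16 and a sample rate matching the rate passed to the helper.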