
I am using the Google Cloud Platform Speech-to-Text API on a trial account, but I am not able to get any text from an audio file. I do not know which encoding and sampleRateHertz I should use for an MP3 file with a bit rate of 128 kbps. I have tried various options, but I am not getting a transcription.

const speech = require('@google-cloud/speech');

const config = {
  encoding: 'LINEAR16',  // also tried AMR and AMR_WB; LINEAR16 is meant for WAV
  sampleRateHertz: 16000,  // 16000 gives a blank result
  languageCode: 'en-US'
};

3 Answers

Answer (5 votes)

MP3 is now supported in beta:

MP3: Only available as beta. See the RecognitionConfig reference for details.

MP3: MP3 audio. Supports all standard MP3 bitrates (which range from 32-320 kbps). When using this encoding, sampleRateHertz can optionally be left unset if not known.

You can find out the sample rate using a variety of tools such as iTunes. CD-quality audio uses a sample rate of 44100 Hertz. Read more here:

To use this in a Google SDK, you may need to use one of the beta SDKs that define it. Here is the constant from the Go beta SDK:

RecognitionConfig_MP3 RecognitionConfig_AudioEncoding = 8
Answer (3 votes)

According to the official documentation (https://cloud.google.com/speech-to-text/docs/encoding), only the following formats are supported:

  • FLAC
  • LINEAR16
  • MULAW
  • AMR
  • AMR_WB
  • OGG_OPUS
  • SPEEX_WITH_HEADER_BYTE

Anything else will be rejected.

Your best bet is to convert the MP3 file to one of the supported formats, such as FLAC or LINEAR16.

Honestly, it is annoying that Google does not support MP3 out of the box, unlike Amazon, IBM and Microsoft, which do. It forces us to jump through hoops and also increases bandwidth usage, since FLAC and LINEAR16 are lossless and therefore much bigger to transmit.

Answer (2 votes)

I had the same issue and resolved it by converting it to FLAC.

Try converting your audio to FLAC and use:

encoding: 'FLAC',

For the conversion, you can use SoX (see https://www.npmjs.com/package/sox).