Google Cloud Speech-to-Text (MP3 to text)

Question

I am using the Google Cloud Platform Speech-to-Text API on a trial account, but I cannot get any text back from an audio file. I do not know which encoding and sampleRateHertz values to use for an MP3 file with a bit rate of 128 kbps. I have tried various options, but none of them returns a transcription.

const speech = require('@google-cloud/speech');

const config = {
  encoding: 'LINEAR16',    // AMR, AMR_WB, LINEAR16 (for WAV)
  sampleRateHertz: 16000,  // 16000 gives a blank result
  languageCode: 'en-US'
};

Grokify · Accepted Answer · 2019-09-16T10:54:42

MP3 is now supported in beta:

MP3: Only available as beta. See RecognitionConfig reference for details.

https://cloud.google.com/speech-to-text/docs/encoding

MP3: MP3 audio. Support all standard MP3 bitrates (which range from 32-320 kbps). When using this encoding, sampleRateHertz can be optionally unset if not known.

https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig#AudioEncoding

You can find out the sample rate using a variety of tools such as iTunes. CD-quality audio uses a sample rate of 44100 Hertz. Read more here:

https://en.wikipedia.org/wiki/44,100_Hz
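If you would rather read the sample rate from the file itself, the first MPEG frame header carries it. The following is a rough sketch, not a full MP3 parser: the function name mp3SampleRate is made up for illustration, it only skips a leading ID3v2 tag, and it trusts the first byte pattern that looks like a frame header, so a false match is possible on unusual files.

```javascript
// Sample rates in Hz, indexed first by the 2-bit MPEG version field of the
// frame header, then by the 2-bit sample-rate index. Version 0b01 is reserved.
const SAMPLE_RATES = [
  [11025, 12000, 8000],  // 0b00: MPEG-2.5
  null,                  // 0b01: reserved
  [22050, 24000, 16000], // 0b10: MPEG-2
  [44100, 48000, 32000], // 0b11: MPEG-1
];

// Hypothetical helper: returns the sample rate of the first plausible
// MPEG audio frame in `buf` (a Node.js Buffer), or null if none is found.
function mp3SampleRate(buf) {
  let offset = 0;

  // Skip a leading ID3v2 tag: 10-byte header whose size field is a
  // 28-bit "syncsafe" integer (7 useful bits per byte).
  if (buf.length >= 10 && buf.toString('latin1', 0, 3) === 'ID3') {
    const size =
      ((buf[6] & 0x7f) << 21) |
      ((buf[7] & 0x7f) << 14) |
      ((buf[8] & 0x7f) << 7) |
      (buf[9] & 0x7f);
    const footer = buf[5] & 0x10 ? 10 : 0; // optional 10-byte footer
    offset = 10 + size + footer;
  }

  for (let i = offset; i + 3 < buf.length; i++) {
    // A frame header starts with 11 set bits (0xFF, then the top 3 bits).
    if (buf[i] !== 0xff || (buf[i + 1] & 0xe0) !== 0xe0) continue;

    const version = (buf[i + 1] >> 3) & 0x03;
    const layer = (buf[i + 1] >> 1) & 0x03;
    const rateIndex = (buf[i + 2] >> 2) & 0x03;

    // Reject reserved version, reserved layer, and reserved rate index.
    const rates = SAMPLE_RATES[version];
    if (!rates || layer === 0 || rateIndex === 3) continue;

    return rates[rateIndex];
  }
  return null;
}
```

Typical use would be mp3SampleRate(require('fs').readFileSync('audio.mp3')), where audio.mp3 is a placeholder path. The result can then be passed as sampleRateHertz, or the field can be left unset as the documentation quoted above allows.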

To use this in a Google SDK, you may need to use one of the beta SDKs that defines the MP3 encoding. Here is the constant from the Go Beta SDK:

RecognitionConfig_MP3 RecognitionConfig_AudioEncoding = 8

https://godoc.org/google.golang.org/genproto/googleapis/cloud/speech/v1p1beta1
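For the Node.js client used in the question, the equivalent would be roughly the sketch below. It assumes the v1p1beta1 export of @google-cloud/speech, that credentials are already configured in the environment, and that the audio is short enough for a synchronous recognize call; buildMp3Request and transcribeMp3 are illustrative names, not part of the library.

```javascript
// Build a recognize request for MP3 audio. sampleRateHertz is optional:
// per the RecognitionConfig docs it may be left unset for MP3 if unknown.
function buildMp3Request(audioBytes, sampleRateHertz) {
  const config = {
    encoding: 'MP3',      // only defined in the beta (v1p1beta1) API surface
    languageCode: 'en-US',
  };
  if (sampleRateHertz) {
    config.sampleRateHertz = sampleRateHertz;
  }
  return {
    config,
    // Inline audio is sent as base64-encoded bytes.
    audio: { content: audioBytes.toString('base64') },
  };
}

// Hypothetical wrapper: read a local MP3 file and return its transcript.
// The library is required lazily so the request builder above can be
// used and tested without the package installed.
async function transcribeMp3(path, sampleRateHertz) {
  const fs = require('fs');
  const speech = require('@google-cloud/speech').v1p1beta1;

  const client = new speech.SpeechClient();
  const request = buildMp3Request(fs.readFileSync(path), sampleRateHertz);

  // recognize() resolves to an array whose first element is the response.
  const [response] = await client.recognize(request);
  return response.results
    .map((result) => result.alternatives[0].transcript)
    .join('\n');
}
```

Calling transcribeMp3('audio.mp3').then(console.log) with a placeholder path should print the transcript if the file is readable and the beta API accepts it. An empty result with no error usually points back to a mismatch between the config and the actual audio.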
