
The Google Cloud Speech API expects you to use synchronous recognition for audio files that run less than one minute, and asynchronous recognition for audio files longer than a minute.

How do I decide which interface to use for getting the transcript? Should this be based on the audio file size? Or how can I find the total duration of the audio during transcription, or otherwise handle this scenario?


1 Answer


The audio file size alone is not enough to determine the audio duration. An audio file is made up of a number of audio samples; the duration depends on the sample rate (how many samples are taken per second, measured in Hertz), while the file size also depends on how many bits are needed to encode each sample and on the number of channels.
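For WAV files you can compute the duration from the header yourself, without external tools. The sketch below uses Python's standard-library `wave` module; the function name is just for illustration. Duration is the total number of frames divided by the frame rate:

```python
import wave

def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds.

    Duration = total number of frames (samples per channel)
             / frames per second (sample rate in Hz).
    Only the header is read, so this is cheap even for large files.
    """
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())
```

For example, a stereo 48000 Hz, 16-bit file with 2868480 frames (as in the soxi output below) works out to 2868480 / 48000 = 59.76 seconds.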

You may find the sox utility and its soxi program useful here to determine the duration of your audio file. soxi parses the header of your audio file to extract that information. Here is an example with a WAV file:

$ soxi audiofile.wav

Input File     : 'audiofile.wav'
Channels       : 2
Sample Rate    : 48000
Precision      : 16-bit
Duration       : 00:00:59.76 = 2868480 samples ~ 4482 CDDA sectors
File Size      : 11.5M
Bit Rate       : 1.54M
Sample Encoding: 16-bit Signed Integer PCM

Hope this helps!