What role does bit rate play in the accuracy of Google Speech To Text transcription?

Question

I am helping a client convert a video file using ffmpeg. They originally used -b:a 64k while transcoding the video to audio at a sampling rate of 44100 Hz (the -ar 44100 argument in ffmpeg). Their objective is to generate the most accurate transcriptions possible with the Google Cloud Speech To Text API.

While combing through their documentation, I did not find anything on how bit rate affects transcription accuracy. So my question is: would using a higher bit rate such as 128k give me better transcriptions, or does it not matter?
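For reference, here is a minimal Python sketch, not part of the original post, that builds an ffmpeg command for extracting the audio track as uncompressed mono 16-bit PCM, which is the format the Speech-to-Text API calls LINEAR16. The function name linear16_extract_cmd and the file names are hypothetical. Because PCM has no bitrate setting, the sketch omits -b:a; leaving sample_rate unset omits -ar so ffmpeg keeps the source's native sample rate.

```python
import shlex
from typing import List, Optional


def linear16_extract_cmd(src: str, dst: str, sample_rate: Optional[int] = None) -> List[str]:
    """Build (but do not run) an ffmpeg argument list that extracts the audio
    of `src` as mono, signed 16-bit little-endian PCM (LINEAR16).

    Hypothetical helper. If `sample_rate` is None, no -ar flag is added and
    ffmpeg keeps the source's sample rate instead of resampling.
    """
    cmd = [
        "ffmpeg", "-i", src,
        "-vn",                # drop the video stream
        "-ac", "1",           # downmix to a single (mono) channel
        "-c:a", "pcm_s16le",  # uncompressed 16-bit PCM, so no -b:a is needed
    ]
    if sample_rate is not None:
        cmd += ["-ar", str(sample_rate)]
    cmd.append(dst)  # output options must come before the output file
    return cmd


if __name__ == "__main__":
    # Show the shell-quoted command for a 44100 Hz source.
    print(shlex.join(linear16_extract_cmd("input.mp4", "output.wav", 44100)))
```

To actually run it, the list can be passed to subprocess.run(cmd, check=True), assuming ffmpeg is on the PATH.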

Did you try both bitrates and notice a difference in the speech-to-text output? Are you working with mono or stereo files? What audio format are you providing? 64k should be fine for mono voice in MP3 or AAC. — llogan
I tried higher bitrates but did not notice a difference in the STT output. I am working with mono files. The audio format is PCM Linear 16. Should I use ffmpeg to convert to a higher bit rate? — Jash Shah
PCM has no bitrate parameter. The bitrate is fixed and depends on sample rate, bits per sample, and channel layout. See Wav audio file compression not working. — llogan
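To make the comment above concrete, here is a small Python sketch (the helper name pcm_bitrate_bps is hypothetical) that computes the fixed bit rate of uncompressed PCM from sample rate, bits per sample, and channel count. For 16-bit mono PCM, the raw data rate is already well above both 64k and 128k, even at 16 kHz.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM in bits per second.

    It is fully determined by the format, so there is nothing to tune.
    """
    return sample_rate_hz * bits_per_sample * channels


for rate in (16_000, 44_100):
    kbps = pcm_bitrate_bps(rate, 16, 1) / 1000
    # prints 256.0 kbit/s for 16000 Hz and 705.6 kbit/s for 44100 Hz
    print(f"{rate} Hz, 16-bit, mono: {kbps} kbit/s")
```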

DavicC · Accepted Answer · 2020-11-11T06:05:33

Bitrate describes the amount of data used to encode each second of audio. A higher bitrate generally means better audio quality, because more data is available to represent the details of the sound. Compare it to photos: a high-resolution picture is of better quality because it contains more detail.

Google's documentation recommends capturing audio at a sampling rate of 16,000 Hz or higher for optimal results with Google Speech-to-Text. Thus, a higher sampling rate or bit rate is preferred, since it yields higher-quality audio.

If you are working with mono audio files and convert them to a higher bit rate, the audio quality will not necessarily improve after the conversion. Converting a source file to a higher bit rate at best yields the same quality as the source, with only the bit rate increased; it cannot restore detail that was never captured or was already lost. Thus, it is very important to record your audio files at a higher bit rate in the first place.
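To illustrate why conversion cannot add quality, here is a small Python sketch that uses PCM bit depth as a stand-in for bit rate (for uncompressed PCM the two are proportional). It is an assumed illustration, not from the original answer: the 440 Hz tone, the 16 kHz rate, and the quantize() helper are hypothetical. A signal "recorded" at 8 bits and then converted to 16 bits doubles its bit rate but keeps exactly the same set of sample levels, whereas a true 16-bit recording has more of them. Lossy codecs such as MP3 or AAC discard detail differently, but re-encoding at a higher bit rate likewise cannot restore what was discarded.

```python
import math


def quantize(x: float, bits: int) -> int:
    """Map a sample in [-1.0, 1.0] to a signed integer at the given bit depth."""
    max_level = 2 ** (bits - 1) - 1
    return round(x * max_level)


RATE = 16_000  # samples per second, mono
# Hypothetical test signal: a 440 Hz sine tone, 1,000 samples long.
signal = [math.sin(2 * math.pi * 440 * n / RATE) for n in range(1_000)]

# "Record" at a coarse 8-bit depth: 16,000 * 8 = 128 kbit/s.
low = [quantize(s, 8) for s in signal]

# "Convert to a higher bit rate": shift the 8-bit values into the 16-bit
# range. The bit rate doubles to 256 kbit/s, but every sample is just the
# old value times 256, so no new detail appears.
upconverted = [v << 8 for v in low]

# Recording at 16-bit from the start captures the finer detail.
high = [quantize(s, 16) for s in signal]

print("distinct levels, 8-bit source:      ", len(set(low)))
print("distinct levels, 8 -> 16-bit copy:  ", len(set(upconverted)))
print("distinct levels, recorded at 16-bit:", len(set(high)))
```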

