I'm looking at using Google Cloud Speech to transcribe long-form narrated audio files, and I need to know the start time of each phrase in the audio. Is there a way to do this with Google Cloud Speech?
I'm currently working with transcribe_async.py.
Thanks.
1
votes
2 Answers
1
votes
This is not possible with Google Cloud Speech. If that information is important to you, you may need to look at other ASR systems. I know that offline, non-hosted ASR systems such as Kaldi and CMU Sphinx will give you this information. I don't know whether any hosted ASR systems provide it, or which ones.
1
votes
You can get approximate start and end times (measured from the beginning of the audio track) for each word by setting the enableWordTimeOffsets option to True: https://cloud.google.com/speech/docs/async-time-offsets.
Beware that the start time of the first word of the transcript is always 0 and that, as far as I know, each word's start time corresponds to the previous word's end time, even when there is a pause between them.
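To get from word offsets to a start time per phrase, one option is to take the start time of the first word in each result. Below is a minimal sketch of that idea, not the official sample. It assumes the response has been converted to a plain dict shaped like the REST JSON (results, alternatives, words, with startTime and endTime given as strings such as "1.300s"), and it assumes that each result can be treated as one phrase, which may not match how you want to split the narration. The helper names and the sample data are made up for illustration.

```python
from typing import List, Tuple


def parse_offset(value: str) -> float:
    """Convert a duration string such as "1.300s" to seconds."""
    return float(value.rstrip("s"))


def phrase_start_times(response: dict) -> List[Tuple[float, str]]:
    """Return (start_seconds, transcript) for each result in the response.

    Assumption: each result is treated as one phrase, and only the first
    (highest-ranked) alternative is used.
    """
    phrases = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        best = alternatives[0]
        words = best.get("words", [])
        if not words:
            # No word offsets were returned for this result.
            continue
        start = parse_offset(words[0]["startTime"])
        phrases.append((start, best.get("transcript", "").strip()))
    return phrases


# Hand-written sample data for illustration, not real service output.
sample_response = {
    "results": [
        {
            "alternatives": [
                {
                    "transcript": "hello world",
                    "words": [
                        {"startTime": "0s", "endTime": "0.700s", "word": "hello"},
                        {"startTime": "0.700s", "endTime": "2.500s", "word": "world"},
                    ],
                }
            ]
        },
        {
            "alternatives": [
                {
                    "transcript": " second phrase here",
                    "words": [
                        {"startTime": "2.500s", "endTime": "3.100s", "word": "second"},
                        {"startTime": "3.100s", "endTime": "3.600s", "word": "phrase"},
                        {"startTime": "3.600s", "endTime": "4.000s", "word": "here"},
                    ],
                }
            ]
        },
    ]
}

for start, text in phrase_start_times(sample_response):
    print(f"{start:7.2f}s  {text}")
```

Because of the caveat above (a word's start time matching the previous word's end time), any pause before a phrase would be folded into the preceding word's duration, so these start times are approximate at best.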