I am trying to implement speech-to-text in my application using the Google Cloud Speech-to-Text API with Python. I get the transcription correctly; however, the response contains only the transcript and confidence, not the individual words. If I try to access the words, I get an empty list.
To access the results, I use the following code:
# result is one element of response.results
best_alternative = result.alternatives[0]
words = best_alternative.words  # this is the list that comes back empty
transcript = best_alternative.transcript
confidence = best_alternative.confidence
print(f'Transcript: {transcript}')
print(f'Confidence: {confidence:.0%}')
Printing best_alternative.__dict__ shows the transcript and confidence, but no words. Is there a particular way to access the individual words of the transcript, or am I missing something?
UPDATE: Initially, I set up the recognition configuration like this:
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=RATE,
    language_code=lan_code)
streaming_config = speech.StreamingRecognitionConfig(
    config=config,
    interim_results=True,
    enable_speaker_diarization=True)
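With this configuration, where enable_speaker_diarization was passed to StreamingRecognitionConfig instead of RecognitionConfig, the response contained only the transcript and confidence, but no words. I then changed my configuration to this: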
config = speech.RecognitionConfig()
config.sample_rate_hertz = 16000
config.language_code = 'en-US'
config.encoding = speech.RecognitionConfig.AudioEncoding.LINEAR16
config.enable_speaker_diarization = True
With this configuration, the response finally contained the words alongside the transcript and confidence. The words can be accessed with:
response.results[0].alternatives[0].words[i].word
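Rather than indexing words[i] by hand, the same access pattern can be wrapped in a small loop. This is a minimal sketch, not part of the Google client library: the helper name result_words is hypothetical, and it relies only on the attribute names already used above (alternatives, words, word) plus speaker_tag, which each WordInfo carries when speaker diarization is enabled and leaves at its default value otherwise. Google's documentation notes that with diarization enabled, the word list can include all words from the beginning of the audio, so the sketch works on one result at a time instead of concatenating words across results.

```python
def result_words(result):
    """Return (word, speaker_tag) pairs from the top alternative of one result.

    Hypothetical helper: `result` is expected to look like an element of
    response.results, i.e. result.alternatives[0].words[i].word works on it.
    """
    if not result.alternatives:
        # No hypothesis for this result, so there are no words to report.
        return []
    best_alternative = result.alternatives[0]
    # Each item is a WordInfo; speaker_tag is only meaningful with diarization on.
    return [(info.word, info.speaker_tag) for info in best_alternative.words]
```

In the response loop, for result in response.results: print(result_words(result)) prints the per-word output next to the transcript; an empty list there still means the configuration did not request word-level information.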