I am trying to do real-time speech recognition for more than 1 minute using the Cloud Speech API, but synchronous speech recognition is limited to 1 minute per request. I tried running https://github.com/GoogleCloudPlatform/python-docs-samples/blob/speech-continuous/speech/cloud-client/transcribe_streaming_indefinite.py, as suggested in an answer to the question "Multiple StreamingRecognizeRequest", but I got this error:

File "transcribe_streaming_indefinite.py", line 30, in __init__
    self._bytes_per_sample = 2 * self._num_channels
AttributeError: 'ResumableMicrophoneStream' object has no attribute '_num_channels'

Does anyone have an idea of how to do this? Thank you in advance.

1 Answer

Note: The code you are running is on a GitHub branch other than master, so it may not be up to date.
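As for the AttributeError itself, one common way to get this kind of message is an __init__ that reads an attribute before assigning it. I have not confirmed that this is what the branch does, so treat the following as a sketch of the general pattern only. The class names BrokenStream and FixedStream and the single-channel default are hypothetical, not taken from the sample.

```python
class BrokenStream:
    """Hypothetical class: reads _num_channels before it is assigned."""

    def __init__(self, rate, chunk_size):
        self._rate = rate
        self._chunk_size = chunk_size
        # AttributeError is raised here: _num_channels does not exist yet.
        self._bytes_per_sample = 2 * self._num_channels
        self._num_channels = 1


class FixedStream:
    """Same hypothetical class with the assignments in a working order."""

    def __init__(self, rate, chunk_size):
        self._rate = rate
        self._chunk_size = chunk_size
        self._num_channels = 1  # assumed mono input; assign before use
        # 2 bytes per 16-bit sample, per channel
        self._bytes_per_sample = 2 * self._num_channels
```

If the file you downloaded has this shape, moving the assignment of _num_channels above the line that uses it should get you past the error.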


Speech-to-Text offers three main methods to perform speech recognition:

Synchronous Recognition: Sends audio data to the Speech-to-Text API, performs recognition on that data, and returns results after all audio has been processed. Requests are limited to audio data of 1 minute or less in duration.

Asynchronous Recognition: Sends audio data to the Speech-to-Text API and initiates a Long Running Operation. Use asynchronous requests for audio data of any duration up to 180 minutes.

Streaming Recognition: Performs recognition on audio data provided within a gRPC bi-directional stream. Streaming requests are designed for real-time recognition purposes, such as capturing live audio from a microphone. Streaming recognition provides interim results while audio is being captured, allowing results to appear, for example, while a user is still speaking.
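To make the choice between the three methods concrete, here is a small sketch that picks one from the audio duration and whether the audio is live. The function name choose_method and the constants are my own, not part of the Speech-to-Text client library; the limits are the ones quoted above.

```python
SYNC_LIMIT_S = 60          # synchronous: 1 minute or less
ASYNC_LIMIT_S = 180 * 60   # asynchronous: up to 180 minutes


def choose_method(duration_s, live=False):
    """Return the recognition method that fits the audio (hypothetical helper)."""
    if live:
        # Live audio (for example a microphone) calls for streaming.
        return "streaming"
    if duration_s <= SYNC_LIMIT_S:
        return "synchronous"
    if duration_s <= ASYNC_LIMIT_S:
        return "asynchronous"
    raise ValueError("audio is longer than the asynchronous limit")
```

For example, choose_method(30) gives "synchronous", choose_method(600) gives "asynchronous", and choose_method(0, live=True) gives "streaming".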

If you are trying to do real-time speech recognition, your best option is Streaming Recognition. Here’s a demo file you can try.

If you want to test the other two methods, the repository has demos for them as well.

Regarding audio longer than 1 minute: Synchronous Recognition can only process audio of 1 minute or less. Asynchronous Recognition can process audio up to 180 minutes long, but you have to provide it from Google Cloud Storage (GCS). With Streaming Recognition, if you want to process more than one minute of audio, you need to do it in batches, sending each batch in its own request.
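A minimal sketch of the batching idea, assuming raw 16-bit PCM (LINEAR16) audio already loaded as bytes. The function name split_pcm, its parameters, and the 55-second default (chosen to leave some margin under the 1-minute limit) are my own assumptions, not part of the API. Each batch would then be sent in its own streaming request.

```python
def split_pcm(audio, sample_rate_hz=16000, num_channels=1, max_seconds=55):
    """Yield consecutive slices of raw 16-bit PCM, each at most max_seconds long.

    Hypothetical helper: it only slices bytes and does not call the API.
    """
    bytes_per_sample = 2 * num_channels  # 16-bit samples, all channels
    bytes_per_batch = sample_rate_hz * bytes_per_sample * max_seconds
    for start in range(0, len(audio), bytes_per_batch):
        yield audio[start:start + bytes_per_batch]
```

Note that cutting at fixed offsets can split a word across two batches, so in practice you may want to cut at a pause instead.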

Google provides some audio samples for testing purposes, stored in the cloud-samples-tests bucket. You can list them with the following command:

gsutil ls gs://cloud-samples-tests/speech