Please see the implementation notes from the Speech to Text API Explorer for the recognize API you are attempting to use:
Implementation Notes
Sends audio and returns transcription results for
a sessionless recognition request. Returns only the final results; to
enable interim results, use session-based requests or the WebSocket
API. The service imposes a data size limit of 100 MB. It automatically
detects the endianness of the incoming audio and, for audio that
includes multiple channels, downmixes the audio to one-channel mono
during transcoding.
Streaming mode
For requests to transcribe live
audio as it becomes available or to transcribe multiple audio files
with multipart requests, you must set the Transfer-Encoding header to
chunked to use streaming mode. In streaming mode, the server closes
the connection (status code 408) if the service receives no data chunk
for 30 seconds and the service has no audio to transcribe for 30
seconds. The server also closes the connection (status code 400) if no
speech is detected for inactivity_timeout seconds of audio (not
processing time); use the inactivity_timeout parameter to change the
default of 30 seconds.
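To illustrate the streaming-mode requirement above, here is a minimal sketch of how a client frames audio data for HTTP/1.1 chunked transfer encoding and which header it must set. The framing follows the HTTP specification; the audio bytes here are dummy placeholders, not real audio, and you would still need to send the result with your own endpoint URL and authentication.

```python
def chunked_body(chunks):
    """Frame an iterable of byte chunks using HTTP/1.1 chunked encoding:
    each chunk is preceded by its length in hex, and a zero-length
    chunk terminates the stream."""
    out = b""
    for chunk in chunks:
        if chunk:  # a zero-length chunk would terminate the stream early
            out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # terminating zero-length chunk
    return out

headers = {
    "Content-Type": "audio/flac",        # match your audio format
    "Transfer-Encoding": "chunked",      # required for streaming mode
}

# Placeholder data standing in for pieces of an audio file read as they
# become available.
body = chunked_body([b"audio-part-1", b"audio-part-2"])
```

Most HTTP libraries will do this framing for you automatically when you pass a streaming body without a Content-Length, so in practice you usually only need to make sure the Transfer-Encoding header ends up as chunked.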
There are two factors here. First, there is a data size limit of 100 MB, so make sure you do not send files larger than that to the Speech to Text service. Second, the server will close the connection and return a 400 error if no speech is detected for the number of seconds defined by inactivity_timeout. The default value is 30 seconds, which matches the error you are seeing above.
I would suggest making sure there is valid speech in the first 30 seconds of your file, and/or increasing the inactivity_timeout parameter, to see whether the problem persists. To make things easier, you can test the failing file and other sound files by using the API Explorer in a browser:
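As a sketch of the second suggestion, this is how you might raise inactivity_timeout on a sessionless recognize request. The base URL here is a placeholder (check your service instance's documentation for the real host); inactivity_timeout is the query parameter described in the implementation notes above.

```python
from urllib.parse import urlencode

# Placeholder host -- substitute your actual Speech to Text endpoint.
BASE = "https://example-speech-to-text-host/v1/recognize"

params = {
    "inactivity_timeout": 60,  # raise from the 30-second default
}
url = BASE + "?" + urlencode(params)

# Send this URL with your audio as the request body, the appropriate
# Content-Type header, and your authentication credentials.
```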
Speech to Text API Explorer