0
votes

I am trying to transcribe an audio file with the following attributes using the Google Speech to Text API:

1) Raw file
2) Sample rate: 16000 Hz
3) Bit depth: 16 bits
4) Audio type: mono

I am using the following Python code to get the transcript:

import json

# `service` is assumed to be a Speech API client built elsewhere,
# e.g. with googleapiclient.discovery.build().
service_request = service.speech().asyncrecognize(
    body={
        'config': {
            'encoding': 'LINEAR16',  # raw 16-bit signed LE samples
            'sampleRate': 16000,  # 16 kHz
            'languageCode': 'en-US',  # a BCP-47 language tag
        },
        'audio': {
            'uri': 'gs://xxxxxxxxx/english.raw'
        }
    })
response = service_request.execute()
print(json.dumps(response))

This logic works, but for some reason the transcription only covers the first minute of the recording and ignores the rest.

Why is this happening? Can someone help me out?

2 Answers

0
votes

It's difficult to tell from your code, but you must be submitting a synchronous request. According to the docs, audio length for synchronous requests is limited to about 60 seconds, whereas asynchronous requests accept up to approximately 80 minutes. Read through the API and reference docs to learn how to structure requests properly for the method you are using.
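One thing worth checking on the asynchronous path: an asynchronous request typically returns a long-running operation rather than the finished transcript, so the result has to be polled for. The sketch below is a generic polling loop, not the library's own API. The helper name `wait_for_operation` and its `fetch` callable are hypothetical, and it assumes the operation comes back as a dict with `done`, `response` and `error` keys.

```python
import time


def wait_for_operation(fetch, poll_interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call fetch() repeatedly until the returned operation dict is done.

    fetch is a hypothetical zero-argument callable that returns the current
    state of the operation as a dict (assumed keys: done, response, error).
    """
    waited = 0.0
    while True:
        operation = fetch()
        if operation.get("done"):
            if "error" in operation:
                # The operation finished, but with a failure.
                raise RuntimeError("operation failed: %r" % (operation["error"],))
            return operation.get("response", {})
        if waited >= timeout:
            raise TimeoutError("operation not done after %.0f seconds" % waited)
        sleep(poll_interval)
        waited += poll_interval
```

With a client like the one in the question, `fetch` could be something along the lines of `lambda: service.operations().get(name=operation_name).execute()`, assuming the client exposes an operations resource; check that against the docs for the API version you are using.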

0
votes

My findings on this question are as follows:

1) The Google Speech to Text API is built for recognizing short audio files.
2) The amount of audio data that it can process in a given time is limited. Push too much through and Google will ignore the excess.
3) If you really want to get into this, you have to figure out how to pre-process your audio file and divide it into consumable chunks.
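The chunking step in point 3 could be sketched roughly as follows for headerless LINEAR16 audio like the file in the question. This is a minimal sketch under stated assumptions, not a tested pipeline: the function name `split_pcm` and the 55-second default are my own choices (picked to stay under the roughly 60-second limit mentioned above), and it cuts at fixed offsets, so a word may be split across two chunks.

```python
def split_pcm(data, sample_rate=16000, sample_width=2, channels=1, max_seconds=55):
    """Split raw PCM bytes into chunks of at most max_seconds of audio.

    Defaults match the file in the question: 16 kHz, 16-bit (2 bytes), mono.
    Cuts always fall on whole-frame boundaries so no sample is torn in half.
    """
    frame_size = sample_width * channels            # bytes per audio frame
    frames_per_chunk = int(max_seconds * sample_rate)
    if frames_per_chunk <= 0:
        raise ValueError("max_seconds is too small for this sample rate")
    chunk_bytes = frames_per_chunk * frame_size
    return [data[i:i + chunk_bytes] for i in range(0, len(data), chunk_bytes)]


def duration_seconds(data, sample_rate=16000, sample_width=2, channels=1):
    """Length of a raw PCM buffer in seconds."""
    return len(data) / (sample_rate * sample_width * channels)
```

To use it, read the raw file with `open('english.raw', 'rb').read()`, pass the bytes to `split_pcm`, and submit each chunk as its own request. Splitting on silence instead of fixed offsets would avoid cutting words, at the cost of more pre-processing.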