2
votes

When I try to invoke the Google Cloud Speech-to-Text API for long-running recognition with the following config:

config = dict(
            languageCode='de',
            maxAlternatives=1,
            enableWordTimeOffsets=True,
            enableAutomaticPunctuation=True,
            model='default',
            encoding='ENCODING_UNSPECIFIED'
          )

I get this error:

Invalid JSON payload received. Unknown name "encoding" at 'config': Proto field is not repeating, cannot start list

How can I fix it?

1
Hello Andrew. Would you please check my answer?

1 Answer

1
votes

Could you please give us some more information, such as which language and library version you are using for this part of your project?

Assuming you are using Python, you can find another official way of connecting to the Google Cloud Speech-to-Text API here: https://cloud.google.com/speech-to-text/docs/basics

I usually do this with the googleapiclient Python package, passing the whole request as a JSON-style structure rather than a standalone config dictionary:

import base64
import googleapiclient.discovery

speech_file = 'audio.raw'  # placeholder: path to your local audio file

with open(speech_file, 'rb') as speech:
    # Base64 encode the binary audio file for inclusion in the JSON
    # request; decode to str because bytes are not JSON serializable.
    speech_content = base64.b64encode(speech.read()).decode('utf-8')

# Construct the request
service = googleapiclient.discovery.build('speech', 'v1')
service_request = service.speech().recognize(
    body={
        "config": {
            "encoding": "LINEAR16",  # raw 16-bit signed LE samples
            "sampleRateHertz": 16000,  # 16 kHz
            "languageCode": "en-US",  # a BCP-47 language tag
        },
        "audio": {
            "content": speech_content
        }
    })

# Send the request; the response is a dict parsed from the JSON reply
response = service_request.execute()

Refer to this official article if you don't know how to install Python packages: https://packaging.python.org/tutorials/installing-packages/#id13

For long-running requests, please refer to: https://cloud.google.com/speech-to-text/docs/reference/rest/v1/speech/longrunningrecognize

The JSON request body in this case has the following structure:

{
  "config": {
    object(RecognitionConfig)
  },
  "audio": {
    object(RecognitionAudio)
  }
}

Where RecognitionConfig is a JSON object of the kind:

{
  "encoding": enum(AudioEncoding),
  "sampleRateHertz": number,
  "languageCode": string,
  "maxAlternatives": number,
  "profanityFilter": boolean,
  "speechContexts": [
    {
      object(SpeechContext)
    }
  ],
  "enableWordTimeOffsets": boolean
}
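To tie this back to the error in the question: the message complains about a list arriving where a single value is expected. One possible cause (an assumption on my part, since the calling code is not shown) is a value that ends up as a tuple or list, for example through a stray trailing comma, because Python serializes tuples as JSON arrays. A minimal sketch of a hypothetical helper, build_recognition_config, that builds the scalar fields above and rejects such values early:

```python
import json


def build_recognition_config(language_code,
                             encoding="ENCODING_UNSPECIFIED",
                             sample_rate_hertz=None,
                             max_alternatives=1,
                             enable_word_time_offsets=False):
    """Return a RecognitionConfig-shaped dict (hypothetical helper).

    Only the scalar fields from the schema above are covered;
    speechContexts is left out on purpose.
    """
    config = {
        "encoding": encoding,
        "languageCode": language_code,
        "maxAlternatives": max_alternatives,
        "enableWordTimeOffsets": enable_word_time_offsets,
    }
    if sample_rate_hertz is not None:
        config["sampleRateHertz"] = sample_rate_hertz

    # Every field built here must be a single value. A tuple or list
    # would be serialized as a JSON array, which the API rejects.
    for key, value in config.items():
        if isinstance(value, (list, tuple, dict)):
            raise TypeError(
                "%s must be a single value, got %s" % (key, type(value).__name__)
            )
    return config


# A trailing comma silently turns a string into a one-element tuple,
# and json.dumps turns that tuple into a JSON array.
encoding_with_stray_comma = "LINEAR16",
print(json.dumps({"encoding": encoding_with_stray_comma}))

# The asker's settings, restricted to the fields listed above.
print(json.dumps(build_recognition_config("de", enable_word_time_offsets=True)))
```

If the helper raises TypeError for your values, the config is being built with a list-like value somewhere before it reaches the API.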

And RecognitionAudio is of the kind:

{
  // Union field audio_source can be only one of the following:
  "content": string,
  "uri": string
  // End of list of possible types for union field audio_source.
}
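Since audio_source is a union field, exactly one of "content" or "uri" should be set. As a rough sketch (the helper name build_recognition_audio is my own, not part of any Google library), this could be enforced in Python like so:

```python
import base64


def build_recognition_audio(content=None, uri=None):
    """Return a RecognitionAudio-shaped dict (hypothetical helper).

    content: raw audio bytes, sent inline as a base64 string.
    uri:     a storage URI string pointing at the audio.
    Exactly one of the two must be given.
    """
    if (content is None) == (uri is None):
        raise ValueError("provide exactly one of 'content' or 'uri'")
    if uri is not None:
        return {"uri": uri}
    # JSON cannot carry raw bytes, so encode them and decode to str.
    return {"content": base64.b64encode(content).decode("ascii")}


print(build_recognition_audio(uri="gs://speech-clips/example.flac"))
print(build_recognition_audio(content=b"abc"))
```

The object name example.flac is only a placeholder; the bucket name is taken from the snippet further down in this answer.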

For long-running recognition, you may also refer to this link: https://developers.google.com/resources/api-libraries/documentation/speech/v1/java/latest/com/google/api/services/speech/v1/Speech.SpeechOperations.html

With the Python package googleapiclient.discovery, a long-running request is made by calling the following method in your Python class:

...
service_request = service.speech().longrunningrecognize(
        body= {
            "config": {
                "encoding": "FLAC",
                "languageCode": "en-US",
                "enableWordTimeOffsets": True
            },
            "audio": {
                "uri": str('gs://speech-clips/'+self.audio_fqid)
            }
        }
    )
...
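A long-running request returns an operation rather than the transcript, so you still need to poll until it is done. Here is a minimal sketch of that step, assuming the service object from above and that the executed request returns a dict with a "name" key; the function name wait_for_operation and its polling parameters are my own choices, not part of the library:

```python
import time


def wait_for_operation(service, operation_name, poll_seconds=5.0, max_polls=60):
    """Poll a long-running operation until it reports done.

    service:        the object returned by googleapiclient.discovery.build
    operation_name: the "name" field of the operation returned by
                    service_request.execute()
    Returns the operation's "response" dict, or raises on error/timeout.
    """
    for _ in range(max_polls):
        # Build and execute a fresh request on every poll.
        operation = service.operations().get(name=operation_name).execute()
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError("operation failed: %r" % (operation["error"],))
            return operation.get("response", {})
        time.sleep(poll_seconds)
    raise TimeoutError("operation %s did not finish in time" % operation_name)
```

You would call it as wait_for_operation(service, service_request.execute()["name"]) and then read the transcript from the "results" entry of the returned dict.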