I'm trying to transcribe more than an hour of audio with the Google Cloud Speech API, and I'm using the API Explorer because it's easy.

The request looks like this.

POST https://speech.googleapis.com/v1/speech:longrunningrecognize?key={YOUR_API_KEY}
{
  "audio": {
    "uri": "gs://data/audio.flac"
  },
  "config": {
    "encoding": "FLAC",
    "languageCode": "en-US"
  }
}

The response looks like this.

200 
{
  "name": "`numbers`"
}

Why does it return only the name, and not the text of the audio?

1 Answer

Just had the same problem.

Found the answer at https://cloud.google.com/speech/docs/async-recognize, which says:

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

{
  "name": "5543203840552489181"
}

where name is the name of the long running operation created for the request. Wait approximately 30 seconds for processing to complete. To retrieve the result of the operation, make a GET request:

GET https://speech.googleapis.com/v1/operations/YOUR_OPERATION_NAME?key=YOUR_API_KEY

Got my results with:

curl -s -k -H "Content-Type: application/json" \
    -H "Authorization: Bearer {access_token}" \
    https://speech.googleapis.com/v1/operations/{name}
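
For a recording over an hour long, one GET after 30 seconds will probably still show the operation as unfinished, so it may help to poll in a loop. Below is a rough Python sketch of that idea, not an official client. The function names (fetch_operation, wait_for_transcript) and the polling defaults are made up for illustration. It assumes the operation JSON carries a "done" flag, an optional "error" object, and the transcript under response.results[].alternatives[].transcript, and it uses the same API key style as the request in the question.

```python
import json
import time
import urllib.request

# Same operations endpoint as the GET request above, authenticated with an API key.
OPERATIONS_URL = "https://speech.googleapis.com/v1/operations/{name}?key={key}"


def fetch_operation(name, api_key):
    """GET the operation resource and decode its JSON body."""
    url = OPERATIONS_URL.format(name=name, key=api_key)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_transcript(name, api_key, fetch=fetch_operation,
                        interval=30.0, max_polls=120, sleep=time.sleep):
    """Poll the operation until it is done, then return the joined transcript.

    Assumed response shape (check it against the API reference):
    {"done": true, "response": {"results": [{"alternatives": [{"transcript": ...}]}]}}
    """
    for _ in range(max_polls):
        op = fetch(name, api_key)
        if op.get("done"):
            if "error" in op:
                # A finished operation can carry an error instead of a response.
                raise RuntimeError(op["error"])
            results = op.get("response", {}).get("results", [])
            # Keep the first alternative of each result segment.
            return " ".join(
                r["alternatives"][0]["transcript"].strip()
                for r in results
                if r.get("alternatives")
            )
        sleep(interval)
    raise TimeoutError("operation %s not done after %d polls" % (name, max_polls))
```

Calling wait_for_transcript("YOUR_OPERATION_NAME", "YOUR_API_KEY") would then block until the text is available or the poll limit is reached. The fetch and sleep parameters are there only so the loop can be tried without network access.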