Google Cloud Speech API returns nothing for audio longer than 1 minute

Question

Audio files shorter than 1 minute are transcribed without problem, but when I attempt to transcribe a longer file, the Google Speech API returns an empty response.

I make my .wav file using the following SoX command:

sox input.flac --channels=1 --bits=16 --rate=16000 --encoding=signed-integer --endian=little output.wav

The file plays as expected. Running SoXi, I get the following information:

Input File     : 'output.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:02:35.71 = 2491408 samples ~ 11678.5 CDDA sectors
File Size      : 4.98M
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM

I then upload it to my Google Cloud Storage bucket, because the documentation states that any audio longer than 1 minute must reside in a gs:// bucket for the API to transcribe it.

I then run the following piece of code to begin the transcribing operation:

use \Google\Cloud\ServiceBuilder;

// Authenticate with a service-account key file.
$cloud = new ServiceBuilder([
    'keyFilePath' => '/var/www/cert/gcloud_key.json',
    'projectId' => 'm****n-141000'
]);

$speech = $cloud->speech();

// Start an asynchronous recognition job on the file stored in the bucket.
$operation = $speech->beginRecognizeOperation(
    "gs://m****n-141000.appspot.com/output.wav", [
    'encoding' => 'LINEAR16',
    'sampleRate' => 16000
]);

// Poll once per second until the operation reports completion.
while (!$operation->isComplete()) {
    sleep(1);
    $operation->reload();
}

var_dump($operation->results());

The response coming back is empty. The full response looks like this:

object(stdClass)#27 (4) {
  ["name"]=>
  string(19) "1904326252537199795"
  ["metadata"]=>
  object(stdClass)#24 (4) {
    ["@type"]=>
    string(70) "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeMetadata"
    ["progressPercent"]=>
    int(100)
    ["startTime"]=>
    string(27) "2017-01-02T09:36:45.780425Z"
    ["lastUpdateTime"]=>
    string(27) "2017-01-02T09:36:46.720260Z"
  }
  ["done"]=>
  bool(true)
  ["response"]=>
  object(stdClass)#26 (1) {
    ["@type"]=>
    string(70) "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"
  }
}

This suggests that the request ran and completed successfully, but returned no transcription results. Where am I going wrong?

Harri H. · Accepted Answer · 2017-01-10T11:58:13

The Speech API documentation (https://cloud.google.com/speech/docs/encoding) says that WAV files are not supported. The audio should be a raw file without any headers (with a .raw extension). The SoX conversion should include a "--type=FILETYPE" option, but unfortunately I'm not sure whether it is "--type=raw" or something else.
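If the headerless-file diagnosis is right, a conversion along the following lines may be worth trying. This is only a sketch: it reuses the options from the question's SoX command and adds --type=raw, relying on SoX documenting raw as a headerless file type for its -t/--type option. The names to_raw, expected_raw_bytes and output.raw are illustrative, not taken from the Speech API documentation. The size check rests on the assumption that a headerless 16-bit mono file holds exactly 2 bytes per sample.

```shell
#!/bin/sh
# Hypothetical helper: convert a SoX-readable file ($1) to headerless
# 16 kHz, 16-bit, mono, little-endian signed PCM ($2).
to_raw() {
    sox "$1" --channels=1 --bits=16 --rate=16000 \
        --encoding=signed-integer --endian=little --type=raw "$2"
}

# Expected size of headerless 16-bit mono PCM: 2 bytes per sample.
# For the file in the question: 2491408 samples -> 4982816 bytes.
expected_raw_bytes() {
    echo $(( $1 * 2 ))
}

# Usage (assumes sox is installed and input.flac exists):
#   to_raw input.flac output.raw
#   wc -c < output.raw            # compare with: expected_raw_bytes 2491408
```

After uploading output.raw to the bucket, the request from the question would presumably stay the same apart from the gs:// path, since the samples are still LINEAR16 at 16000 Hz.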
