I'm trying to build an application that transcribes WAV files using Cloud Functions and the Cloud Speech-to-Text API. The official documentation shows how to do this (https://cloud.google.com/speech-to-text/docs/async-recognize). However, Cloud Functions have a processing time limit (at most 540 seconds), and for long WAV files, waiting for the transcription to finish might exceed that limit. I'm looking for a way to resume waiting on the job later.

The official documentation shows the following code (I'm using Node.js for Cloud Functions):

const speech = require('@google-cloud/speech');
const client = new speech.SpeechClient();
async function transcribe(request) {
  // Detects speech in the audio file. This creates a recognition job that you
  // can wait for now, or get its result later.
  const [operation] = await client.longRunningRecognize(request);
  // Get a Promise representation of the final result of the job
  const [response] = await operation.promise();
  return response;
}

client.longRunningRecognize() sends the request and returns an operation object within a few seconds, and operation.promise() then waits until the transcription finishes. For large files that wait can take more than 540 seconds, so the function may be killed at that line. I'd like to resume processing with the same 'operation' object in another process. I tried serializing the 'operation' object to a file and loading it later, but serialization cannot include functions, so operation.promise() is lost. How can I solve this problem?
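The failure described above follows from how JSON serialization works: properties whose values are functions are skipped, while plain data such as the operation's name survives the round trip. A minimal illustration, using a hypothetical stand-in object rather than the real operation class:

```javascript
// Hypothetical stand-in for the object returned by longRunningRecognize():
// some plain data plus a method, like the real operation object.
const operation = {
  name: 'example-operation-name',
  done: false,
  promise() {
    return Promise.resolve(['final response']);
  },
};

// Round-trip through JSON, as when writing the object to a file and reading it back.
const restored = JSON.parse(JSON.stringify(operation));

console.log(restored.name);           // example-operation-name
console.log(typeof restored.promise); // undefined (the method was dropped)
```

Only the data fields come back, which is why a restored object has no promise() to call.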

2 Answers

If your job is going to take more than 540 seconds, Cloud Functions is not really the best fit for this problem. Instead, consider using Cloud Functions only as a triggering mechanism and offloading the work to App Engine or Compute Engine, using Pub/Sub to send it the relevant data (e.g. the location of the file in Cloud Storage and any other metadata needed to make the speech recognition request).
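A minimal sketch of the triggering half of this design, in Node.js to match the question. It assumes a Cloud Storage-triggered function whose event carries bucket, name and contentType; buildTranscriptionJob, onAudioUploaded and the job fields are hypothetical names, and the Pub/Sub publish call is injected so the sketch runs without the client library:

```javascript
// Hypothetical payload builder: turns a Cloud Storage "object finalized" event
// (fields bucket, name, contentType) into a small job description.
function buildTranscriptionJob(file) {
  if (!file || !file.bucket || !file.name) {
    throw new Error('Storage event is missing bucket or name');
  }
  return {
    uri: `gs://${file.bucket}/${file.name}`,
    contentType: file.contentType || 'audio/wav',
  };
}

// Cloud Function body: publish the job and return right away, so the function
// finishes in seconds regardless of audio length. `publish` is injected; with
// @google-cloud/pubsub it could wrap
// topic.publishMessage({ data: Buffer.from(JSON.stringify(job)) }).
async function onAudioUploaded(file, publish) {
  const job = buildTranscriptionJob(file);
  const messageId = await publish(job);
  return { messageId, job };
}
```

The worker that receives the message is not bound by the Cloud Functions timeout, so it can simply await operation.promise() for as long as the transcription takes.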

Here is how to do it (the code is in PHP, but the idea and the classes are the same):

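use Google\Cloud\Speech\V1\SpeechClient;

$client = new SpeechClient([
    'credentials' => json_decode(file_get_contents('keys.json'), true)
]);

// $config (RecognitionConfig) and $audio (RecognitionAudio) are built as in the documentation.
$operation = $client->longRunningRecognize($config, $audio);
$operationName = $operation->getName();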

The job is now running, and you can save $operationName somewhere (for example, in a database) so that another process can use it.
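For the Node.js code in the question, the equivalent first step might look like this sketch. It assumes the client behaves like the SpeechClient from @google-cloud/speech (longRunningRecognize() resolving to an array whose first element is the operation, which has a name property); MemoryStore, startTranscription and jobId are hypothetical placeholders for whatever storage you use:

```javascript
// Hypothetical in-memory store standing in for a database or Firestore.
class MemoryStore {
  constructor() {
    this.data = new Map();
  }
  async set(key, value) {
    this.data.set(key, value);
  }
  async get(key) {
    return this.data.get(key);
  }
}

// Start the recognition job and persist only its name. `client` is expected to
// behave like @google-cloud/speech's SpeechClient, whose longRunningRecognize()
// resolves to an array with the operation as its first element.
async function startTranscription(client, store, jobId, request) {
  const [operation] = await client.longRunningRecognize(request);
  // The name is a plain string, so it survives serialization and restarts.
  await store.set(jobId, JSON.stringify({ operationName: operation.name }));
  return operation.name;
}
```

The function returns as soon as the job is accepted, well within the time limit.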

In another process:

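use Google\Cloud\Speech\V1\SpeechClient;
use GPBMetadata\Google\Cloud\Speech\V1\CloudSpeech;

$speechClient = new SpeechClient([
    'credentials' => json_decode(file_get_contents('keys.json'), true)
]);
// Register the Speech proto descriptors (generated metadata class) so the
// operation result can be decoded.
CloudSpeech::initOnce();

// $operationName is the value saved by the first process.
$newOperationResponse = $speechClient->resumeOperation($operationName, 'LongRunningRecognize');

if ($newOperationResponse->operationSucceeded()) {
    $result = $newOperationResponse->getResult();
}
...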

Note: make sure to pass "LongRunningRecognize" as the method name to resumeOperation, and NOT "longRunningRecognize" (the first letter must be uppercase, contrary to the documentation: https://github.com/googleapis/google-cloud-php-speech/blob/master/src/V1/Gapic/SpeechGapicClient.php#L312).

Otherwise the response will stay protobuf-encoded (https://github.com/googleapis/google-cloud-php-speech/blob/master/src/V1/Gapic/SpeechGapicClient.php#L135).
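A Node.js counterpart of resumeOperation, as a sketch: recent versions of @google-cloud/speech appear to expose checkLongRunningRecognizeProgress(name) on SpeechClient, returning an operation with done, error and result fields, but treat that as an assumption and check it against your installed version. resumeTranscription and the store interface are hypothetical:

```javascript
// Resume in a later invocation using only the saved operation name.
// Assumption: `client` behaves like @google-cloud/speech's SpeechClient, whose
// recent versions provide checkLongRunningRecognizeProgress(name); check your
// library version. `store` is any object with an async get(key).
async function resumeTranscription(client, store, jobId) {
  const saved = JSON.parse(await store.get(jobId));
  const operation = await client.checkLongRunningRecognizeProgress(saved.operationName);
  if (!operation.done) {
    // Still running: return now and check again on a later run (for example,
    // triggered by Cloud Scheduler) instead of blocking until the timeout.
    return { status: 'running' };
  }
  if (operation.error) {
    return { status: 'failed', error: operation.error };
  }
  return { status: 'done', response: operation.result };
}
```

Because each call only checks the job's status once, every invocation stays short no matter how long the transcription takes.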

This answer helped me find the final solution: https://stackoverflow.com/a/57209441/932473