I have already spent a full day trying to work out the best practice for the Google Speech API, and this is my last attempt.
To make sure we are all testing the same audio, the script below downloads it from an online source. The only other requirement is ffmpeg,
which converts the MP3 to a format the Google API accepts.
Audio information:
- singer: Adele
- song: Chasing Pavements
- possible language: en-GB (Adele's origin) or en-US
- sample rate: 44100 Hz
- channels: stereo (2 channels)
- format: mp3
What I did:
- tried both formats, FLAC and WAV
- tried both sample rates, the original (44100) and 16000
- always converted to mono (1 channel)
- tried both languages, en-GB and en-US
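The conversion combinations listed above can be sketched as a small helper that builds one ffmpeg command per format and sample rate. This is only an illustration: `buildFfmpegCommand` and the `test_<rate>.<ext>` output naming are hypothetical, while the codec, `-ar` and `-ac 1` flags are the same ones the script uses.

```php
<?php
// Hypothetical helper: builds the ffmpeg command for one format/sample-rate pair.
function buildFfmpegCommand(string $input, string $format, int $sampleRate): string
{
    // Codec flags per target format, taken from the script in this post.
    $codecs = ['wav' => '-acodec pcm_s16le', 'flac' => '-c:a flac'];
    if (!isset($codecs[$format])) {
        throw new InvalidArgumentException("Unsupported format: $format");
    }
    // Assumed naming scheme, e.g. test.mp3 -> test_16000.wav
    $output = pathinfo($input, PATHINFO_FILENAME) . "_{$sampleRate}." . $format;

    // -y overwrites existing output, -ar sets the sample rate, -ac 1 downmixes to mono.
    return sprintf(
        'ffmpeg -y -i %s %s -ar %d -ac 1 %s',
        escapeshellarg($input),
        $codecs[$format],
        $sampleRate,
        escapeshellarg($output)
    );
}

// Print the four commands that correspond to the combinations I tried.
foreach (['wav', 'flac'] as $format) {
    foreach ([44100, 16000] as $rate) {
        echo buildFfmpegCommand('test.mp3', $format, $rate), PHP_EOL;
    }
}
```

Each printed command could then be passed to `shell_exec()`, as the script does for a single combination.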
Desired output: text alignment (word timings). That is a secondary goal for now, because I'm currently focusing on why so much of the text is missing from the transcription.
Note: run it from bash/cmd.
Script (basic synchronous transcription): transcrib.php
<?php
set_time_limit(300); //5min
//google speech php library
require __DIR__ . '/vendor/autoload.php';
# Imports the Google Cloud client library
use Google\Cloud\Speech\SpeechClient;
//use Google\Cloud\Storage\StorageClient;
use Google\Cloud\Core\ExponentialBackoff;
//json credential path
$google_json_credential = 'cloud-f7cd1957f36a.json';
putenv("GOOGLE_APPLICATION_CREDENTIALS=$google_json_credential");
# Your Google Cloud Platform project ID
$projectId = 'cloud-178108';
//$languageCode = 'en-US'; // not good (too many misses)
$languageCode = 'en-GB'; // Adele's country
$oldFile = "test.mp3";
//flac or wav??
$typeFile = 'wav';
$sampleRate = 16000;
if ($typeFile === 'wav') { // compare, don't assign: a single "=" always takes this branch
    $newFile = "test.wav";
    $encoding = 'LINEAR16';
    $ffmpeg_command = "ffmpeg -i $oldFile -acodec pcm_s16le -ar $sampleRate -ac 1 $newFile -y";
} else {
    $newFile = "test.flac";
    $encoding = 'FLAC';
    $ffmpeg_command = "ffmpeg -i $oldFile -c:a flac -ar $sampleRate -ac 1 $newFile -y";
}
//download file
//original audio info: adele - chasing pavements, stereo (2 channel) 44100Hz mp3
$rawFile = file_get_contents("http://www.karaokebuilder.com/pix/toolkit/sam01.mp3");
//save file
file_put_contents($oldFile, $rawFile);
//convert to google cloud format using ffmpeg
shell_exec($ffmpeg_command);
# The audio file's encoding and sample rate
$options = [
    'encoding' => $encoding,
    'sampleRateHertz' => $sampleRate,
    'enableWordTimeOffsets' => true,
];
// Create the speech client
$speech = new SpeechClient([
    'projectId' => $projectId,
    'languageCode' => $languageCode,
]);
// Make the API call
$results = $speech->recognize(
    fopen($newFile, 'r'),
    $options
);
// Print the results
foreach ($results as $result) {
    $alternative = $result->alternatives()[0];
    printf('Transcript: %s' . PHP_EOL, $alternative['transcript']);
    print_r($result->alternatives());
}
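Since word timings are the eventual goal, here is a rough sketch of how the per-word offsets could be listed instead of dumping everything with `print_r`. It assumes each alternative carries a `words` array whose entries have `word`, `startTime` and `endTime` keys; that shape is my assumption about the response when `enableWordTimeOffsets` is true, so check it against the `print_r` output. `formatWordTimings` is a hypothetical helper and the sample values are made up, not real API output.

```php
<?php
// Hypothetical helper: turns one alternative into "word: start -> end" lines.
// Assumes the alternative has a 'words' array (unverified response shape).
function formatWordTimings(array $alternative): array
{
    $lines = [];
    foreach ($alternative['words'] ?? [] as $wordInfo) {
        $lines[] = sprintf(
            '%s: %s -> %s',
            $wordInfo['word'],
            $wordInfo['startTime'],
            $wordInfo['endTime']
        );
    }
    return $lines;
}

// Hand-written placeholder data, only to show the expected input shape.
$sample = [
    'transcript' => 'even if',
    'words' => [
        ['word' => 'even', 'startTime' => '0.500s', 'endTime' => '0.900s'],
        ['word' => 'if', 'startTime' => '0.900s', 'endTime' => '1.100s'],
    ],
];
echo implode(PHP_EOL, formatWordTimings($sample)), PHP_EOL;
```

In the script this would be called as `formatWordTimings($result->alternatives()[0])` inside the result loop, provided the response really has that shape.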
Result:
en-US:
wav: even if it leads nowhere [confidence: 0.86799717]
flac: even if it leads nowhere [confidence: 0.92401636]
en-GB:
wav: happy birthday balloons delivered Leeds Norway [confidence: 0.4939031]
flac: happy birthday balloons delivered Leeds Norway [confidence: 0.5762244]
Expected:
Should I give up
Or should I just keep chasing pavements?
Even if it leads nowhere
Or would it be a waste?
Even if I knew my place should I leave it there?
Should I give up
Or should I just keep chasing pavements?
Even if it leads nowhere
If you compare the result with the expected text, you can see that not only is most of the text missing, some of it is also transcribed incorrectly.
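To put a number on that gap, a simple word-coverage check could be used: the fraction of expected words that also appear in the transcript. This is only a rough sketch; `tokenizeWords` and `wordCoverage` are hypothetical helpers, and the metric ignores word order, so it is not a proper word error rate.

```php
<?php
// Split text into lowercase words (letters and apostrophes only).
function tokenizeWords(string $text): array
{
    preg_match_all("/[a-z']+/", strtolower($text), $matches);
    return $matches[0];
}

// Fraction of expected words that also occur in the transcript.
// Each transcript word can be matched at most once; order is ignored.
function wordCoverage(string $expected, string $transcript): float
{
    $expectedWords = tokenizeWords($expected);
    if ($expectedWords === []) {
        return 0.0;
    }
    $available = array_count_values(tokenizeWords($transcript));
    $hits = 0;
    foreach ($expectedWords as $word) {
        if (($available[$word] ?? 0) > 0) {
            $available[$word]--;
            $hits++;
        }
    }
    return $hits / count($expectedWords);
}

// Expected lyrics and the en-US transcript, both copied from this post.
$expected = "Should I give up
Or should I just keep chasing pavements?
Even if it leads nowhere
Or would it be a waste?
Even if I knew my place should I leave it there?
Should I give up
Or should I just keep chasing pavements?
Even if it leads nowhere";
$transcript = 'even if it leads nowhere';

printf('Coverage: %.1f%%' . PHP_EOL, 100 * wordCoverage($expected, $transcript));
```

Running the same check on each format, sample rate and language combination would make the comparison between them less subjective.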
To be honest, I don't know whether the machine (Google Cloud) can hear my converted audio clearly or not, but I tried to send the best conversion I could.
Did I miss something in my script, or am I converting the audio incorrectly?