
I want to use the Google Speech API in my current project.

I got the information on how to access the API from here.

As described on GitHub, you send a POST web request to the server and get the result back as JSON.

I also got some source code written for the v1 API from here.

Setting up the request is not that hard:

WebRequest request = WebRequest.Create(Constants.GoogleRequestString);
request.Method = "POST";
request.ContentType = "audio/x-flac; rate=" + sampleRate;
request.ContentLength = bytes.Length;

In my example, Constants.GoogleRequestString is https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key=AIzaSyCnl6MRydhw_5fLXIdASxkLJzcJh5iX0M4

I downloaded the .flac files from the GitHub link and wrote a small C# program that loads the bytes of a FLAC file and sends them to the server using a slightly modified version of the method GoogleRequest(byte[] bytes, int sampleRate).
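The body of GoogleRequest isn't shown here, so the following is a hypothetical reconstruction rather than the original method: a minimal sketch that configures the POST request as above and then writes the FLAC bytes and reads the response. The `url` parameter, the `BuildRequest` helper and the `SpeechClient` class name are assumptions added for illustration.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical sketch of GoogleRequest(byte[] bytes, int sampleRate), with the
// endpoint URL (including ?output=json&lang=...&key=...) passed in explicitly.
public static class SpeechClient
{
    // Configure the request without sending anything over the network yet.
    public static HttpWebRequest BuildRequest(string url, byte[] bytes, int sampleRate)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        // The rate must match the sample rate stored in the FLAC file.
        request.ContentType = "audio/x-flac; rate=" + sampleRate;
        request.ContentLength = bytes.Length;
        return request;
    }

    // Send all bytes, close the request stream, then read the JSON body.
    public static string GoogleRequest(string url, byte[] bytes, int sampleRate)
    {
        var request = BuildRequest(url, bytes, sampleRate);
        using (var body = request.GetRequestStream())
        {
            body.Write(bytes, 0, bytes.Length);
        }
        using (var response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Keeping `BuildRequest` separate makes the request configuration checkable without a network call. `HttpWebRequest` matches the question's code, although on .NET 6 and later it is marked obsolete in favor of `HttpClient`.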

I open the request stream as shown in the method and send all the bytes to the server. I do get a response, but the JSON string I get is always: "{\"result\":[]}"

I have no idea why it is not working. Either the file or the spoken text in it is not correct (but when I play it in VLC I can clearly hear the speech), or my program still has some bugs.

Have you ever run into the Speech API returning no result at all? Shouldn't it say something like "couldn't understand what was spoken" or return some other error message?

I just tried the .wav file, and it worked for me.

What is your sample rate? For me, the issue was the sample rate of the FLAC file. - Shreyas Kapur
According to VLC the sample rate is 44100 Hz, and I use 44100 for the FLAC files. Still no result. - Loki
pastebin.com/Ns3XxBNP This is the class I use for speech-to-text (edited from the CloudSpeech project). The function of interest is Recognize(Stream contentToRecognize), where you just pass in your FLAC stream. Ignore the JSON parser; just check whether you get a response from it, because it definitely works for me. - Shreyas Kapur


Your code is fine, assuming it resembles this:

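using System;
using System.Diagnostics;
using System.IO;
using System.Net;

// Replace YOURAPIKEY with your own server key.
var uriBuilder = new UriBuilder(
    "https",
    "www.google.com",
    443,
    "speech-api/v2/recognize",
    "?output=json&lang=en-us&key=YOURAPIKEY");
int sampleRate = 44100; // must match the sample rate of the FLAC file

using (var stream = File.Open("c:\\tmp\\g2.flac", FileMode.Open))
{
    HttpWebRequest request = (HttpWebRequest) WebRequest.Create(uriBuilder.Uri);
    request.Method = "POST";
    request.ContentType = "audio/x-flac; rate=" + sampleRate;
    request.AutomaticDecompression = DecompressionMethods.GZip;

    // Upload the FLAC bytes and close the request stream so the body is complete.
    using (var requestStream = request.GetRequestStream())
    {
        stream.CopyTo(requestStream);
    }
    try
    {
        using (var resp = request.GetResponse().GetResponseStream())
        using (var sr = new StreamReader(resp))
        {
            Debug.WriteLine(sr.ReadToEnd());
        }
    }
    catch (WebException ee)
    {
        // On HTTP errors (e.g. 403 Forbidden) the response body explains the problem.
        if (ee.Response != null)
        {
            using (var sr = new StreamReader(ee.Response.GetResponseStream()))
            {
                Debug.WriteLine(sr.ReadToEnd());
            }
        }
    }
}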

What is important, though, is the exact format of the FLAC file. I used Audacity to control how my audio track was saved.

After recording, I changed the track settings to:

  • Mono
  • Sample Format: 16-Bit PCM
  • Rate: 44100 Hz

The following screenshot shows those settings:

[Screenshot: Audacity track settings]

With the default stereo track and 32-bit float sample format, I couldn't get the Speech API to produce any result other than the empty JSON payload you got as well.
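Since the result depended on the track being mono and 16-bit, it can help to check those properties before uploading. A minimal sketch, assuming a well-formed FLAC file (the format requires STREAMINFO to be the first metadata block): it reads the sample rate, channel count and bit depth straight from the header. The `FlacInfo` class name is made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

// Reads sample rate, channels and bits per sample from a FLAC STREAMINFO block.
public static class FlacInfo
{
    public static (int SampleRate, int Channels, int BitsPerSample) Read(Stream flac)
    {
        // "fLaC" marker (4) + metadata block header (4) + STREAMINFO body (34)
        var header = new byte[4 + 4 + 34];
        int read = 0;
        while (read < header.Length)
        {
            int n = flac.Read(header, read, header.Length - read);
            if (n == 0) throw new InvalidDataException("File too short to be FLAC.");
            read += n;
        }
        if (Encoding.ASCII.GetString(header, 0, 4) != "fLaC")
            throw new InvalidDataException("Missing fLaC marker.");
        if ((header[4] & 0x7F) != 0) // block type 0 = STREAMINFO
            throw new InvalidDataException("First metadata block is not STREAMINFO.");

        const int s = 8; // offset of the STREAMINFO body
        // Bytes s+10..s+13 pack: sample rate (20 bits), channels-1 (3 bits), bits per sample-1 (5 bits).
        int sampleRate = (header[s + 10] << 12) | (header[s + 11] << 4) | (header[s + 12] >> 4);
        int channels = ((header[s + 12] >> 1) & 0x7) + 1;
        int bitsPerSample = (((header[s + 12] & 0x1) << 4) | (header[s + 13] >> 4)) + 1;
        return (sampleRate, channels, bitsPerSample);
    }
}
```

For a file exported with the settings above, `FlacInfo.Read(File.OpenRead(path))` should report 44100 Hz, 1 channel and 16 bits. The sample rate it returns is also the value that belongs in the `rate=` part of the Content-Type header.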

With the above settings, my result is:

{
    "result" : []
}{
    "result" : [{
            "alternative" : [{
                    "transcript" : "translate this",
                    "confidence" : 0.92849225
                }, {
                    "transcript" : "translate days"
                }, {
                    "transcript" : "translate dish"
                }, {
                    "transcript" : "translate fish"
                }, {
                    "transcript" : "translate these"
                }
            ],
            "final" : true
        }
    ],
    "result_index" : 0
}

My English pronunciation isn't very good, as Google thinks I might want to translate fish...
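The response body above contains two JSON objects back to back, the first one empty, so parsing it as a single JSON document fails. A minimal sketch using System.Text.Json (assuming .NET Core 3.0 or later and the response shape shown above): it splits the body into top-level objects by tracking brace depth, skipping braces inside strings, and returns the first transcript of the first non-empty `result`. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class SpeechResponse
{
    // Split a body like {"result":[]}{"result":[...]} into individual JSON objects.
    public static IEnumerable<string> SplitObjects(string body)
    {
        int depth = 0, start = -1;
        bool inString = false, escaped = false;
        for (int i = 0; i < body.Length; i++)
        {
            char c = body[i];
            if (inString)
            {
                if (escaped) escaped = false;
                else if (c == '\\') escaped = true;
                else if (c == '"') inString = false;
                continue;
            }
            if (c == '"') inString = true;
            else if (c == '{')
            {
                if (depth == 0) start = i;
                depth++;
            }
            else if (c == '}')
            {
                depth--;
                if (depth == 0) yield return body.Substring(start, i - start + 1);
            }
        }
    }

    // First transcript of the first non-empty "result" array, or null if none.
    public static string? FirstTranscript(string body)
    {
        foreach (var json in SplitObjects(body))
        {
            using var doc = JsonDocument.Parse(json);
            if (!doc.RootElement.TryGetProperty("result", out var results)) continue;
            foreach (var result in results.EnumerateArray())
            {
                if (!result.TryGetProperty("alternative", out var alternatives)) continue;
                foreach (var alt in alternatives.EnumerateArray())
                {
                    if (alt.TryGetProperty("transcript", out var t)) return t.GetString();
                }
            }
        }
        return null; // only empty {"result":[]} payloads were returned
    }
}
```

A `null` return corresponds to the empty-result case from the question, which makes it easy to tell "no recognition" apart from an HTTP error.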

If you get an HTTP error (such as 403 Forbidden), the exception handler reads the full response from the HTTP body. If your API key is incorrect, the body will tell you so.

To get your API keys to work with the Speech API, follow the instructions here.

Make sure you are a member of the chromium-dev group (you can just subscribe to chromium-dev and choose not to receive mail).

After that, you can create a server key:

[Screenshot: creating a server key]