don't_panic
personal and professional blog of mike pultz, technology specialist and serial entrepreneur.

23 Mar 2011

Accessing Google Speech API / Chrome 11



Just yesterday, Google pushed version 11 of their Chrome browser into beta, and along with it came one really interesting new feature: support for the HTML5 speech input API. This means that you'll be able to talk to your computer, and Chrome will be able to interpret what you say. The feature has been available for a while on Android devices, so many of you will already be used to it and will welcome it in the browser.


If you're running Chrome version 11, you can test out the new speech capabilities by going to their simple test page on the html5rocks.com site:

slides.html5rocks.com/#speech-input

Genius! But how does it work? I started digging around in the Chromium source code to find out whether the speech recognition is implemented as a library built into Chrome, or whether it sends the audio back to Google to process. I know I've seen the Sphinx libraries in the Android build, but I was sure the latter was the case: the speech recognition was really good, and that's really hard to do without really good language models, which aren't something you'd be able to build into a browser.

I found the files I was looking for in the chromium source repo:

src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/

It looks like the audio is collected from the mic and then passed via an HTTPS POST to a Google web service, which responds with a JSON object containing the results. Looking through their audio encoder code, the audio can be either FLAC or Speex, but the Speex looks like some sort of specially modified version. I'm not sure what it is, but it just didn't look quite right.

If that's the case, is there any reason why I can't just POST something to it myself?

The URL listed in speech_recognition_request.cc is:

https://www.google.com/speech-api/v1/recognize

So a quick few lines of Perl (or PHP, or just use wget on the command line):

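#!/usr/bin/perl

use strict;
use warnings;
use LWP::UserAgent;

my $url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";

# Read the FLAC file named on the command line as raw bytes.
my $file = shift @ARGV or die "usage: $0 audio.flac\n";
open(my $fh, '<', $file) or die "cannot open $file: $!\n";
binmode($fh);
my $audio = do { local $/; <$fh> };
close($fh);

my $ua = LWP::UserAgent->new;

# POST the audio; the rate in the Content-Type must match the file's sample rate.
my $response = $ua->post($url, Content_Type => "audio/x-flac; rate=16000", Content => $audio);
if ($response->is_success)
{
    print $response->content;
}
else
{
    print STDERR $response->status_line, "\n";
}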

This quick Perl script uses LWP::UserAgent to POST the binary audio from my audio clip. I recorded a quick WAV file and then converted it to FLAC on the command line (see SoX for more info).

To run it, just do:

[root@prague mike]# ./speech i_like_pickles.flac
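
For anyone who would rather not use Perl, the same request can be sketched in Python 3 with only the standard library. This is a minimal sketch under assumptions, not a tested client: the URL and the Content-Type value are copied from the script above, while the names build_request and recognize are made up for illustration. build_request only constructs the POST; recognize sends it and needs network access.

```python
import urllib.request

# URL and header value copied from the Perl script; everything else is illustrative.
URL = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US"


def build_request(audio: bytes, rate: int = 16000) -> urllib.request.Request:
    """Build (but do not send) a POST carrying raw FLAC bytes as its body."""
    headers = {"Content-Type": f"audio/x-flac; rate={rate}"}
    # Passing `data` is what makes urllib issue a POST instead of a GET.
    return urllib.request.Request(URL, data=audio, headers=headers)


def recognize(path: str) -> str:
    """Send the FLAC file at `path` and return the response body as text."""
    with open(path, "rb") as flac:  # binary mode, FLAC is not text
        request = build_request(flac.read())
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling recognize("i_like_pickles.flac") would be the Python counterpart of the command above.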

The response is pretty straightforward JSON:

{
    "status": 0,
    "id": "b3447b5d98c5653e0067f35b32c0a8ca-1",
    "hypotheses": [
    {
        "utterance": "i like pickles",
        "confidence": 0.9012539
    },
    {
        "utterance": "i like pickle"
    }]
}
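
Picking the result out of that JSON takes only a few lines. The sketch below is in Python 3 and assumes two things that the sample suggests but does not guarantee: that a status of 0 means success, and that alternative hypotheses may arrive without a confidence value. The function name best_utterance is hypothetical.

```python
import json
from typing import Optional


def best_utterance(body: str) -> Optional[str]:
    """Return the most confident utterance, or None when nothing was recognized."""
    reply = json.loads(body)
    hypotheses = reply.get("hypotheses", [])
    if reply.get("status") != 0 or not hypotheses:
        return None
    # Alternatives may omit "confidence", so treat a missing value as 0.0.
    top = max(hypotheses, key=lambda h: h.get("confidence", 0.0))
    return top["utterance"]


sample = """{"status": 0, "id": "b3447b5d98c5653e0067f35b32c0a8ca-1",
 "hypotheses": [{"utterance": "i like pickles", "confidence": 0.9012539},
                {"utterance": "i like pickle"}]}"""
print(best_utterance(sample))  # prints: i like pickles
```

A reply with an empty hypotheses list makes the function return None rather than raise, so the caller can decide how to handle a failed recognition.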

I'm not sure if Google intends this to be a public, usable web service API, but it works, and it has all sorts of possibilities!

Filed under: Development, Speech Tools, Telephony
Comments (77) Trackbacks (13)
  1. fps
    December 2nd, 2011 - 10:51

    Zibri, I tried giving it a wav but the web service rejects it. Flac is the way to go for now it seems

  2. victor
    December 7th, 2011 - 01:52

    I recorded a wav file test.wav, and then used sox to convert it to test.flac. I copied the posted perl script into file test.pl, and created a test.txt which contains the test.flac file name. When I run: perl test.pl test.txt, I get no recognition result: {"status":5,"id":"915a84d84d46e13fed2f52a44b652bfc-1","hypotheses":[]}
    BTW, I run it in cygwin.
    Could you please tell me where I made a mistake?

    thanks,

    Victor

  3. mike
    December 7th, 2011 - 13:14

    Hey Victor,

    Why are you passing in a txt file?

    The Perl script takes the name of an audio file as the first argument, and not a txt file that contains the name of the audio file.

    Also make sure you’ve encoded it at either 8 kHz or 16 kHz, and adjust the Content-Type header accordingly. The example I posted uses 16.

    Mike
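
One way to keep the Content-Type rate in step with the audio is to read the rate out of the FLAC file itself. The Python 3 sketch below assumes a native FLAC stream (not Ogg FLAC, and with no ID3 tag prepended), where the STREAMINFO block comes first and stores the sample rate as a 20-bit field. The function names are hypothetical.

```python
def flac_sample_rate(data: bytes) -> int:
    """Read the sample rate from the STREAMINFO block at the start of a FLAC stream."""
    if len(data) < 21 or data[:4] != b"fLaC":
        raise ValueError("not a native FLAC stream")
    # Layout: 4-byte "fLaC" marker, 4-byte metadata block header, then 10 bytes
    # of block and frame sizes; the 20-bit sample rate starts at byte offset 18.
    return (data[18] << 12) | (data[19] << 4) | (data[20] >> 4)


def content_type_for(data: bytes) -> str:
    """Build a Content-Type value whose rate matches the audio itself."""
    return f"audio/x-flac; rate={flac_sample_rate(data)}"
```

Passing the first few dozen bytes of the file is enough, since only the first 21 bytes are inspected.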

  4. Javier
    December 8th, 2011 - 08:55

    Very interesting feature. Could someone share PHP code that reads a FLAC file stored locally on the server and does this?

  5. vic kumar
    December 9th, 2011 - 20:35

    Hey, I’ve written a bit of Perl code, and created sort of my own version of Siri / Iris. It’s still pretty rough around the edges, but I wanted to post it on this blog since it’s where I got some of the code. You can use sox to normalize the sound to -5 dB, which helps improve accuracy.

    So, how can I post a file on here?

  6. mike
    December 12th, 2011 - 13:29

    Hey Vic,

    I don’t really have a way to post files, but you could just include a link to a download somewhere.

    Mike

  7. Carlitos
    December 14th, 2011 - 01:24

    Hi Mike,
    I wanted to say that you are one of the only references online for posting data to the google speech recognition engine. Thanks for sharing the info.

    I’m trying to use the google speech recognition with my robot and I would like to write a bash script to do so. I am not familiar with Perl and I’m having a hard time posting the right data to the server using wget or curl.

    Can you give me any tip on how to use wget (as you mention in the post) to post the flac file?

    Right now, if I try this:

    wget --post-file=out.flac https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US

    it gives me ERROR 400: Content-Type media type is not audio.

    Any help will be very appreciated. Thanks!

  8. mike
    December 14th, 2011 - 18:04

    Carlitos-

    It looks like you’re missing the Content-Type header- if you look back in some of the old comments, I’m pretty sure there was somebody that posted an example using wget.

    Mike

  9. Carlitos
    December 14th, 2011 - 21:51

    Oh! I’m an idiot for forgetting the headers and for not realizing there were more comments!

    Here is my result. This will record, encode to FLAC, send the request to Google and save the output to speech.txt:

    arecord -f cd -t wav -d 5 -r 16000 | flac - -f --best --sample-rate 16000 -o out.flac; wget --post-file out.flac --header="Content-Type: audio/x-flac; rate=16000" -O speech.txt www.google.com/speech-api/v1/recognize?lang=en

    Thanks a lot for your code and help Mike!

  10. Tha Narie
    December 15th, 2011 - 09:14

    FYI:

    status: 0 – correct
    status: 4 – missing audio file
    status: 5 – incorrect audio file

    Sample rate can be anything between 8000 and 44000 (not 44100), and doesn’t have to be exactly 8000 or 16000. If it is out of bounds you get a 400 HTML error page returned.
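
These observations can be captured in a small lookup, sketched here in Python 3. The status meanings and the 8000 to 44000 range are taken straight from the comment above and are one commenter's findings, not documented behaviour; the names STATUS_MEANINGS, describe_status and rate_in_bounds are hypothetical.

```python
# Meanings as reported in the comment above; other codes may well exist.
STATUS_MEANINGS = {
    0: "correct",
    4: "missing audio file",
    5: "incorrect audio file",
}


def describe_status(status: int) -> str:
    """Translate a status code from the JSON reply into a readable message."""
    return STATUS_MEANINGS.get(status, f"unknown status {status}")


def rate_in_bounds(rate: int) -> bool:
    """Check a sample rate against the reported limits before uploading."""
    return 8000 <= rate <= 44000
```

Checking the rate locally avoids a round trip that would only return the 400 error page.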

  11. framillo
    December 17th, 2011 - 09:00

    Very interesting, thank you a lot for sharing. Does anybody have an idea how to send audio from the mic in real time instead of posting the FLAC file? I would like to reuse Chrome for speech recognition, but my system would not have a monitor or TV, so clicking the mic picture to start recognition is not possible.
    Thank you all

  12. Robert Buzink
    December 27th, 2011 - 21:15

    Thanks for researching Chrome’s speech recognition API. Great work! I made a little video using the API from the Linux command line: www.youtube.com/watch?v=Sf3dMgooufc (use it if you like)

  13. Hiyassat
    January 7th, 2012 - 15:39

    This is for the English language. How do I set the required language?

  14. Travis W
    January 13th, 2012 - 00:01

    Hiyassat, I would imagine you can change where it says “lang=en-us” to lang= . Haven’t tested that though.

  15. AnthonyCE
    January 13th, 2012 - 18:47

    Mike,
    any examples of iOS use of this API? I have an Xcode project where I need to convert the voice recorded .wav file in real-time to .flac and then would like to POST to the google API. Any thoughts or examples would be great.

  16. DogWang
    January 23rd, 2012 - 04:30

    Interesting! It is very easy to use.

    The Speech Input API Specification can be found at www.w3.org/2005/Incubator/htmlspeech/2010/10/google-api-draft.html ( and I found it in the repo )
    I think anyone who reads this article and wants to use Google Speech API should have a look at the code in the
    source repository first.

    BTW, this is the URL I use (lang=zh-CN is for Chinese and it works fine):
    www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=zh-CN&maxresults=1
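
Switching language is then just a matter of changing the query string. Here is a Python 3 sketch that assembles the URL from its parameters; the parameter names (xjerr, client, lang, maxresults) are the ones seen in this thread, the https:// scheme is taken from the original post, and the function name recognize_url is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

ENDPOINT = "https://www.google.com/speech-api/v1/recognize"


def recognize_url(lang: str = "en-US", maxresults: Optional[int] = None) -> str:
    """Assemble the recognize URL for a language, optionally capping the results."""
    params = {"xjerr": 1, "client": "chromium", "lang": lang}
    if maxresults is not None:
        params["maxresults"] = maxresults
    return ENDPOINT + "?" + urlencode(params)


print(recognize_url("zh-CN", maxresults=1))
# prints: https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=zh-CN&maxresults=1
```

Using urlencode rather than string concatenation keeps the URL valid if a parameter value ever needs escaping.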

  17. Raveesh Sharma
    January 25th, 2012 - 04:24

    Is there a way for the Google speech API to detect pauses in a statement? For example,
    if I say “Java is a programming language. And so is C.”,
    can I force Google to translate the statement up until the pause and then translate the rest of the statement?

  18. TrkHefner_
    January 25th, 2012 - 14:52

    I guess I’m almost there, just can’t seem to get the wget example working, my result file keeps returning empty and I’m not getting any errors either.

    Could anyone please post a quick example as to how to go about posting to the API using cURL?

    many thanks!

  19. TrkHefner_
    January 25th, 2012 - 15:50

    Okay, I got the wget call working in the end, but I’m still not there.

    I am calling the API from a php file with an exec command with the api call.
    Strangely, the addressbar changes from “speech.php” (my file) to “main?url=out.flac&tid=0&w=1440&h=809″ and my result file contains some weird html containing an br (with main?url=out.flac&tid=0&w=1440&h=809 as its src).

    Here’s my code:

    $cmd = 'wget --post-file out.flac -header="Content-Type: audio/x-flac; rate=16000" -O resultaat.html https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US';
    exec($cmd);
    echo file_get_contents("resultaat.html");

    It seems like instead of returning the JSON object I desire, the API tries to redirect the call.
    I highly doubt the Google people have just now restricted the use of their speech-to-text, so I figure it’s me who fails.

    Any pointers would be greatly appreciated.

  20. TrkHefner_
    January 27th, 2012 - 11:19

    Okay, I finally got it working.
    I’m sorry for the blob comments I placed in the process.
    Since I figure I won’t be the last person to walk in on the troubles I experienced, I’ll try to make up my blobbing by explaining how I resolved the matter. I hope that by doing so I can help others googling into this thread.

    First of all, I must point out that I have been trying to make this work on a windows machine. So in order to successfully make a wget call I had to install GnuWin Wget (sourceforge.net/projects/gnuwin32/files/wget/1.11.4-1/). The problems I experienced earlier were due to the fact that I was using the wrong release. Make sure you download and run the setup program with all dependency files included.

    Second of all, the win32 version of wget seems to accept its parameters slightly differently than explained in this article and the comments underneath.
    The command I issue is now: wget --post-file="out.flac" --header="Content-Type: audio/x-flac; rate=16000" --output-file="result.txt" --no-check-certificate https://www.google.com/speech-api/v1/recognize?lang=en
    Contrary to what I expected, this didn’t write the actual desired Google JSON response to result.txt.
    Instead it writes the wget log to the file. This log contained “HTTP request sent, awaiting response… 200 OK Length: unspecified [application/json] Saving to: ‘recognize@lang=en’”. Why on earth it writes the JSON obj to a file called ‘recognize@lang=en’ is a riddle to me, but sure enough, the file (saved in the wget executable dir) contains the text I desired.

    Strangely, the accuracy is pretty low and thus the result/recognition level is quite bad. The api recognizes only half of the words the example at html5rocks and google translate do. I figure they use a higher bitrate or something or there’s something else I am not taking into account.

    Anyway, thanks for this article and the comments that helped me on my way, I hope I was able to contribute.

  21. gluxon
    January 28th, 2012 - 22:35

    This is the command that worked for me on Ubuntu 11.10

    wget www.google.com/speech-api/v1/recognize?lang=en-us --header "Content-Type: audio/x-flac; rate=16000" --post-file=NameOfFile.flac --output-file=output.txt

    Hope this helped!

  22. Rob
    February 5th, 2012 - 17:38

    I have no idea what I’m doing wrong or what this output means when I do wget. I’ve tried everyone’s statements here for wget and none seem to work for me.

    wget www.google.com/speech-api/v1/recognize?lang=en-us -header “Content-Type: audio/x-flac; rate=16000? -post-file=rec1.flac -output-file=output.txt

    SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
    syswgetrc = C:\Program Files\GnuWin32/etc/wgetrc
    –2012-02-05 17:34:11– www.google.com/speech-api/v1/recognize?lang=en-us
    Resolving www.google.com... 74.125.113.104, 74.125.113.105, 74.125.113.106, …
    Connecting to www.google.com|74.125.113.104|:80… connected.
    HTTP request sent, awaiting response… 405 HTTP method GET is not supported by this URL
    2012-02-05 17:34:12 ERROR 405: HTTP method GET is not supported by this URL.

    –2012-02-05 17:34:12– %96header/
    Resolving \226header… failed: No data record of requested type.
    wget: unable to resolve host address `-header’
    –2012-02-05 17:34:14– ftp://%93content-type/
    => `.listing’
    Resolving \223content-type… failed: No data record of requested type.
    wget: unable to resolve host address `”content-type’
    unlink: No such file or directory
    –2012-02-05 17:34:16– audio/x-flac;
    Resolving audio… failed: No data record of requested type.
    wget: unable to resolve host address `audio’
    –2012-02-05 17:34:18– rate=16000/?
    Resolving rate=16000… failed: No data record of requested type.
    wget: unable to resolve host address `rate=16000′
    –2012-02-05 17:34:21– %96post-file=rec1.flac/
    Resolving \226post-file=rec1.flac… failed: No data record of requested type.
    wget: unable to resolve host address `-post-file=rec1.flac’
    –2012-02-05 17:34:21– %96output-file=output.txt/
    Resolving \226output-file=output.txt… failed: No data record of requested type.

    wget: unable to resolve host address `-output-file=output.txt’
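
The %96 and %93 in that log are the Windows-1252 bytes for an en dash and a left curly quote, which suggests the command was pasted with typographic punctuation in place of plain hyphens and straight quotes, so wget read every option as a host name. A Python 3 sketch for cleaning a pasted command follows; mapping a dash back to a double hyphen is an assumption about how the blog software altered the text, and the names used are hypothetical.

```python
# Typographic characters that commonly replace ASCII when commands pass through a blog.
SMART_PUNCTUATION = {
    "\u2013": "--",  # en dash, assumed to stand in for a double hyphen
    "\u2014": "--",  # em dash, same assumption
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2033": '"',   # double prime
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u2032": "'",   # prime
}


def asciify_command(command: str) -> str:
    """Replace typographic dashes and quotes with their plain ASCII forms."""
    return command.translate(str.maketrans(SMART_PUNCTUATION))
```

It is still worth reading the cleaned command before running it, since a dash that really was a single hyphen (such as the lone "-" that tells flac to read standard input) would come out doubled.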

  23. vfnik
    February 9th, 2012 - 06:10

    I have tried this code in PHP.

    <?php
    $url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
    $file = 'hello.flac';

    $audio = "";
    $file=fopen("hello.flac","r");
    while(!feof($file)) {
    $audio .= fgets($file) . "";
    }
    fclose($file);
    $data = array('Content_Type' => 'audio/x-flac; rate=16000','Content' => @$audio);
    $ch = curl_init();
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_POST,true);
    curl_setopt($ch,CURLOPT_POSTFIELDS,$data);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    echo $result;
    curl_close($ch);

    ?>

    But it is giving me response as:

    Content-Type media type is not audio

    Content-Type media type is not audio
    Error 400

    Can anybody suggest where I am going wrong?

  24. stranger
    February 9th, 2012 - 12:52

    vfnik, the @ in a curl postfields array is a reference to a file. So instead of reading the data in with fopen/fread etc., just put the filename there: instead of @$audio, use @$file

    untested but i think that will work

  25. hzq
    February 10th, 2012 - 01:15

    it works!!!!

  26. Mike Z.
    February 10th, 2012 - 01:40

    For those interested in using the SPEEX codec, there is a fork that implements the MIME-Type “x-speex-with-header-byte” and works perfectly against Google’s Speech Recognition APIs. It’s available at qxip.net/wiki or directly on GitHub: https://github.com/QXIP/Speex-with-header-bytes

  27. Jon Schwartz
    February 10th, 2012 - 11:08

    I ported your code to Python (thanks!!):

    import urllib2
    import os
    import sys

    audio = open(sys.argv[1], 'rb')
    filesize = os.path.getsize(sys.argv[1])

    print sys.argv[1],' Read',"\n"

    req = urllib2.Request(url='https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US')
    req.add_header('Content-type','audio/x-flac; rate=16000')
    req.add_header('Content-length', str(filesize))
    req.add_data(audio)

    print 'Request built',"\n"

    response = urllib2.urlopen(req)

    print 'Response returned',"\n"

    print response.read()

