Accessing Google Speech API / Chrome 11
Like this article? Follow me on Twitter @mikepultz for more updates.
Just yesterday, Google pushed version 11 of their Chrome browser into beta, and along with it, one really interesting new feature: support for the HTML5 speech input API. This means that you'll be able to talk to your computer, and Chrome will be able to interpret it. This feature has been available for a while on Android devices, so many of you are probably already used to it and will welcome the new feature.
If you're running Chrome version 11, you can test out the new speech capabilities by going to their simple test page on the html5rocks.com site:
slides.html5rocks.com/#speech-input
Genius! But how does it work? I started digging around in the Chromium source code to find out whether the speech recognition is implemented as a library built into Chrome, or whether it sends the audio back to Google to process. I know I've seen the Sphinx libraries in the Android build, but I was sure it was the latter: the speech recognition was really good, and that's really hard to do without really good language models, which is not something you'd be able to build into a browser.
I found the files I was looking for in the chromium source repo:
src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/
It looks like the audio is collected from the mic and then passed via an HTTPS POST to a Google web service, which responds with a JSON object containing the results. Looking through their audio encoder code, the audio can be either FLAC or Speex, but the Speex looks like some sort of specially modified version. I'm not sure what it is, but it just didn't look quite right.
If that's the case, there should be no reason why I can't just POST something to it myself?
The URL listed in speech_recognition_request.cc is:
https://www.google.com/speech-api/v1/recognize
So, a quick few lines of Perl (or PHP, or just use wget on the command line):
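#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
my $url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
# Read the whole FLAC file named on the command line, in binary mode
my $file = shift or die "usage: $0 file.flac\n";
open(my $fh, '<:raw', $file) or die "Cannot open $file: $!\n";
my $audio = do { local $/; <$fh> };
close($fh);
# POST the raw audio; the Content-Type must state the codec and sample rate
my $ua = LWP::UserAgent->new;
my $response = $ua->post($url, Content_Type => "audio/x-flac; rate=16000", Content => $audio);
if ($response->is_success) {
    print $response->content;
} else {
    die $response->status_line, "\n";
}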
This quick Perl script uses LWP::UserAgent to POST the binary audio from my clip. I recorded a quick WAV file and then converted it to FLAC on the command line (see SoX for more info).
To run it, just do:
[root@prague mike]# ./speech i_like_pickles.flac
The response is pretty straightforward JSON:
{
  "status": 0,
  "id": "b3447b5d98c5653e0067f35b32c0a8ca-1",
  "hypotheses": [
    { "utterance": "i like pickles", "confidence": 0.9012539 },
    { "utterance": "i like pickle" }
  ]
}
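Pulling the best guess out of a response like the one above takes only a few lines. This is a minimal Python sketch that assumes the response shape shown here (a `status` field plus a `hypotheses` list, where only the first entry carries a `confidence`); `best_hypothesis` is a hypothetical helper name, not part of any Google API.

```python
import json


def best_hypothesis(body):
    """Return (utterance, confidence) for the top hypothesis, or None.

    Assumes the JSON shape of the sample response; confidence is None
    when the service omits it.
    """
    result = json.loads(body)
    hypotheses = result.get("hypotheses", [])
    # A non-zero status or an empty list means nothing was recognized.
    if result.get("status") != 0 or not hypotheses:
        return None
    top = hypotheses[0]
    return top["utterance"], top.get("confidence")


sample = ('{"status": 0, "id": "b3447b5d98c5653e0067f35b32c0a8ca-1", '
          '"hypotheses": [{"utterance": "i like pickles", "confidence": 0.9012539}, '
          '{"utterance": "i like pickle"}]}')
print(best_hypothesis(sample))  # ('i like pickles', 0.9012539)
```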
I'm not sure whether Google intends this to be a public, usable web service API, but it works, and it has all sorts of possibilities!
- victor
December 7th, 2011 - 01:52
I recorded a WAV file, test.wav, and then used SoX to convert it to test.flac. I copied the posted Perl script into test.pl and created test.txt, which contains the file name test.flac. When I run perl test.pl test.txt, I get no recognition result:
{"status":5,"id":"915a84d84d46e13fed2f52a44b652bfc-1","hypotheses":[]}
BTW, I'm running it in Cygwin.
Could you please tell me where I made a mistake? Thanks,
Victor
- mike
December 7th, 2011 - 13:14
Hey Victor,
Why are you passing in a txt file?
The PERL script takes the name of an audio file as the first argument, and not a txt file that contains the name of the audio file.
Also make sure you’ve encoded it at either 8 kHz or 16 kHz, and adjust the Content-Type header accordingly; the example I posted uses 16 kHz.
Mike
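Matching the Content-Type rate to the file, as Mike describes, can be automated by reading the rate from the FLAC file's own header. A Python sketch, assuming the file starts directly with the `fLaC` marker (no ID3 tag in front) and follows the FLAC format's STREAMINFO layout, where the sample rate is a 20-bit field; `flac_sample_rate` and `flac_content_type` are hypothetical helper names.

```python
def flac_sample_rate(header):
    """Return the sample rate stored in a FLAC file's STREAMINFO block.

    Assumes `header` holds at least the first 21 bytes of the file and that
    the file begins with the 'fLaC' marker (no ID3v2 tag in front).
    """
    if len(header) < 21 or header[:4] != b"fLaC":
        raise ValueError("not a bare FLAC stream")
    # The first metadata block must be STREAMINFO (block type 0).
    if header[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # STREAMINFO data starts at byte 8. After 10 bytes of block/frame size
    # fields comes the 20-bit sample rate: bytes 18, 19 and the top 4 bits of 20.
    return (header[18] << 12) | (header[19] << 4) | (header[20] >> 4)


def flac_content_type(path):
    """Build the Content-Type header value for the FLAC file at `path`."""
    with open(path, "rb") as f:
        rate = flac_sample_rate(f.read(42))
    return "audio/x-flac; rate=%d" % rate
```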
- Javier
December 8th, 2011 - 08:55
Very interesting feature. Could someone share PHP code that reads the FLAC file locally on the server and does this?
- vic kumar
December 9th, 2011 - 20:35
Hey, I’ve written a bit of Perl code and created sort of my own version of Siri / Iris. It’s still pretty rough around the edges, but I wanted to post it on this blog since it’s where I got some of the code. You can use SoX to normalize the sound to -5 dB, which helps improve accuracy.
So, how can I post a file on here?
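For anyone curious what the -5 dB normalization step does: peak normalization scales every sample so that the loudest one ends up 5 dB below full scale. A rough Python sketch for 16-bit samples, meant as an approximation of SoX's `norm` effect rather than a reimplementation of it; `normalize_peak` is a hypothetical name.

```python
def normalize_peak(samples, target_db=-5.0, full_scale=32767):
    """Scale 16-bit PCM samples so the peak sits at target_db dBFS."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    # dB to linear amplitude: 10 ** (dB / 20)
    gain = full_scale * 10 ** (target_db / 20.0) / peak
    # Round and clamp back into the signed 16-bit range.
    return [max(-32768, min(32767, round(s * gain))) for s in samples]
```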
- mike
December 12th, 2011 - 13:29
Hey Vic,
I don’t really have a way to post files, but you could just include a link to download it somewhere.
Mike
- Carlitos
December 14th, 2011 - 01:24
Hi Mike,
I wanted to say that you are one of the only references online for posting data to the Google speech recognition engine. Thanks for sharing the info. I’m trying to use Google speech recognition with my robot, and I would like to write a bash script to do so. I am not familiar with Perl, and I’m having a hard time posting the right data to the server using wget or curl.
Can you give me any tip on how to use wget (as you mention in the post) to post the flac file?
Right now, if I try this:
wget --post-file=out.flac https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US
it gives me ERROR 400: Content-Type media type is not audio.
Any help will be very appreciated. Thanks!
- mike
December 14th, 2011 - 18:04
Carlitos-
It looks like you’re missing the Content-Type header. If you look back through some of the older comments, I’m pretty sure somebody posted an example using wget.
Mike
- Carlitos
December 14th, 2011 - 21:51
Oh! I’m an idiot for forgetting the headers and for not realizing there were more comments!
Here is my result. This will record, encode to FLAC, send the request to Google and save the output to speech.txt:
arecord -f cd -t wav -d 5 -r 16000 | flac - -f --best --sample-rate 16000 -o out.flac; wget --post-file out.flac --header="Content-Type: audio/x-flac; rate=16000" -O speech.txt www.google.com/speech-api/v1/recognize?lang=en
Thanks a lot for your code and help Mike!
- Tha Narie
December 15th, 2011 - 09:14
FYI:
status: 0 – correct
status: 4 – missing audio file
status: 5 – incorrect audio file
Sample rate can be anything between 8000 and 44000 (not 44100) and doesn’t have to be exactly 8000 or 16000. If it is out of bounds, you get a 400 HTML error page returned.
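Those observations can be folded into a small pre-flight check before posting. A Python sketch; the status meanings and the 8000 to 44000 range come from the comment above rather than any official documentation, and treating both limits as inclusive is an assumption. The helper names are hypothetical.

```python
# Status codes as reported in the comment above (observed, not documented).
STATUS_MEANINGS = {
    0: "correct",
    4: "missing audio file",
    5: "incorrect audio file",
}

# Reported accepted sample-rate range; inclusive bounds are an assumption.
MIN_RATE = 8000
MAX_RATE = 44000


def describe_status(status):
    """Translate a numeric status from the JSON response into text."""
    return STATUS_MEANINGS.get(status, "unknown status %d" % status)


def content_type_for_rate(rate):
    """Return the Content-Type value for FLAC audio at `rate` Hz."""
    if not MIN_RATE <= rate <= MAX_RATE:
        # Out-of-range rates reportedly get a 400 HTML page instead of JSON.
        raise ValueError("sample rate %d outside %d-%d" % (rate, MIN_RATE, MAX_RATE))
    return "audio/x-flac; rate=%d" % rate
```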
- framillo
December 17th, 2011 - 09:00
Very interesting, thanks a lot for sharing. Does anybody have an idea how to send audio from the mic in real time instead of posting the FLAC file? I would like to reuse Chrome for speech recognition, but my system would not have a monitor or TV, so clicking the mic picture to start recognition is not possible.
thank you all
- Robert Buzink
December 27th, 2011 - 21:15
Thanks for researching Chrome’s speech recognition API. Great work! I made a little video using the API from the Linux command line: www.youtube.com/watch?v=Sf3dMgooufc (use it if you like)
- Hiyassat
January 7th, 2012 - 15:39
This is for English. How do I set the required language?
- Travis W
January 13th, 2012 - 00:01
Hiyassat, I would imagine you can change where it says “lang=en-us” to lang= followed by your language code. I haven’t tested that, though.
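Travis's suggestion amounts to changing one query parameter, so the URL can be built rather than hand-edited. A Python sketch using the parameter names that appear in URLs in this post and its comments (`xjerr`, `client`, `lang`, `maxresults`); which language codes the service accepts is not established here beyond the reports in this thread, and `recognize_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.google.com/speech-api/v1/recognize"


def recognize_url(lang="en-US", max_results=None):
    """Build the recognize URL with the query parameters seen in this thread."""
    params = {"xjerr": 1, "client": "chromium", "lang": lang}
    if max_results is not None:
        params["maxresults"] = max_results
    # urlencode escapes the values and joins them with '&'.
    return BASE_URL + "?" + urlencode(params)


print(recognize_url("zh-CN", max_results=1))
# https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=zh-CN&maxresults=1
```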
- AnthonyCE
January 13th, 2012 - 18:47
Mike,
Any examples of using this API on iOS? I have an Xcode project where I need to convert the recorded voice .wav file to .flac in real time and then POST it to the Google API. Any thoughts or examples would be great.
- DogWang
January 23rd, 2012 - 04:30
Interesting! It is very easy to use.
The Speech Input API Specification can be found at www.w3.org/2005/Incubator/htmlspeech/2010/10/google-api-draft.html (I found it in the repo).
I think anyone who reads this article and wants to use the Google Speech API should look at the code in the source repository first. BTW, this is the URL I use (lang=zh-CN is for Chinese, and it works fine):
www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=zh-CN&maxresults=1
- Raveesh Sharma
January 25th, 2012 - 04:24
Is there a way for the Google speech API to detect pauses in a statement? For example, if I say “Java is a programming language. And so is C.”, can I force Google to translate the statement up to the pause and then translate the rest of it?
- TrkHefner_
January 25th, 2012 - 14:52
I guess I’m almost there; I just can’t seem to get the wget example working. My result file keeps coming back empty, and I’m not getting any errors either.
Could anyone please post a quick example as to how to go about posting to the API using cURL?
many thanks!
- TrkHefner_
January 25th, 2012 - 15:50
Okay, I got the wget call working in the end, but I’m still not there.
I am calling the API from a PHP file, using exec() to run the wget command.
Strangely, the address bar changes from “speech.php” (my file) to “main?url=out.flac&tid=0&w=1440&h=809”, and my result file contains some weird HTML containing a br (with main?url=out.flac&tid=0&w=1440&h=809 as its src).
Here’s my code:
$cmd = 'wget --post-file out.flac -header="Content-Type: audio/x-flac; rate=16000" -O resultaat.html https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US';
exec($cmd);
echo file_get_contents("resultaat.html");
It seems like, instead of returning the JSON object I want, the API tries to redirect the call.
I highly doubt the Google people have just now restricted the use of their speech-to-text, so I figure it’s me who is failing.
Any pointers would be greatly appreciated.
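One thing worth checking in a command like this: the URL contains `&` and is not quoted. PHP's exec() runs the string through a shell (`/bin/sh` on Unix, `cmd.exe` on Windows), and both treat an unquoted `&` as a command separator, so wget only receives the URL up to `xjerr=1`. A Python sketch that builds the same kind of wget call with POSIX shell quoting via `shlex.join` (Python 3.8+); in PHP itself, escapeshellarg() does the equivalent job. `cmd.exe` has different quoting rules, so this sketch applies to Unix-like shells, and `wget_command` is a hypothetical helper.

```python
import shlex

URL = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US"


def wget_command(flac_path, out_path, rate=16000, url=URL):
    """Return a POSIX-shell-safe wget command line that posts a FLAC file."""
    args = [
        "wget",
        "--post-file=" + flac_path,
        "--header=Content-Type: audio/x-flac; rate=%d" % rate,
        "-O", out_path,  # -O: write the response body here (not the log)
        url,
    ]
    # shlex.join quotes any argument containing spaces, ';', '&', '?', etc.
    return shlex.join(args)


print(wget_command("out.flac", "resultaat.html"))
```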
- TrkHefner_
January 27th, 2012 - 11:19
Okay, I finally got it working.
I’m sorry for the blob comments I placed in the process.
Since I figure I won’t be the last person to run into the troubles I experienced, I’ll try to make up for my blobbing by explaining how I resolved the matter. I hope that by doing so I can help others who google their way into this thread.
First of all, I must point out that I have been trying to make this work on a Windows machine. So in order to successfully make a wget call, I had to install GnuWin Wget (sourceforge.net/projects/gnuwin32/files/wget/1.11.4-1/). The problems I experienced earlier were due to the fact that I was using the wrong release. Make sure you download and run the setup program with all dependency files included.
Second, the Win32 version of wget seems to accept its parameters slightly differently than explained in this article and the comments underneath.
The command I issue is now:
wget --post-file="out.flac" --header="Content-Type: audio/x-flac; rate=16000" --output-file="result.txt" --no-check-certificate https://www.google.com/speech-api/v1/recognize?lang=en
Contrary to what I expected, this didn’t write the actual Google JSON response to result.txt.
Instead, it writes the wget log to the file. This log contained “HTTP request sent, awaiting response… 200 OK Length: unspecified [application/json] Saving to: ‘recognize@lang=en’”. Why on earth it writes the JSON object to a file called ‘recognize@lang=en’ is a riddle to me, but sure enough, the file (saved in the wget executable dir) contains the text I wanted.
Strangely, the accuracy is pretty low, so the recognition results are quite bad. The API recognizes only half of the words that the example at html5rocks and Google Translate do. I figure they use a higher bitrate, or there’s something else I am not taking into account.
Anyway, thanks for this article and the comments that helped me on my way. I hope I was able to contribute.
- gluxon
January 28th, 2012 - 22:35
This is the command that worked for me on Ubuntu 11.10:
wget www.google.com/speech-api/v1/recognize?lang=en-us --header "Content-Type: audio/x-flac; rate=16000" --post-file=NameOfFile.flac --output-file=output.txt
Hope this helped!
- Rob
February 5th, 2012 - 17:38
I have no idea what I’m doing wrong or what this output means when I run wget. I’ve tried everyone’s wget commands here, and none of them seem to work for me.
wget www.google.com/speech-api/v1/recognize?lang=en-us -header “Content-Type: audio/x-flac; rate=16000? -post-file=rec1.flac -output-file=output.txt
SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\Program Files\GnuWin32/etc/wgetrc
--2012-02-05 17:34:11-- www.google.com/speech-api/v1/recognize?lang=en-us
Resolving www.google.com... 74.125.113.104, 74.125.113.105, 74.125.113.106, …
Connecting to www.google.com|74.125.113.104|:80… connected.
HTTP request sent, awaiting response… 405 HTTP method GET is not supported by this URL
2012-02-05 17:34:12 ERROR 405: HTTP method GET is not supported by this URL.
--2012-02-05 17:34:12-- %96header/
Resolving \226header… failed: No data record of requested type.
wget: unable to resolve host address `-header'
--2012-02-05 17:34:14-- ftp://%93content-type/
=> `.listing'
Resolving \223content-type… failed: No data record of requested type.
wget: unable to resolve host address `”content-type'
unlink: No such file or directory
--2012-02-05 17:34:16-- audio/x-flac;
Resolving audio… failed: No data record of requested type.
wget: unable to resolve host address `audio'
--2012-02-05 17:34:18-- rate=16000/?
Resolving rate=16000… failed: No data record of requested type.
wget: unable to resolve host address `rate=16000'
--2012-02-05 17:34:21-- %96post-file=rec1.flac/
Resolving \226post-file=rec1.flac… failed: No data record of requested type.
wget: unable to resolve host address `-post-file=rec1.flac'
--2012-02-05 17:34:21-- %96output-file=output.txt/
Resolving \226output-file=output.txt… failed: No data record of requested type.
wget: unable to resolve host address `-output-file=output.txt'
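The log gives the problem away: `\226` and `\223` are the Windows-1252 bytes for an en dash and a left curly quote. The command was pasted with the typographic characters that blog software substitutes for `--` and straight quotes, so wget treated each piece as a separate URL. A Python sketch that maps those characters back to ASCII before running a copied command; it assumes every en or em dash originally stood for `--`, which holds for long options like `--header` but not for a lone `-` (stdin). `asciify_command` is a hypothetical helper.

```python
# Typographic characters that blog software commonly substitutes for ASCII.
# Assumption: every en/em dash was originally "--" (true for long options).
SMART_TO_ASCII = {
    "\u2013": "--",  # en dash (0x96 in Windows-1252)
    "\u2014": "--",  # em dash (0x97 in Windows-1252)
    "\u201c": '"',   # left double quote (0x93)
    "\u201d": '"',   # right double quote (0x94)
    "\u2033": '"',   # double prime, often produced after digits
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote
    "\u2032": "'",   # prime
}

_TABLE = str.maketrans(SMART_TO_ASCII)


def asciify_command(cmd):
    """Replace smart dashes and quotes in a pasted command with ASCII."""
    return cmd.translate(_TABLE)
```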
- vfnik
February 9th, 2012 - 06:10
I have tried this code in PHP.
<?php
$url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
$file = 'hello.flac';
$audio = "";
$file = fopen("hello.flac", "r");
while (!feof($file)) {
    $audio .= fgets($file) . "";
}
fclose($file);
$data = array('Content_Type' => 'audio/x-flac; rate=16000', 'Content' => @$audio);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
echo $result;
curl_close($ch);
?>
But it gives me this response:
Content-Type media type is not audio
Content-Type media type is not audio
Error 400
Can anybody tell me what I am doing wrong?
- stranger
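February 9th, 2012 - 12:52
vfnik, the @ in a cURL postfields array is a reference to a file, so instead of reading the data in with fopen/fread etc., just put the filename there, prefixed with @. So instead of @$audio, use "@hello.flac" (the @ has to be part of the string, and in your code $file ends up holding the handle returned by fopen(), not the file name).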
Untested, but I think that will work.
- hzq
February 10th, 2012 - 01:15
it works!!!!
- Mike Z.
February 10th, 2012 - 01:40
For those interested in using the Speex codec, there is a fork that implements the MIME type “x-speex-with-header-byte” and works perfectly against Google’s Speech Recognition APIs. It’s available at qxip.net/wiki or directly on GitHub: https://github.com/QXIP/Speex-with-header-bytes
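Assuming, from the MIME type's name, that each encoded Speex frame is sent prefixed with a single byte holding that frame's length (check the linked fork before relying on this), the POST body for `audio/x-speex-with-header-byte` would be assembled roughly like this Python sketch. The Speex encoding itself is out of scope, and `pack_frames`/`unpack_frames` are hypothetical names.

```python
def pack_frames(frames):
    """Concatenate encoded Speex frames, each prefixed with a length byte.

    Assumption: the 'header byte' is the frame length in bytes, so each
    frame must be at most 255 bytes long.
    """
    out = bytearray()
    for frame in frames:
        if len(frame) > 255:
            raise ValueError("frame longer than 255 bytes")
        out.append(len(frame))
        out += frame
    return bytes(out)


def unpack_frames(body):
    """Inverse of pack_frames: split a body back into frames."""
    frames, i = [], 0
    while i < len(body):
        n = body[i]
        frames.append(body[i + 1:i + 1 + n])
        i += 1 + n
    return frames
```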
- Jon Schwartz
February 10th, 2012 - 11:08
I ported your code to Python (thanks!!):
import urllib2
import os
import sys
audio = open(sys.argv[1], 'rb')
filesize = os.path.getsize(sys.argv[1])
print sys.argv[1], ' Read', "\n"
req = urllib2.Request(url='https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US')
req.add_header('Content-type', 'audio/x-flac; rate=16000')
req.add_header('Content-length', str(filesize))
req.add_data(audio)
print 'Request built', "\n"
response = urllib2.urlopen(req)
print 'Response returned', "\n"
print response.read()
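urllib2 only exists in Python 2. For Python 3, a roughly equivalent sketch uses urllib.request. The endpoint, query string and Content-Type are copied from the post, the 16 kHz default is an assumption that must match your file, and `build_request` is a hypothetical helper split out so the request can be inspected without sending it.

```python
import sys
import urllib.request

# Endpoint and query string copied from the post.
URL = ("https://www.google.com/speech-api/v1/recognize"
       "?xjerr=1&client=chromium&lang=en-US")


def build_request(audio, rate=16000, url=URL):
    """Build (but do not send) a POST request carrying raw FLAC bytes."""
    return urllib.request.Request(
        url,
        data=audio,  # raw body; urllib sets Content-Length automatically
        headers={"Content-Type": "audio/x-flac; rate=%d" % rate},
        method="POST",
    )


def recognize(path, rate=16000):
    """Post the FLAC file at `path` and return the response body as text."""
    with open(path, "rb") as f:
        req = build_request(f.read(), rate)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__" and len(sys.argv) > 1:
    print(recognize(sys.argv[1]))
```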
December 2nd, 2011 - 10:51
Zibri, I tried giving it a WAV, but the web service rejects it. FLAC seems to be the way to go for now.