Vista speech recognition screencast: It works!

Surprise, surprise. Windows Vista speech recognition actually works. Contrary to what MSNBC criticized as a ‘wreck’, the speech recognition technology is well developed and highly usable. I got my hands on the July CTP build (5472) of Windows Vista, gave it a try, and recorded what I found. I used the internal microphone array in my laptop, so the sound quality was not optimal, but Vista handled it well.

This screencast covers several areas of speech recognition: dictation, commands, selecting alternatives, ‘show numbers’, ‘mouse grid’, mouse functions, web browsing, and keyboard functions. The following video contains mild coarse language and strong violence; parental guidance is advised.

Windows Vista speech recognition demo
Press image to play video (Quicktime H.264 5:03min)

Recorded on a Toshiba Portege M400 Tablet PC with a reduced resolution of 1024×768 for optimal frame-rates.

In addition: There is a whole lot more to speech recognition than what I have demonstrated, including many basic features such as spelling out a name or a technical word. So don’t think what you see is all you get. Having said that, the possibilities of speech recognition on a Tablet PC or Media Centre are mind-blowing. I thank everyone who enjoyed the video.

In addition addition: A lot of people also noticed that I had to repeat a bit of what I said. This is primarily my fault. In the speech control panel, you will often see a blue rotating circle, which indicates that the speech recogniser is busy checking what is on-screen. Also, Camtasia, the screen recording tool, is very CPU intensive, so less processing power was available for the speech recogniser. I sometimes spoke too quickly between commands, before it had fully analysed all the available options.

Update: Just to impress you even more, check out this perfect demo of speech recognition in French showing the same functionality! Oh la la.

135 insightful thoughts

  1. Darn Chinese and their censorship! where is the version with “mild coarse language & strong violence” hehehehehehehehehehe 😉
    Awesome vidclip. Thanx.

  2. oh yeah, and why the HELL use this abomination of a format “MOV”? Mr. Zheng, I say this clearly: ARE YOU WITH US OR WITH THEM? hehehehehe
    seriously. You’re probably missing thousands of viewers who, like me, hate the QT player with a passion but unlike me, are not intelligent enough to install QT-Alt.

    Peace be upon you and your future wise decisions in picking a vid format. Amen.

  3. @Fictia: I used Quicktime because its H.264 codec is second to none at displaying videos like a screen capture. I like to preserve the quality of my videos at the highest level and at the highest possible resolution. So that’s why I picked Quicktime. I think it is more important to showcase something in its original glory than to worry about which format I used.

  4. All its glory MY TOUCHES! It’s better 50,000 YouTube (AKA “Fuzzy-Ball-Covered-With-Smudges -Tube) viewers enjoy this clip than 20 folks who comprehend the mystical nature of video codecs. FINE, be a perfectionist, but at least give an alternative for the poor masses??? YOU KNOW I’M RIGHT AND THAT it really isn’t all that important to see a razor-sharp MS website in the demo… I think the point will be understood even if the texts “Windows Vista” or “Painting” will be slightly (ok fine, HEAVILY) fuzzy. 😉

    God bless you, son, even in your mistaken path…

  5. @Fictia: I’d rather 20 folks get the full picture of what I’m showcasing, than 50,000 people getting a fuzzy and misleading experience. 😛

  6. Nice video. Was this in the publicly released Beta 2? I used it for over a week and didn’t see it. I wasn’t really impressed by the features. Google and Mozilla have better alternatives for Mail, IE, and photos.

    I came to your site through digg. By the way, you have a catchy name.

  7. Fictia, have you heard of VLC? It will play .mov files just fine, along with just about any other file type or codec. And it works on nearly any platform. Quit your whining.

  8. Jim Bob, why you hurt my feelings so?

    (seriously, I don’t expect 99.7% of casual visitors to this website to understand why THE HELL the video didn’t work on their little precious computa’. perhaps..if their IQ is higher than 110..like the entire nation of Japan if I recall correctly..then they will scroll down, read our little chit-chat here and LOOK FOR QUICKTIME ALTERNATIVE or that VLC *CRAP* you are pushing. byebye sweetcheeks)

  9. I have to say that that video was amazing. Vista is really making me rethink about purchasing an iMac for my next computer. I always wanted to use Macs after my brother switched 4 years ago, but after watching your video, the interface seems slick as heck, and the voice recognition was very fluid. I did not expect the interface for voice to be so nice and intuitive! I think I will continue with Windows. Very cool Long Zheng!

  10. Pingback: Fatadam.tk
  11. Pingback: Antimail
  12. Out of topic, but related: Apple showed off some nice speech recognition in its WWDC keynote yesterday and was a clear winner against its Vista competitor. It really sounded almost natural, it was just mindblowing (among other things). Click here for the full conference video (Sorry its QuickTime, Fictia 😉 ) However, I haven’t seen anything from Apple on speech recognition, Vista gets credit for this one 🙂

  13. Pingback: Marco’s Mundo.
  14. Wow great video. I am really interested in what speech recognition will be like in Vista and have wanted to find speech recognition software for windows xp but to no avail.

    I can’t wait to use it for myself, love it! Great video.

  15. That was very cool, and you seem to know all the keywords for speech recog., and also used IE and other MS software only. You even visited the MS website, which makes sense. But could it be that you work for MS, or better yet, worked on the speech recog. for MS? If that is the case, you and the rest of the team did an impressive job.

  16. @Ronnyguru: Would you believe me if I told you this is my first time using speech recognition in Vista, and I only did the 5 minute tutorial plus looking at a few help documents? If you don’t, then you need to try it for yourself!

  17. ” John Edgar Aug 9th, 2006 at 2:58 pm

    I’m a die hard mac fan….

    …that was impressive. 🙁 ”

    Yeah, I agree :'(

    oh and, quicktime h.264 rules internet video, none better.

  18. Oh by the way, if the speech recognition works that well with your thick Australian accent, then it must be good.

    I mean I hardly could understand what you were saying myself.. Vista is a good listener!

  19. Seems to be going down the right lines – think of the possibilities for the disabled if they can get this right.

    I created a simple media player using voice recognition (available from my site, but you need Dragon NaturallySpeaking) and it’s not as easy or natural as you think to program a computer to respond to what people say to it.

  20. Istvan – you meant Text-to-Speech (TTS) and I wasted a whole hour suffering through Apple propaganda just to see that you FELL for the oldest trick in the book. Sorry, neither Microsoft nor Apple develop their own TTS technologies. I heard the Vista and Leopard synthesized speech quality two years ago in its original form – AT&T Natural Voices (google it up my brainwashed friend). The only reason Leopard sounded better than Vista is because it was SLOWED DOWN and had a DEEPER VOICE. Other than that it’s the same sh**. Apparently Apple does know how to dress up mandane technologies such as incremental backups and call it Time Machine etc but a smart user will know that all this crap is not appearing out of thin air – it’s STORED SOMEPLACE EATING UP DRIVE SPACE. It’s the SAME SH** AS ALWAYS DELETING STUFF IN WINDOWS AND NEVER EMPTYING THE RECYCLE BIN. Oh yeah and I don’t mind mov’s myself – my concern was for my slightly-retarded PC brethren, u see. 😉 This concern for others is something Mac users don’t relate do, since they have no problem working on a machine that doesn’t communicate well with the rest, less elitist “poor unfortunate PC bastards” 😉 Have a good one, my messianic friend. And let me know when you decided to forsake your silly ideals and bought a REAL LIMITLESS PC, not something that Jobs decides the looks for you and what parts “are good for ya”. RANT OVER. (goes to take the blue pill) hehehehe

  21. Pingback: Func. News
  22. Congratulations. You have now achieved approximately the same functionality for speech recognition that was delivered in OS/2 in 1994, integrated in the operating system. This is really hot new stuff!

    Considering that IBM had the speech recognition working on 1994 hardware I don’t expect this to be a resource hog at all…

  23. No fanboy here… Apple has had speech recognition for a long time. It was a toy back in the mid-90s, got a lot better with OS X and looks like it will be improving with Leopard.

    I’m impressed with this Vista demo — some definite improvements and innovation. Love the mouse grid, like the number overlays. Thanks for the demo.

  24. wow thats pretty good!…. I wonder if you incorporated google somehow into this, or if google had a similar program to run on vista… wouldn’t that be cool!

  25. @Fictia

    Whether you have an IQ of 110 or not… your Engrish clearly sucks as “mandane” is not a word. I won’t get started on your grammer either… but suffice it to say… it also ‘sucks’.

    However, to finish things off I think it’s good to point out this little blurb, “…my concern was for my slightly-retarded PC brethren…”

    …I find this funny. You see, bragging that you are a PC (read: Windows) user is quite idiotic since everybody knows what a pile of sh1t an OS it really is. And YOU are taking the blue pill? Comon buddy… break free of the brainwashing yourself and get on the Linux powered train. You know… the one that never stops?

    Ciao,

    Johny-D

  26. Gidday,

    Hard not to be impressed by it, but I am very curious as to how much grunt you have under the hood. In your demo Vista keeps up pretty well considering the fair pace at which you whipped it, so knowing your PC specs would be of benefit to those that live in hope for their current hardware.

    And if you have this clearly displayed elsewhere on your site, apologies … I am inherently lazy. 🙂

    Cheers.

  27. I was wondering what type of screen casting software do you use when making such great looking videos in Vista? I would love to get a hold of the exact type you use.

    Thanks

  28. @46and2: I stated in the blog entry that I used a Toshiba Portege M400 Tablet PC, which has a 1.8 GHz Centrino Duo CPU and 1 GB of RAM. But of course, this doesn’t reflect the real performance of Vista as the screen recorder takes a LOT of processing power for a full-screen capture.

    @Paul Gentle: I used Camtasia. http://www.techsmith.com/ Works great in Vista!

  29. “mandane” is, indeed, not a word. But “Engrish” and “grammer” are, naturally. Oh, and this coming from a guy who cannot spell his own name, “Johny”-D. hahahahahahahahahahaha… wait a minute please – I have to change my underwear now from pissing all over the place…. ahh that was good now.

  30. I think Vista’s voice navigation and recognition fulfils Bill Gates’s dream. Nonetheless, this video clip is also interesting to watch. Good job, Long.
    Apple’s former CEO – John Sculley and Dr Kai-Fu Lee (Does this explain something? ;-)) demoed Casper on TV programme in 1992.
    http://video.google.co.uk/videoplay?docid=-2405921714806721086&q=good morning america

    This is an old Mac ad about Casper technology:
    http://video.google.co.uk/videoplay?docid=-2579081071391593857&q=voice recognition

    I love this sort of technology, and sometimes I use the voice navigation on my MacBook as well (I wrote some AppleScripts to add more functions to Mac OS X’s voice navigation). To be frank, after 10 minutes of commanding, my mouth is just sore.

    Vista has got a not-so-shabby text-to-speech engine and synthesised voice (from watching Apple’s WWDC ’06 Keynote), which may be a more practical feature for most users. I use TTS a lot when I am tired of reading the abstract of a journal article, and this is very handy.

    However, I find one thing which is rather interesting. Long thought it was his fault that he did not wait for Vista to finish commands, but a Mac user would think that is Mac’s fault. 😉

  31. Pingback: GottaBeMobile.com
  32. Quicktime? So we have to look at it in a darned postage-stamp size window… and installs Apple’s intrusive bloatware to do it?
    Forget about it….

  33. @Jeff Fisher – wtf are you talking about? First of all, the video is a full screen capture. The THUMBNAIL you see is exactly that, a thumbnail. Also, there is nothing wrong with Quicktime. The H264 video codec is very good, and it just so happens that Quicktime supports it.

  34. you gotta start somewhere, but they are far from being ideal…..at this point i think i can use my mouse and keyboard to work faster than the speech recognition ability of vista.

  35. Couldn’t get it to work in Firefox, Firefox (IE Tab) or IE7 – gave up, which is a shame because I’d have liked to view it.

    Quick time for the loose.

    Val.

  36. Pingback: BizToolbelt
  37. good luck with your viruses ^^. Quicktime rocks! Apple rocks! Microsoft rocks! Windows 95 rocks! BUT, XP sucks. (and also vista) LoL 😀

  38. Long Zheng does NOT work for Microsoft. Neither in the Speech Recognition group, nor anywhere else.
    How do I know?
    Well, I actually work for Microsoft, and I work for the Speech Recognition group. In fact, I work on the user experience team that built Windows Vista Speech Recognition.
    I really love what you’ve done here Long. The video is very cool. It shows a number of really cool things about Windows Vista Speech Recognition, and Vista itself. I use Vista Speech Recognition every day, and even I get excited about the video. As you pointed out, there are lots of additional features you didn’t show, like the dictation and correction experience, deeper keyboard manipulation, and most importantly, the interactive Speech tutorial.
    So, to address what one of the other posters said: You don’t have to learn a lot of keywords at all. What you need to learn, the tutorial teaches you. The tutorial is part of the first time user experience (when you start speech recognition). But if you boil it down, there are only a handful of things to learn:
    1. Say what you see commands: Say the name of anything you see on the screen, and the right thing happens. E.g. Say File, and the File menu drops down, say Save, and the Save button gets clicked, etc.
    2. Click what you see commands: Say click Bold, and the bold button gets clicked, Say double click Recycle Bin (when on the desktop) and the recycle bin gets double clicked.
    3. Show Numbers. Say Show Numbers, and all items in the current application or window get numbered, and you can interact with the numbers by either just saying them, or saying things like Double click 8, or Right click 7, etc.
    4. When you dictate, and the computer makes a mistake (accuracy is generally over 98%, and gets better over time, as you use correction), just say Correct [word], where [word] is the word the computer got wrong.
    5. What can I say?. Anytime you say that, Speech shows you a list of commands you can use.
    Oh, and for those of you who have access to Windows Vista beta builds, try saying “How do I install a printer” or “How do I change my desktop background”
    I’ll let you guys imagine what happens. Needless to say, it’s very cool!

  39. NICE. I had my doubts about Vista being “that much better” than XP. I can’t wait to do an install. After SP1, of course! I want to see how well this works with WMP. I want to be able to sit there and have it loop the best parts of what ever porno I’m watching! Well done Microsoft. Well done!

  40. Vista Speech Recognition (VSR) is “good”, on the same level or above a few other I have tried, but what is it “good” for?

    Dictation with the VSR still requires a very slow, unnatural and annoying (at least for me) speech rhythm to achieve the best performance, but even its best performance still contains too many errors, requiring human revision which is very error prone in these situations. People tend to “read” what should be there and not what is there. In general dictation with VSR (and every other SR software I have tried) is still not up to the challenge.

    The other uses for SR, “commands, selecting alternatives, ’show numbers’, ‘mouse grid’, mouse functions, web browsing, and keyboard functions”, are done *far* more efficiently with a mouse and a keyboard.

    One thing that definitely needs improvement in VSR is the sound/speech buffer size. Too often VSR “forgets” part of what I said. I think this happens when the system can’t keep up and the sound buffer gets full. This is especially annoying when dictating. I have used other SR software that can “remember” a great deal of what I have said even if it takes some time to catch up.

    VSR should let the user configure the sound/speech buffer size. Also it should give an audible/visual warning so that the user can slow down or pause and let the system catch up.

    So, most computer users physically capable of using a mouse and keyboard (fortunately it’s the vast majority) will make little practical use of VSR or any other SR software. For people with disabilities, VSR can give them a very important alternative to the mouse/keyboard combo.

  41. Pingback: ZeroLogik
  42. One quick question: will the speech recognition feature be available in other languages, or just English?

  43. “One quick question: will the speech recognition feature be available in other languages, or just English?”

    It will ship with US English, UK English (I should probably use that), Chinese (Traditional & Simplified), Japanese, German, French and Spanish.

  44. Why do I have to say “Double click on…”?!?! Why I can’t say just “Open”?!?!
    sucks…

  45. Why do I have to say “Double click on…”?!?! Why I can’t say just “Open”?!?!

    You can.

  46. How much were you paid by Microsoft or that blog ad company to say these things?

    Not enough.

  47. Pingback: Rob's Rhapsody
  48. Pingback: Tech Talk Blog
  49. I saw this on Digg a few weeks back; I just dugg it out of the hoard to send a link to this video to a friend whose uncle is a paraplegic – guess why 😉
    (I mean guess why I dugg out the video link for him, not why his uncle is a paraplegic)
    Thanks for the video, man.

  50. Awesome! Esp the last part (typing out what you said) which is what I would use it for… omg, my report-writing time could be cut in half if it works! Don’t have time to read all the comments and I’m soooo not up to date on this type of stuff — off to do a search to see where (and if) I can get this program (is it possible to download it from somewhere and run it with windows XP)?

    Great work!

    Susan

  51. Pingback: GottaBeMobile.com
  52. Pingback: MasterMaq's Blog
  53. I am a Mac user and mac fan, but I think for once Windows has kicked ass.

    The latest release all having to do with Live etc. etc. are def. catching my attention.

    (No… I’m still sticking with Thunderbird, because Mail Desktop is prob no different to Outlook (except the design))

    This speech recognition will probably make Apple do a lot more work on their speech recognition (which is powerful, but not as strong as this!)

  54. stop complaining about QuickTime, there is nothing wrong with it and in fact the format used is very good and clear for the amount of compression used

  55. I can’t believe the majority of this forum is wasting time about movie file formats. Who cares. Anyone that will consider using voice rec in vista should be professional enough to work out how to play the video.

    Anyway, thanks for the demo and I am going out to buy a microphone today.

  56. Tight..!! i have been using speech recognition and find it fantastic..

    all u people out there may be having problems maybe because u do need a good quality microphone… thats for sure,..

  57. I’m a mac user…that was some awesome speech recognition there. I’ve tried the speech system on OSX and it kinda sorta really SUCKS. I’m still sticking with macs, but I might actually try Vista in BootCamp to check that out. Also I’m a sucker for shiney things, I keep getting tempted by Aero.

  58. For those complaining about Quicktime, at least he didn’t use RealPlayer.

    The demo is great. I will be trying this this evening as well. Now I need to find a way to get the computer to confirm my commands like the computer on the Enterprise 🙂

  59. Blah Quicktime format. Sorry but there is no such thing as a video that’s good enough to make someone want to install Quicktime. Other than not seeing the video – this is a good article.

  60. Great demonstration! Thanks for posting. Vista has something going for it after all 🙂

    I prefer Quicktime. It is a really solid format and at least it supports a number of codecs. It is preferred by anyone who cares about quality playback. Anyone with an iPod already has it and it’s simple to install. H.264 is an open standard, unlike all those proprietary Microsoft codecs. H.264 is very scalable and great for HD. Unlike Windows Media Player and Real, the scrubber in Quicktime actually works in a useful manner for both streamed and progressive downloads. Mac bashers forget that most of us don’t want to build a computer from scratch.

  61. Johny-D, linux is just about as bad as windows, you have to code most of your own drivers and such because no one does it for you. And if you don’t then you’re using a shitty computer and need to buy a better one. And for the video, very nice, will have to give it a try in a while.

  62. The Speech Recognition in Vista is fine, though it’s more a function of how you speak. Speak well and it works well, but deviate from your new reading voice and it’ll make a few mistakes in the dictation.

    If you want to try something different check out SpeakMediaPro and vive la difference, it’s not dictation but its a pretty cool way to play your media, the speech engine doesn’t require any training and can be used by anyone with an appropriate accent. http://www.speakmediapro.com

  63. Thus the speech recognition has lots of nice features for those who no longer wish to use a mass. At those who want to use it as a substitute for typing it’s absolutely terrible. Fort instance, I dictated this post using this the speech recognition. Can you tell what I’m trying to say? Data what age, this text does not reflect all the spurious highlighting that had to do with my mouse before I can complete the post.

    ————————–
    The above post was dictated with Vista Speech Recognition, trained to my voice. It was not corrected, as the correction process in VSR usually requires several prolonged bouts spelling the words out, and hoping it hears you correctly.

  64. Has anyone noticed that the French video doesn’t actually use the Speech Recognition program? You can hear him typing on his keyboard. Also, why the hell do you need two clocks in Vista? You have an analogue one on the sidebar and a digital one on your taskbar.

  65. Is it possible to get it to work with operating system languages that it doesn’t support? How?
    I mean, I have the Finnish language version and it complains that it can’t support the language. Can I get it to work? I only need to give commands in English anyway.

  66. ” IBM had the speech recognition working on 1994 hardware ”

    I have a speech recognition module for my Commodore 64 that I used all the time back in 1986. Just like this, you had to train it, but it still worked and contained lots of system control commands as well. The difference is the 1-second gap you had to leave between words and the fact that it was a hardware device that cost me $300.

  67. The Windows Vista speech recognition has some serious limitations, especially if you want to click anywhere on the screen.

    I settled upon an extension for Win Vista called Voice Finger ( http://voicefinger.cozendey.com ), which fills some of the gaps in Win Vista recognition.

    I guess this software is not targeted at people who use speech recognition as an occasional alternative, but if you want (or need) to reduce computer contact to zero, this software is great.
