The Evolution of Voice Recognition and Windows 8

    General discussion

  • As many know, on October 4, 2011, Apple unveiled its new smartphone, the iPhone 4S. The phone was only a minor upgrade over existing technology, except for one thing: Siri voice technology. Based on the general reaction from users, touch-first technology such as Metro is not a good idea when used on the desktop. Metro will be great for tablets, but it feels unnecessary on Windows 8. Rather than trying to capitalize on a technology that has never been common on desktops, Microsoft could look to ideas like Apple's Siri, which open a whole new world of possibilities.

    Imagine a similar technology in Windows. Imagine microphones in every PC, where a user could just walk up to a computer and ask it to display something. For example, you could say, "How is Microsoft's stock doing?" and a stock viewer would automatically open to display MSFT stock prices. Or a user could say, "I need to write a memo," and Microsoft Word or a similar program could open and display a list of memo templates. Doesn't this sound like a better idea than holding your finger up to a screen that is in front of, and not below, your face? The only issue would be finding a source for the voice technology or developing it. Imagine that you didn't even need a special microphone: you could just have one integrated into your computer or monitor (some users could even place an external microphone on the desk for near hands-free interaction).

    I am aware that Windows already has speech recognition, but it is completely command-based and incapable of understanding natural speech. Improving speech recognition to the point where we could speak to a microphone built into our computers would be a lot more revolutionary than Metro/touch.

    Doesn't that sound better than holding your hand up to a screen or learning how to navigate Metro with a mouse? You could use a computer without even having to touch it.
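
    As a rough illustration of the two spoken requests described above, here is a minimal Python sketch that maps already-transcribed text to an action by keyword. It assumes the speech has been turned into text by some other component; the names `INTENTS` and `route_utterance` and the action labels are hypothetical placeholders, not part of Windows or Siri.

```python
import re

# Hypothetical keyword-to-action table. The action names are placeholders
# for illustration only; they are not real Windows or Siri APIs.
INTENTS = [
    (re.compile(r"\bstock\b", re.IGNORECASE), "open_stock_viewer"),
    (re.compile(r"\bmemo\b", re.IGNORECASE), "show_memo_templates"),
]


def route_utterance(text):
    """Return the name of the first action whose keyword appears in text."""
    for pattern, action in INTENTS:
        if pattern.search(text):
            return action
    return "unrecognized"


print(route_utterance("How is Microsoft's stock doing?"))
print(route_utterance("I need to write a memo"))
```

    Plain keyword matching like this is much closer to the command-based approach than to real natural-language understanding, which is exactly the gap the idea depends on closing.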


    Monday, October 10, 2011 6:47 PM

All replies

  • This is an excellent idea! However, here is how I see it being realized:

    1. A group of intelligent people gather with an idea to make a universal speech recognition / text to speech / universal translator software.

    2. The group creates a company and over time creates a perfected version of the things mentioned above.

    3. Microsoft makes the group an offer that they probably can't refuse, and the company comes under Microsoft's wing. :)

    4. Speech recognition / text to speech / universal translator gets implemented in Windows _insert number/cool name here_.

    I think that's how things roll these days.

    If you lack info, you get a negative result by using a positive strategy. Therefore, information is power.
    Monday, October 10, 2011 7:25 PM
  • There are already companies out there offering this technology that Microsoft could buy. Apple bought Siri for its next iPhone, so Microsoft could buy another company for Windows 8.

    Step 1 in your list is basically complete, and Step 2 is coming close. I would like it if Step 3 could happen in time for Windows 8. I would much rather be able to talk to my computer than hold a hand up to my computer monitor.

    Monday, October 10, 2011 8:13 PM
  • I'm not particularly interested in voice technology. It is rare that I am actually in a situation where I would be comfortable speaking out loud to interact with my computer/device without disturbing others (coworkers, train car passengers, grocery shoppers, family members reading or watching TV, etc). Pretty much the only time it is useful is when I am driving, and then all I really need is to be able to look up directions or call someone, which my iPhone 3GS can already do just fine.
    Moderator | MCTS .NET 2.0 Web Applications | My Blog: http://www.commongenius.com
    Tuesday, October 11, 2011 2:47 PM
  • You're probably right, but I wanted to toss this out there as an idea of what Windows 8 could be. Remember, this is Apple who added Siri to the iPhone, not Google. There's a chance that this could be as big of a development as the iPhone touch UI, which was a major inspiration for Windows 8's design. Besides, I'd rather have Microsoft add this ability to Windows 8 than keep Metro.

    Tuesday, October 11, 2011 7:04 PM
  • It wouldn't take nearly that much work. Windows has had speech recognition and text to speech built in since Vista.

    It'd still need the logic that Siri has to be nearly as useful, but the core technology is there. If you need more proof of that, just look at Ford Sync, which is done by Microsoft.

    Also, I'll believe it's a revolution when I see it. People still seem to feel awkward talking to their computer or phone, and I don't see that changing soon, though if anyone can do it, it's Apple. And the best scenario for that kind of usage is driving, but how often do you use a PC or tablet when driving? On a phone I can understand it, but it doesn't make as much sense to me personally on a PC/tablet. For Sync or Windows Phone though, sure.

    • Edited by JHoff80 Tuesday, October 11, 2011 8:12 PM
    Tuesday, October 11, 2011 7:45 PM
  • I didn't really mean to declare that a voice revolution was upon us. My main goal with this post was to offer Microsoft an excuse to drop Metro for desktops, and to point out that Apple has come out with a major new technology, one that was almost certainly influenced by Steve Jobs. If the past is any indication, this means that the technology will become mainstream soon. Of course, it might not happen, but I had to toss out an idea that might be more compelling to Microsoft than Metro. When I described this, I was picturing something more like asking your computer a quick question as you walk by it, or using the computer as a kiosk in a working area that you don't have to actually touch in order to use. I certainly don't think using a computer with voice on a daily basis is a good idea, but then again, neither is using Metro on the desktop.
    Friday, October 14, 2011 1:44 AM
  • A great article outlining the problems with voice recognition, with great links: "Whatever Happened to Voice Recognition?"

     

    Friday, October 14, 2011 3:16 PM
  • Once again, I will point out that my idea wasn't intended for general use of the computer. Thus, speech recognition accuracy doesn't need to be above 80%. Besides, there's nothing wrong with keeping the "train profile" feature and letting users add words to the speech dictionary. The key point of this post was the "Instead of Touch" part at the top of the screen.


    Friday, October 14, 2011 8:44 PM
  • "Doesn't that sound better than holding your hand up to a screen or learning how to navigate Metro with a mouse?"

    "...my idea wasn't intended for general use of the computer. Thus, speech recognition accuracy doesn't need to be above 80%."

    These are perfect examples of armchair design. "Doesn't that sound better...?" just means "Aren't I a good thinker?" and the arbitrary conclusion you reach in the second is like saying, "Look, Mom, no usability testing!"

     

    Monday, October 17, 2011 1:53 AM
  • i tried voice recognition software once, but each time i tried dictating a message to my friend on msn it got it completely wrong and typed out abusive swear words on the screen, so now i'm not a big fan of that type of thing anymore. my friend was having a problem with her computer and i said something like 'it's probably just a glitch' and the voice recognition software sent 't*ts prostitute b*tch' instead, not very nice.

    to be honest i already shout at my computer quite a lot which drives my parents crazy and if i suddenly started having conversations with it they would have me committed for sure hehe.


    • Edited by Amy Gx Monday, October 17, 2011 8:27 AM
    Monday, October 17, 2011 8:24 AM
  • I'm not a designer. I don't claim to be an expert in user interface design, and I don't know a lot about voice recognition. "Doesn't that sound better...?" actually means "Don't you agree with me about how bad Metro is?" Of course, I understand if you disagree with me, but I stand by what I said. Again, this post isn't about me designing a new feature or knowing anything about voice recognition. I just wanted to toss out an idea that could convince Microsoft to scale back or abandon Metro.
    Monday, October 17, 2011 7:23 PM
  • I've been looking around online. Right now, it seems like all the tech news websites have a story about Siri on their front page. Somehow, I don't think that voice recognition will just be an unnecessary extra feature this time. This is an Apple product, not a Google one, and many of the ideas behind Siri are those of Steve Jobs. In my opinion, Windows needs a technology like this. Siri looks great, and I would enjoy seeing this kind of technology integrated into Windows 8. Sadly, it seems that Windows is embracing a touch-based computing model just as Apple launches a product with technology that may, in some areas, replace touch for certain kinds of tasks. For example, if you're at home and you want to know tomorrow's weather, it is easier to say "Computer! What will tomorrow's weather be?"

    Wednesday, October 19, 2011 12:34 AM
  • Remember Google's solution to the inappropriate words issue? It automatically replaced all bad words with "####." Good voice recognition software should always do this, in my opinion, unless the user really, really, really, really, really has a good reason to turn it off (writing a story that requires repeating something that someone else said, for example). Otherwise, voice recognition should never actually accept these words as input.
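
    A minimal Python sketch of that kind of masking, applied to text that has already been recognized. Everything here is an assumption made for illustration: the `BLOCKED_WORDS` list is a tiny placeholder, and `mask_blocked_words` and its `allow_profanity` switch are hypothetical names, not how Google or any real product does it.

```python
import re

# Placeholder blocklist for illustration; a real list would be much longer.
BLOCKED_WORDS = {"darn", "heck"}


def mask_blocked_words(text, allow_profanity=False):
    """Replace each blocked word with '####' unless the user opted out."""
    if allow_profanity:
        return text

    def mask(match):
        word = match.group(0)
        return "####" if word.lower() in BLOCKED_WORDS else word

    # Check each run of letters/apostrophes against the blocklist.
    return re.sub(r"[A-Za-z']+", mask, text)


print(mask_blocked_words("Well, darn it"))
```

    The `allow_profanity` flag stands in for the "really good reason to turn it off" case; with it set, the text passes through unchanged.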
    Wednesday, October 19, 2011 12:37 AM
  • "Otherwise, voice recognition should never actually accept these words as input."

    Uh, why not? If I say a swear word, I want it to be recognized just as much as any other word. The problem in that story is the bad recognition more than anything else. Not sure how long ago that happened, but the technology is improving to the point where, hopefully, that shouldn't ever be an issue in the near future.

    Wednesday, October 19, 2011 6:09 AM
  • "Uh, why not? If I say a swear word, I want it to be recognized just as much as any other word. The problem in that story is the bad recognition more than anything else. Not sure how long ago that happened, but the technology is improving to the point where, hopefully, that shouldn't ever be an issue in the near future."

    I'm adamantly opposed to using any kind of swear word, without exception. If you use a swear word, the voice recognition software really shouldn't recognize it. After all, what if you are talking about something that sounds like a swear word but isn't the same thing? Also, it's just another way to help control the problem of bad online comments, forum posts, etc. And what if you misspeak and don't notice it? I've seen people accidentally say a bad word without ever trying to. They just mix up word parts and combine them into something that is bad, and depending on what you're doing, you might not even notice it. Really, there's no reason to ever use swear words for anything, period.
    Wednesday, October 19, 2011 8:09 PM
  • "Really, there's no reason to ever use swear words for anything, period."
    If someone expresses themselves in such a way, it would entirely be their choice. If you have a problem with cursing, that is fine, but don't try to force your views on what words can and can't be said or written down. Don't get me wrong, there is definitely a time and a place for such things. I just happen to enjoy free speech and the rest of our rights that seem to be forgotten and taken for granted these days.
    Tuesday, December 06, 2011 7:34 PM