Speech and Kinect

  • Question

  • We have a medical enterprise application that relies heavily on handwriting and handwriting recognition, and also uses speech, speech recognition, and touch. I don't understand the instruction to "uninstall any Microsoft Speech runtime components" before installing the Kinect SDK. Surely this is crazy! How does it affect handwriting recognition, which uses the same recognition engine? Do we basically have to gut Windows 7 first to install it? What about touch and digitizer monitor drivers? I'm assuming we can reinstall the handwriting components afterwards? Which components are the problem here? Give me a list so we can fix them, or devise a better workaround.
    Monday, February 6, 2012 6:33 PM

Answers

  • There is no need to uninstall any base Windows components to install and use the Kinect for Windows SDK.

    The Microsoft.Speech namespace corresponds to the Server speech engine, while the System.Speech namespace corresponds to the desktop engine. Either engine can accept input from Kinect, and they are entirely separate products, so they should not affect each other at all.

    The language packs that ship with Kinect for Windows run in the context of the Server speech engine, so if you want to use the models trained specifically on the Kinect audio pipeline, that is the correct engine to use.

    Currently, the Kinect for Windows language models are tuned for relatively small grammars and don't support straight dictation, so some use cases will likely be better served by the built-in speech recognizer. This comes at a slight cost in accuracy, because those models have not been specifically trained for the Kinect audio pipeline, and some of the processing that we do (beam-forming, noise suppression, echo cancellation) introduces slight artifacts into the signal.
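
    As a rough illustration of the trade-off described above, the choice between the two engines can be written as a small decision rule. This is only a sketch in Python, not SDK code: the two namespace names come from this answer, while `SpeechNeeds` and `choose_engine` are hypothetical names invented for the example.

```python
from dataclasses import dataclass

# Namespace names are taken from the answer above; everything else is illustrative.
SERVER_ENGINE = "Microsoft.Speech"   # Server speech engine, used by the Kinect language packs
DESKTOP_ENGINE = "System.Speech"     # desktop engine built into Windows


@dataclass
class SpeechNeeds:
    """Hypothetical description of what an application needs from speech."""
    needs_dictation: bool        # free-text dictation required?
    small_command_grammar: bool  # a relatively small, fixed set of commands?


def choose_engine(needs: SpeechNeeds) -> str:
    """Return the namespace suggested by the trade-offs described above."""
    if needs.needs_dictation:
        # The Kinect language models do not support straight dictation.
        return DESKTOP_ENGINE
    if needs.small_command_grammar:
        # Small grammars are what the Kinect-trained models are tuned for.
        return SERVER_ENGINE
    # Otherwise fall back to the built-in recognizer.
    return DESKTOP_ENGINE
```

    Under these assumptions, an application like the one in the question, which needs dictation, would land on `System.Speech` and accept the slight accuracy cost, while a command-only application would land on `Microsoft.Speech`.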

    The note about uninstalling previous versions of the speech engine was specifically targeted at the Server speech engine.

    Thursday, March 22, 2012 9:08 PM

All replies

  •  

    In the previous betas, we relied on Speech v10.x technologies:

    • Speech Server Runtime x86, x64 - need v11; v10.x can't be on the same machine.
    • Speech Server SDK x86, x64 - need v11; v10.x can't be on the same machine.
    • English language pack for Speech Server Runtime for Kinect (I don't have the precise wording of the Add/Remove Programs entry)

    We now use the runtimes and help you install them, but v11 needs to be installed.

    Does that help?

    Thx, Rob

     

    Monday, February 6, 2012 6:59 PM
  • No, it does not help. Obviously you do not have experience with the recognition technologies in Windows. When you say "uninstall any Microsoft Speech runtime components", I now assume you mean "Speech Server", which is different from the built-in Windows OS Speech Recognition Engine.

    I have to use the native SRE in the OS. Otherwise it means gutting the OS's real speech and handwriting support and downgrading the app to unusable.

    Can I use Kinect and the runtimes WITHOUT the Speech Server, with just the motion detection, without screwing up the native OS handwriting, touch and speech recognition?

    I haven't got a Kinect for Windows yet, but I do not want to spend a lot of money and time by giving this to a programming team if the SDK doesn't imagine a handwriting, touch and native MS OS speech application with Kinect motion detection. Otherwise Kinect is a dead-end with a single purpose horizon for development.   

    Tuesday, February 7, 2012 1:01 PM
  • I am trying to help you. I didn't say "uninstall any MS Speech runtime components."

    I said ... uninstall the Speech Server runtime, SDK, and the English language pack. My belief is that doing so doesn't change the native OS facilities for handwriting, touch, and speech.

    Is that mistaken? Does that solve your problem? Have you tried it?

    Sorry you are upset. Please work with me as a person who is trying to help you.

    Thanks, Rob Relyea | Kinect for Windows team.

    Tuesday, February 7, 2012 6:55 PM
  • Actually, I found more info on another thread. Seems there is a lack of co-ordination in development.

    From what I have gleaned from other threads, it seems there are two different speech systems. Microsoft.Speech.Recognition is the Server 2003/8 version of SAPI and System.Speech is the desktop version. Basically, the M.S version has SAPI but no recognizer engine. Therefore it offers no dictation or true recognition, only command recognition - a very limited engine. The S.S can recognize free text and commands.

    For application development on Windows M.S is useless except for games or demo business apps - it is too limited for business or productivity products.

    The M.S is a low-level speech engine for lower quality audio, generic voice and server rather than client deployment. Serious apps handle higher levels of recognition and need higher quality audio to discern free text with commands mixed in. The S.S can do this as it includes dictionaries and grammars in the algorithm. M.S does not and cannot.

    Others seem to say it is theoretically possible to use the S.S instead, but the "mouse control" needs the M.S (?). So I am back to installing both M.S and S.S and seeing how well they play together. Probably the road is turning off everything in M.S but developing a way to transfer the motion sensing over to Tablet PC handwriting and voice controls.

    M.S is really best for systems like OnStar and simple tasks. Why anyone thought it is a development medium for anything commercial other than maybe on-line games is beyond me. Even there, the S.S is far superior. It would have made a whole lot more sense to adapt the Kinect to the better speech engine instead of using downgraded recognition! (This drives me crazy - do you think Steve Jobs would have made this mistake? I hate Apple.)

    So when you say "uninstall the speech server runtime" you do not really mean the S.S, do you, but previous versions of M.S? Any idea what happens in trying to run both S.S and M.S on the same machine?

      

    Wednesday, February 8, 2012 7:40 PM
  • By the way, I have a bit of a no-nonsense personality, a bit gruff. I push but I also listen - I'm not upset - just arrogant and pushy, but I really have no ego - you can call me an idiot if you wish, as long as we progress.
    Wednesday, February 8, 2012 7:49 PM
  • Come on those working at MS - what happens in trying to run both S.S and M.S on the same machine? Don't make me spend money and time on what should be a simple MS tech statement.
    Thursday, March 22, 2012 5:14 PM
  • Kinect should already be recognized as a mic, so using any other speech recognition software is an option if MS's is too confusing. I personally am OCD when it comes to my projects and like them to be nice and clean, work fluidly all the way through, and meet the guidelines people set for the project.

    "Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth." - Sherlock Holmes. "Speak softly and carry a big stick." - Theodore Roosevelt. "Fear leads to anger, anger leads to hate, hate leads to suffering." - Yoda


    • Edited by The Thinker Thursday, March 22, 2012 7:26 PM
    Thursday, March 22, 2012 7:25 PM