Kinect Runtime Language Packs vs Microsoft Server Speech Language Packs

  • Question

  • Hello,

    I am currently developing an application which involves speech recognition in a managed environment.

    It is nothing complicated, just listening for a single word. Unfortunately, I need a language pack for German.
    For now I have downloaded MSSpeech_SR_de-DE_TELE.msi as a surrogate.

    In my application I have a large string array of about 500 words. Sometimes the SpeechRecognitionEngine does fine and recognizes words with a confidence of more than 80%. For some words, however, the confidence is as low as 0.1%, so they are rejected.

    Is there a major difference between using the Kinect language packs and the MSSpeech_SR packs?
    Is it a major issue that the speech recognition engine has to check more than 500 words, many of which are short and some of which sound alike?

    Thank you for your help.

    Regards, koenig1985 

    Monday, September 10, 2012 12:21 PM


  • 1) There is a major difference between the Kinect language packs and the MSSpeech_SR packs. The Kinect packs are tuned to the acoustic response of the Kinect microphone, which takes into account artifacts related to beam forming, noise suppression, echo cancellation, etc. For a given language, you should expect a significantly higher recognition rate with the Kinect pack than with the corresponding standard language pack.

    2) Yes, it is an issue that there are many words, and that many of them are short words that sound alike. Recognition depends mostly on the length of the words and their number of syllables. Short, similar-sounding words with similar numbers of syllables and cadences will always be very hard to distinguish.

    3) We are aware of the desire for Kinect-specific German language recognition, and it is high on our priority list. Stay tuned for more information.
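
    As a rough illustration of point 2, the sketch below scans a word list for entries that are very short or that differ from another entry by a single letter, since those are likely candidates for low confidence. It is only a heuristic: it compares spellings, not pronunciations, so it is a crude stand-in for acoustic similarity and has nothing to do with how the recognizer itself scores words. The function names, the thresholds, and the sample vocabulary are all made up for the example.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def risky_pairs(words, max_distance=1):
    """Pairs of words whose spellings differ by at most max_distance edits.
    Spelling is only a proxy for how similar two words sound."""
    normalized = sorted({w.lower() for w in words})
    return [(a, b) for a, b in combinations(normalized, 2)
            if edit_distance(a, b) <= max_distance]


def short_words(words, min_length=4):
    """Words with fewer than min_length characters (hypothetical threshold)."""
    return sorted({w.lower() for w in words if len(w) < min_length})


if __name__ == "__main__":
    # Hypothetical sample; replace with the real 500-word array.
    vocabulary = ["Haus", "Maus", "Hund", "Hand", "Ei", "Fenster"]
    print(risky_pairs(vocabulary))  # [('hand', 'hund'), ('haus', 'maus')]
    print(short_words(vocabulary))  # ['ei']
```

    Words flagged this way could be replaced with longer or more distinct alternatives, or split into smaller grammars that are only active when needed. Whether that helps in a given application would have to be checked against the actual confidence values.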

    Monday, September 10, 2012 4:20 PM