Some questions about kinect speech recognition

  • Question

  • Hi all,

    I have some questions about the Kinect speech recognition engine.

    1. Is there an API that can increase the input volume to improve speech recognition accuracy?

    2. Does reading the beamAngle help improve speech recognition accuracy?

    3. Sometimes the speech recognition engine recognizes words even when the background is quiet. What is the reason?

    4. With background noise, how can I improve speech recognition accuracy?

    I hope someone can kindly help answer these questions.

    Thank you

    Wednesday, April 25, 2012 8:33 AM


  • 1) The volume control in Windows for the Kinect USB audio device controls the input volume. The recommendation we give in the release notes is to set it to "3", which corresponds most closely to 0 dB gain (no amplification). By default, Windows 7 sets the gain to "100", corresponding to roughly 30 dB of amplification. This can lead both to quiet sounds being incorrectly recognized as speech and to clipping in the audio signal, which prevents recognition of real speech. I'd start from "3". A small positive value (between 3 and 10) may do better in some environments. As always, test in your environment and tune the system for where and how it will be used. You want to make sure the audio stream never gets clipped. To reach this setting, open the Sound control panel in Windows, click the Recording tab, and double-click the array microphone that is part of the Kinect.

    2) Beam angle is used internally in the Optibeam array mode to point the beam where the loudest sound is coming from.  This will tend to improve accuracy, because we'll be listening to the speaker preferentially over the background noise.

    3) This is likely the result of a high input gain setting in Windows. The other thing to look at is the recognition confidence; I'd expect it to be low in this scenario. In your code, when you get a recognition event, check the confidence level of the result before acting on it. Do some testing to find a threshold that gives a good ratio of correct recognitions to false activations for your specific grammar and acoustic environment.

    4) Background noise is really hard. Do whatever you can to remove it. If the noise comes from a fixed source, point the Kinect away from it to take advantage of the directionality of the mics in the mic array and the damping in the housing. If your system will be used in an environment with background noise, always provide an alternate control mechanism in case speech does not work because of high ambient noise.
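
    The tuning advice in points 1 and 3 can be tried offline before touching the application. Below is a minimal Python sketch, not Kinect SDK code. It assumes the captured audio is available as 16-bit signed PCM sample values, that recognition confidence is a number between 0 and 1, and that you have hand-labelled a set of test recognitions as real commands or false activations. All function names and the data format are hypothetical, chosen only for illustration.

```python
INT16_MAX = 32767  # full-scale magnitude, assuming 16-bit signed PCM samples


def db_to_amplitude_ratio(gain_db):
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20.0)


def clipped_fraction(samples, full_scale=INT16_MAX, margin=0.999):
    """Return the fraction of samples at or near full scale (likely clipped)."""
    if not samples:
        return 0.0
    limit = full_scale * margin
    clipped = sum(1 for s in samples if abs(s) >= limit)
    return clipped / len(samples)


def pick_confidence_threshold(labelled_results, max_false_accept_rate=0.05):
    """Pick the lowest observed confidence that keeps false activations rare.

    labelled_results is a list of (confidence, is_real_command) pairs
    gathered by testing in the target room (hypothetical format).
    Returns None if no observed confidence value meets the limit.
    """
    false_scores = [c for c, is_real in labelled_results if not is_real]
    if not false_scores:
        # No false activations were observed, so any threshold passes.
        return min((c for c, _ in labelled_results), default=None)
    for threshold in sorted({c for c, _ in labelled_results}):
        accepted_false = sum(1 for c in false_scores if c >= threshold)
        if accepted_false / len(false_scores) <= max_false_accept_rate:
            return threshold
    return None


def should_act(confidence, threshold):
    """Act on a recognition result only if it clears the tuned threshold."""
    return threshold is not None and confidence >= threshold
```

    With the default Windows setting of about 30 dB, `db_to_amplitude_ratio(30)` gives a multiplier of roughly 31.6, which shows why quiet sounds start to look like speech and loud ones clip. If `clipped_fraction` is above zero on a typical recording, lower the input level before tuning the confidence threshold.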

    Tuesday, May 1, 2012 4:54 PM