Active skeleton and speech

  • Question

  • Hi all,

    We are using Kinect in a fairly noisy, crowded environment. We would also like to use speech recognition, but the recognition should be tied to the skeleton that is being tracked, so that we can ignore other voices and noise.

    Is this possible, and if it is, how? What is your advice?

    How are you guys handling this functionality? What are your best practices?




    Tuesday, March 3, 2015 9:15 AM

All replies

  • I have not used speech recognition with Kinect v2 yet, but I did on v1, and results in environments with multiple speakers were very poor. In my tests, recognition success fell dramatically when other people were talking near the Kinect user. I would say other kinds of noise are less critical.

    If you have several bodies tracked and want to know which one is talking (when only one of them is), there is a recent post hinting at how to use the AudioBeamSubFrame information.

    Tuesday, March 3, 2015 12:56 PM
  • In a crowded area speech is going to be an issue, since you need to calibrate to negate the ambient noise. AEC-type technology can help with this, since it is aware of what the system is outputting and can use that to estimate the noise in what the microphone picks up. For Kinect for Windows v2, that type of filter is not available, so you will need to find or develop a component that filters some of that noise out.

    Beamforming will only provide the direction of a sound, not specifically of speech. In a speech system, you can deduce that the sound is someone talking based on body tracking. Whether or not your speech recognition will work in a noisy environment goes beyond what the microphone technology itself can do.
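    To make the idea of combining the beam direction with body tracking concrete, here is a minimal sketch of the geometry only. It is written in Python for brevity and does not call the Kinect SDK. It assumes you already have the beam angle in radians, a beam confidence between 0 and 1, and the head position (x, z) in metres for each tracked body, with z pointing away from the sensor. The function names, the 10 degree tolerance, the 0.5 confidence threshold, and the sign convention (the beam angle and x growing in the same direction) are all assumptions for illustration, so check them against your own setup.

```python
import math
from typing import Dict, Optional, Tuple


def body_angle(x: float, z: float) -> float:
    """Horizontal angle (radians) of a point relative to the sensor's forward axis."""
    return math.atan2(x, z)


def match_speaker(
    beam_angle: float,
    heads: Dict[int, Tuple[float, float]],
    beam_confidence: float = 1.0,
    min_confidence: float = 0.5,            # assumed threshold, tune for your room
    max_error: float = math.radians(10.0),  # assumed angular tolerance
) -> Optional[int]:
    """Return the tracking id of the body closest to the beam direction.

    Returns None when the beam is not confident enough, when no bodies are
    tracked, or when no body lies within max_error of the beam angle.
    """
    if beam_confidence < min_confidence or not heads:
        return None

    best_id: Optional[int] = None
    best_err = max_error
    for tracking_id, (x, z) in heads.items():
        err = abs(body_angle(x, z) - beam_angle)
        if err <= best_err:
            best_id, best_err = tracking_id, err
    return best_id


# Hypothetical data: two tracked bodies, sound arriving from the direction of body 2.
heads = {1: (-1.0, 2.0), 2: (0.5, 2.0)}
speaker = match_speaker(math.atan2(0.5, 2.0), heads, beam_confidence=0.9)
```

    You could then accept a recognized phrase only when the returned id is the body you care about, and drop it otherwise. This does not remove other voices from the audio itself, so recognition quality in a crowd is still limited as described above.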

    Carmine Sirignano - MSFT

    Tuesday, March 3, 2015 6:17 PM