Detection of Multiple Voice Sources?

  • Question

  • Hi there,

    Can the Kinect's microphone array detect multiple voice sources?

    For example, if one person is talking and a phone is ringing,

    can the Kinect detect both locations?

    My program can only detect one source in this kind of situation,

    so I want to know whether that is normal.

    Thanks for your answer.

    Ray


    • Edited by Rayar Tuesday, February 7, 2012 10:10 AM
    Tuesday, February 7, 2012 10:09 AM

Answers

  • Yes, what you are experiencing is correct.

    The Kinect will only expose one source endpoint to capture from. Depending on the configuration, the SDK can provide you with information on the direction the captured audio is coming from. If you are recording audio for a prolonged period of time, that angle could change multiple times. Depending on how your application is written, you could use that angle as a way of processing the audio stream. Have you tried the AudioDemo sample with two people?

    Is there something in particular you are trying to do? Are you trying to distinguish different people based on volume/pitch of the voice?
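
    As a rough illustration of using the reported angle to process a single stream, the sketch below groups consecutive direction readings into segments, so that a talker on one side and a ringing phone on the other show up as separate segments over time. It does not separate simultaneous sources. The readings are assumed to be (time, angle in degrees, confidence) tuples that your application collects from the SDK; the function name, tuple layout and thresholds are hypothetical and not part of the Kinect SDK.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Segment:
    start: float       # time of the first reading in the segment (seconds)
    end: float         # time of the last reading in the segment (seconds)
    mean_angle: float  # running average direction of the segment (degrees)
    count: int         # number of readings averaged into mean_angle


def segment_by_angle(
    readings: Iterable[Tuple[float, float, float]],
    min_confidence: float = 0.5,   # assumed threshold, tune for your setup
    max_jump_deg: float = 15.0,    # assumed threshold, tune for your setup
) -> List[Segment]:
    """Group (time, angle, confidence) readings into same-direction segments.

    A new segment starts whenever the angle differs from the current
    segment's running mean by more than max_jump_deg. Readings below
    min_confidence are ignored.
    """
    segments: List[Segment] = []
    for t, angle, confidence in readings:
        if confidence < min_confidence:
            continue  # skip unreliable direction estimates
        current = segments[-1] if segments else None
        if current is not None and abs(angle - current.mean_angle) <= max_jump_deg:
            # Same direction as before: extend the segment and update its mean.
            total = current.mean_angle * current.count + angle
            current.count += 1
            current.mean_angle = total / current.count
            current.end = t
        else:
            # Direction changed (or first reading): start a new segment.
            segments.append(Segment(start=t, end=t, mean_angle=angle, count=1))
    return segments


if __name__ == "__main__":
    # Made-up sample data: a source on the left, then a source on the right.
    sample = [
        (0.0, -30.0, 0.90),
        (0.5, -28.0, 0.80),
        (1.0, 5.0, 0.20),   # low confidence, ignored
        (1.5, 25.0, 0.90),
        (2.0, 27.0, 0.95),
    ]
    for seg in segment_by_angle(sample):
        print(f"{seg.start:.1f}s to {seg.end:.1f}s at about {seg.mean_angle:.1f} degrees")
```

    Each resulting segment could then be mapped back to the matching stretch of the audio stream, for example to keep only the audio that arrived from the direction of the person you care about.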

    Wednesday, February 15, 2012 2:13 AM

All replies

  • Although he doesn't do the best job of describing it, I think I know what he's getting at. A lot of us are trying to use Kinect's voice capabilities in noisy environments. The noise could come from a TV or radio, an HVAC system, or general chatter from lots of people in the background. One common theme is that there is usually one speaker we are interested in, and Kinect doesn't always do a great job of focusing on that speech and ignoring the background noise. I have an issue with an HVAC system, really just a loud fan. My speech is clearly audible over the background noise; however, Kinect processes my speech with the noise included (with echo cancellation on as well). As a result, recognition accuracy drops significantly. While the echo cancellation is doing a great job, I think there is a lot of room for improvement in removing sounds that clearly aren't human speech when Kinect runs its recognition algorithms.
    Wednesday, February 15, 2012 8:00 PM