Kinect Sound Source Localization: variables that affect accuracy

  • Question

  • Hi,

    I am doing research for my final thesis, which is based on Kinect. Part of this research is to study how accurately the device reports the sound source location (angle) of an input sound in the range [-50, 50] degrees. In the official documentation I found the operational distance ranges of the device (too close, normal, too far and unknown), but I think they apply only to the video features (color, depth and skeletal tracking).

    What I need is some sort of chart representing the accuracy of the value of Sound Source Angle (represented with the confidence level) and/or Beam Angle based on the distance from the sensor, the volume of the input sound, and other variables that might affect this confidence level.

    My question is: What variables affect the accuracy of the sound source angle (i.e. the confidence level), and how do they affect it? Also, what are their values (operational distance, sound volume, etc.)?

    Thanks!

    Tuesday, April 17, 2012 11:48 AM

Answers

  • Nagesh, I passed your question on to one of our principal researchers, who did much of the research behind the Kinect audio stack. I'm including his answers inline here. If you send me a note directly at cwhitems@hotmail.com, I will put you in touch with him for a more detailed follow-up.

    I am actually
    researching for my final thesis, which is based on Kinect. Part of this
    research is to study the level of accuracy the device provides about the sound
    source location (angle) of an input sound in the range [-50, 50] in degrees. I
    found in the official documentation the operational distance ranges of the
    device (too close, normal, too far and unknown), but I think they are only
    applied for the video features (color, depth and skeletal tracking).
    [Ivan Tashev]  Correct, the sound
    source localizer in Kinect can’t estimate the distance. You will have to map
    the angle to the depth image to estimate the distance.
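
    The mapping from angle to depth image mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the Kinect SDK's method: the 640-pixel frame width, the nominal 57-degree horizontal field of view of the depth camera, the pinhole-camera model, the sign convention (positive angle = higher column index) and the function names are all assumptions that are not given in this thread. Note that sound angles outside the depth camera's field of view cannot be mapped at all.

```python
import math

# Assumed values (not from the thread): depth frame width in pixels and the
# nominal horizontal field of view of the depth camera in degrees.
DEPTH_WIDTH = 640
DEPTH_HFOV_DEG = 57.0


def angle_to_depth_column(angle_deg, width=DEPTH_WIDTH, hfov_deg=DEPTH_HFOV_DEG):
    """Map a horizontal angle to a depth-image column using a pinhole model.

    Returns None when the angle lies outside the depth camera's field of view.
    The sign convention (positive angle -> higher column) is an assumption and
    may need flipping depending on image mirroring.
    """
    half_fov = hfov_deg / 2.0
    if abs(angle_deg) > half_fov:
        return None
    # Focal length in pixels, derived from the image width and field of view.
    focal_px = (width / 2.0) / math.tan(math.radians(half_fov))
    col = width / 2.0 + focal_px * math.tan(math.radians(angle_deg))
    # Clamp to valid column indices.
    return min(width - 1, max(0, int(round(col))))


def depth_at_angle(depth_row_mm, angle_deg, hfov_deg=DEPTH_HFOV_DEG):
    """Look up the depth (in meters) at the column matching the sound angle.

    depth_row_mm is one row of depth readings in millimeters, where 0 is
    assumed to mean "no reading". Returns None if no depth is available.
    """
    col = angle_to_depth_column(angle_deg, len(depth_row_mm), hfov_deg)
    if col is None:
        return None
    depth_mm = depth_row_mm[col]
    return depth_mm / 1000.0 if depth_mm > 0 else None
```

    In practice one would probably look at a small window of columns and rows around the computed position (for example around a tracked skeleton's head) rather than a single pixel, since both the angle estimate and the depth reading are noisy.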

    What I need is some
    sort of chart representing the accuracy of the value of Sound Source Angle
    (represented with the confidence level) and/or Beam Angle based on the distance
    from the sensor, the volume of the input sound, and other variables that might affect
    this confidence level.

    My question is: What
    variables affect the accuracy of the sound source angle, aka confidence level
    and how do they affect it? Also, what are their values (operational distance,
    sound volume, etc.)?
    [Ivan Tashev]  The accuracy is affected by the level of the signal and the
    noise in the room (i.e. the signal-to-noise ratio) and by the reverberation in
    the room (higher RT60 – obviously lower accuracy). In normal noise conditions
    the sound source localizer should be able to track a person speaking with
    normal voice level in the operational range of the device, i.e. from one to
    four meters. We haven’t done any measurements beyond this interval. The angle
    range is +/-50 degrees.
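
    Since the signal-to-noise ratio is named above as a main factor, one way to log it alongside the reported confidence in an experiment could look like the sketch below. It is a rough illustration only: it assumes you can separately capture a segment with the talker active and a noise-only segment from the same room, and it treats the power of the active segment as the signal power (so it is really a (signal + noise)-to-noise ratio). The function names are hypothetical and nothing here reflects how the Kinect audio stack computes its confidence.

```python
import math


def mean_power(samples):
    """Average power of a sequence of audio samples (mean of squares)."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(s * s for s in samples) / len(samples)


def snr_db(active_segment, noise_segment):
    """Estimate SNR in decibels from an active segment and a noise-only segment.

    Both arguments are sequences of samples on the same scale. Returns
    float('inf') if the noise segment is completely silent.
    """
    p_active = mean_power(active_segment)
    p_noise = mean_power(noise_segment)
    if p_noise == 0.0:
        return float("inf")
    if p_active == 0.0:
        return float("-inf")
    return 10.0 * math.log10(p_active / p_noise)
```

    Recording this value together with the distance, the room's RT60 and the confidence reported by the sensor for each trial would give the raw data for the kind of chart asked about in the question.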

    -C
    Thursday, April 19, 2012 6:06 AM

All replies

  • Thanks a lot Chris and Ivan!

    That really helped.

    Friday, April 20, 2012 8:03 AM
  • Dear Chris White,

    I am researching Kinect too.

    My question is: how does Kinect perform sound source localization? Is it done in hardware (on a circuit)?

    And what algorithm is used for the source localization?

    Thank you.

    Wednesday, July 11, 2012 9:06 AM