Determining the position of a hand clap or other percussion sound

    Question

  • I've been researching the Kinect SDK Beta for the past couple of days and have been playing around with the SoundSourceLocalizer.

    I'm looking for a way to determine the position of a percussion-like sound (a hand clap, tapping against a glass or on a table). It seems that the SoundSourceLocalizer is not very good at locating these short sounds compared to words or sentences.

    What kinds of sounds are easier for Kinect to determine the position of?

    Is it only capable of detecting the position of speech or is it also able to determine the position of short sounds?

     

    On another note, is it possible to create 2 DMOs and set 2 separate beams? Since the beamforming is performed in software as a post effect, it should be possible, right? If so, will there be significant differences between the outputs of these DMOs when, for example, setting the 2 beams at the far sides (-30 degrees and +30 degrees) and making a sound in one of the beams?

    Thursday, September 15, 2011 10:41 AM

Answers

  • The sound source localizer is optimized for speech-like sounds, which is why it does better with those than with short, percussive sounds.

    Using 2 DMOs associated with the same Kinect Sound source is not a scenario that has been tested for the Beta, so I won't make any promises. That being said, the audible effect of aiming the beam is very small. It is significant enough for speech recognition in that the reverb for sound coming from the aimed direction will be reduced (thus cleaning up sound), but hard to hear in recordings.

    Eddy


    I'm here to help
    Friday, September 16, 2011 12:17 AM

All replies

  • Concerning the use of 2 DMOs, I still have a couple of questions.

    When I set up 2 DMOs on the same capture device and call ProcessOutput on the first, the second has no data to process. From this I take it that the first DMO clears the input buffer after ProcessOutput.

    I did see that it is possible to set the DMO to filter mode. That way I can set the input buffer of the DMO myself and use it purely as a filter instead of as a combined filter and capturer.

    My plan now is to capture the audio in 4 separate channels and push this to the 2 DMOs.

    What would be the expected way to set up this system? Should I use WASAPI to capture the audio and pass it on to the DMOs, is it possible to capture 4 channels with a third DMO and pass that on to the other two, or is there another way that is preferred?


    • Edited by DrEvilD Tuesday, September 20, 2011 8:48 AM
    Tuesday, September 20, 2011 8:35 AM
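
The plan described above (capture once, then feed the same audio to two DMOs running in filter mode) comes down to giving each DMO its own copy of every captured block, since one DMO consuming its input should not starve the other. The following is only a minimal sketch of that fan-out step in plain C++. It assumes interleaved 16-bit PCM with 4 channels, and the names `PcmBlock`, `FanOut` and `FrameCount` are hypothetical helpers, not part of the Kinect SDK, WASAPI or the DMO API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One captured block of interleaved 16-bit PCM samples.
// Assumption: 4 channels, one per microphone of the array.
using PcmBlock = std::vector<std::int16_t>;

// Hypothetical helper: hand every consumer its own copy of the captured
// block, so that one consumer draining its input cannot leave the others
// without data.
std::vector<PcmBlock> FanOut(const PcmBlock& captured, std::size_t consumers) {
    return std::vector<PcmBlock>(consumers, captured);
}

// Hypothetical helper: number of whole frames in an interleaved block.
std::size_t FrameCount(const PcmBlock& block, std::size_t channels) {
    assert(channels > 0 && block.size() % channels == 0);
    return block.size() / channels;
}
```

Each copy would then be passed as the input buffer of one DMO in filter mode. How the capture itself is done (WASAPI or otherwise) is left open here, as it is in the question.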
  • I have in the meantime done some research and implemented dual beam forming.

    The results are better than expected: there is a visible difference in the waveforms, and when the 2 recordings are assigned to the left and right channels of a stereo track, the difference in position can be heard when listening through headphones.

    Are there any parameters on the DMO that are currently optimized for speech but could be set differently to optimize it for shorter, percussion-like sounds?

     

    I did encounter some problems during the implementation: SoundSourceLocalizer::SetBeam returned E_FAIL every time, even though I had turned on feature mode and set the mic array mode to extern. Using the property store to set the beam angle did work.

    I also found that turning off noise suppression yields far more noticeable results with 2 beams and improves the accuracy of the SoundSourceLocalizer's position detection (when using automatic beam).

     

    Thursday, September 22, 2011 8:19 AM
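
The comparison described above (two beam recordings placed on the left and right channels of a stereo track) can be reproduced offline with a few lines of C++. This is a minimal sketch, not the poster's actual code: it assumes both recordings are mono 16-bit PCM of equal length, and the function names `InterleaveStereo` and `Rms` are hypothetical. The RMS helper is one simple way to put a number on the level difference between the two beams.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Interleave two mono recordings into one stereo buffer (L, R, L, R, ...).
// Assumption: both recordings contain the same number of samples.
std::vector<std::int16_t> InterleaveStereo(const std::vector<std::int16_t>& left,
                                           const std::vector<std::int16_t>& right) {
    assert(left.size() == right.size());
    std::vector<std::int16_t> stereo;
    stereo.reserve(left.size() * 2);
    for (std::size_t i = 0; i < left.size(); ++i) {
        stereo.push_back(left[i]);
        stereo.push_back(right[i]);
    }
    return stereo;
}

// Root-mean-square level of a recording; returns 0 for an empty buffer.
double Rms(const std::vector<std::int16_t>& samples) {
    if (samples.empty()) {
        return 0.0;
    }
    double sumSquares = 0.0;
    for (std::int16_t s : samples) {
        sumSquares += static_cast<double>(s) * static_cast<double>(s);
    }
    return std::sqrt(sumSquares / static_cast<double>(samples.size()));
}
```

Comparing `Rms` of the two beam outputs over a short window around a clap gives a rough indication of which beam the sound fell into. How large that difference is in practice depends on the DMO settings discussed in this thread.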
  • I'm glad to see you made interesting progress on this! There are unfortunately no options available to optimize the DMO audio processing for percussive sounds.

    Thanks also for the error report on SoundSourceLocalizer::SetBeam.

    Eddy


    I'm here to help
    Wednesday, September 28, 2011 9:39 PM
  • Actually, when you try to call SetBeam, did you make sure you have MFPKEY_WMAAECMA_FEATR_MICARR_MODE set to MICARRAY_EXTERN_BEAM? Otherwise I don't expect this to work properly.

    Eddy


    I'm here to help
    Wednesday, September 28, 2011 9:45 PM
  • Yes, I did set MFPKEY_WMAAECMA_FEATR_MICARR_MODE to MICARRAY_EXTERN_BEAM.

    Setting MFPKEY_WMAAECMA_FEATR_MICARR_BEAM did not cause any problems.

    Thursday, September 29, 2011 7:40 AM
  • Ah, I think I know what it is. You need to call IMediaObject::AllocateStreamingResources after setting the MICARR_MODE to MICARRAY_EXTERN_BEAM and before calling ISoundSourceLocalizer::SetBeam.

    I can see that this is not intuitive, so I will report the problem. Thanks for the feedback!
    Eddy


    I'm here to help
    Monday, October 03, 2011 10:13 PM
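
The call order given in the reply above can be summarized as: set the mic array mode to MICARRAY_EXTERN_BEAM, then call AllocateStreamingResources, and only then call SetBeam. The sketch below is a small stand-in model of that ordering in portable C++. It is not the Kinect or DMO API: `BeamSetupModel` and its methods are hypothetical, the real calls are named only in the comments, and the rule that a mode change invalidates an earlier allocation is an assumption made for illustration.

```cpp
#include <cassert>

// Hypothetical model of the state that matters for beam steering.
// It only mirrors the ordering described in this thread.
class BeamSetupModel {
public:
    // Real call: IPropertyStore::SetValue on MFPKEY_WMAAECMA_FEATR_MICARR_MODE
    // with MICARRAY_EXTERN_BEAM.
    // Assumption: changing the mode requires a fresh allocation afterwards.
    void SetExternBeamMode() {
        externBeam_ = true;
        allocated_ = false;
    }

    // Real call: IMediaObject::AllocateStreamingResources.
    void AllocateStreamingResources() { allocated_ = true; }

    // Real call: ISoundSourceLocalizer::SetBeam.
    // Returns false (the thread reports E_FAIL) unless the two steps above
    // were done first, in that order.
    bool SetBeam(double angle) {
        if (!externBeam_ || !allocated_) {
            return false;
        }
        beamAngle_ = angle;
        return true;
    }

    double BeamAngle() const { return beamAngle_; }

private:
    bool externBeam_ = false;
    bool allocated_ = false;
    double beamAngle_ = 0.0;
};
```

With this model, calling `SetBeam` straight after `SetExternBeamMode` fails, which matches the behaviour reported earlier in the thread, and it succeeds once `AllocateStreamingResources` has been called in between.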
  • "I have in the meantime done some research and implemented dual beam forming.

    The results are better than expected, there is a visual difference in the waveform and when setting the 2 recordings to the left and right channels of a stereo track the difference in position can be heard when listening through headphones."

    Would you mind elaborating on how you are doing that? I am trying to initialize two or more different DMOs without any success. :(

    Thursday, March 01, 2012 11:06 PM