Kinect audio problem

  • Question

  • Hi,

    I am currently working on speech recognition using Kinect, and I am facing some problems. Please suggest some solutions.

    I have declared two sets of grammars: one for the start and stop commands and one for the select commands.

    Start and stop commands: "Kinect go to"

    Select commands: "Screen1", "Screen2", "Screen3", ...

    When I start the program, I load only the start and stop commands so that the recognizer does not get confused by other speech. When I say "Kinect go to", I load the select commands, and afterwards I unload every grammar except the start and stop commands. The problem is that when a speaker is explaining something, the recognizer also recognizes that speech as "Kinect go to", loads the remaining grammar, and performs the select-command operations.

    How can I make the Kinect device stop listening to the speaker once it receives a stop command?

     

     

    Monday, December 12, 2011 11:40 AM

Answers

  • Msat,

    There are a couple of things you need to do.

    1) Enable Acoustic Echo Cancellation on the speaker where sounds are coming from. This is most often the speaker at index 0 (relative to the MMDevice active speaker device enumeration):

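    using System.IO;
    using Microsoft.Research.Kinect.Audio; // Beta 2 SDK audio namespace

    KinectAudioSource source = new KinectAudioSource();
    // Beamforming plus acoustic echo cancellation (AEC)
    source.SystemMode = SystemMode.OptibeamArrayAndAec;
    // Index of the render (speaker) device whose output AEC should cancel
    source.SpeakerIndex = 0;

    Stream stream = source.Start();
    ...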

    2) Because the sound coming out of the speakers is not continuous (the way music would be), you will need to play silence in a loop so that KinectAudioSource keeps capturing audio instead of returning an error. This is a workaround for a current limitation in KinectAudioSource. You can play sounds using System.Media.SoundPlayer.
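If you would rather not ship a silent .wav file with the application, the silence can be generated in memory. The following is a minimal sketch under stated assumptions: `SilentWav` is a hypothetical helper, not part of the Kinect SDK, and the 16 kHz, 16-bit mono PCM format is an arbitrary choice (any valid PCM format whose samples are all zero should play as silence).

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of the Kinect SDK): builds an in-memory
// 16-bit mono PCM WAV file whose samples are all zero, i.e. silence.
public static class SilentWav
{
    public static byte[] Create(int sampleRate, double seconds)
    {
        const short channels = 1;
        const short bitsPerSample = 16;
        short blockAlign = (short)(channels * bitsPerSample / 8); // bytes per sample frame
        int byteRate = sampleRate * blockAlign;
        int dataSize = (int)(sampleRate * seconds) * blockAlign;

        using (var ms = new MemoryStream())
        using (var w = new BinaryWriter(ms)) // BinaryWriter is little-endian, as WAV requires
        {
            // RIFF header
            w.Write(Encoding.ASCII.GetBytes("RIFF"));
            w.Write(36 + dataSize);            // size of everything after this field
            w.Write(Encoding.ASCII.GetBytes("WAVE"));

            // "fmt " sub-chunk: describes the PCM format
            w.Write(Encoding.ASCII.GetBytes("fmt "));
            w.Write(16);                       // sub-chunk size for plain PCM
            w.Write((short)1);                 // audio format 1 = PCM
            w.Write(channels);
            w.Write(sampleRate);
            w.Write(byteRate);
            w.Write(blockAlign);
            w.Write(bitsPerSample);

            // "data" sub-chunk: zero-valued 16-bit samples are silence
            w.Write(Encoding.ASCII.GetBytes("data"));
            w.Write(dataSize);
            w.Write(new byte[dataSize]);

            w.Flush();
            return ms.ToArray();
        }
    }
}
```

To loop it, wrap the bytes in a `MemoryStream`, pass that to the `System.Media.SoundPlayer(Stream)` constructor, and call `PlayLooping()`; keep the player alive while capturing and call `Stop()` when done. `SoundPlayer` plays on the default output device, which this sketch assumes is the same speaker selected by `SpeakerIndex`.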

    We are working to make this configuration simpler in future versions of the SDK, but if you follow the suggestions above using Beta 2 release, it should work for you. Also look at related post http://social.msdn.microsoft.com/Forums/en-US/kinectsdkaudioapi/thread/f184a652-a63f-4c72-a807-f9770fdf57f8 for a related discussion.

    Hope this helps,
    Eddy


    Wednesday, December 14, 2011 2:36 AM
  • You may also be running into something called "False activation" where the recognizer *thinks* that you said the command, even when you didn't.  There are a couple of approaches that you can take to mitigate this. 

    The first is that you can raise the confidence requirement. In your SpeechRecognized handler, e.Result.Confidence is a measure of how confident the recognizer is that the result really is what the person said. You can tune your application by checking this value and rejecting results with low confidence.

    e.g.:

    private void SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
      const double ConfidenceThreshold = 0.3; // EXPERIMENT WITH THIS

      // Ignore results the recognizer is not confident about
      if (e.Result.Confidence > ConfidenceThreshold)
      {
         // Do interesting things here
      }
    }

    You can also add a set of "honeypot" words, which are basically random words, to your base grammar, and do nothing when one of them is recognized. This also helps mitigate the false activation problem. Generally, though, I doubt you'll need to go down the honeypot route if you tune the confidence threshold.
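The confidence threshold, the honeypot words, and the question's two-stage "Kinect go to" scheme can be combined in one place. The sketch below is a hypothetical `CommandGate` class, kept free of Kinect and Microsoft.Speech types so the logic can be tested on its own. It assumes that exactly one select command should be accepted per activation, after which the application goes back to listening only for the activation phrase. The honeypot words still have to be added to the recognizer's grammar; this class only makes sure they are ignored.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: decides what to do with a recognized phrase given its
// text and confidence. Not part of any SDK; the threshold comes from tuning.
public sealed class CommandGate
{
    private readonly string activationPhrase;
    private readonly HashSet<string> selectCommands;
    private readonly HashSet<string> honeypots;
    private readonly double threshold;

    // True after the activation phrase was accepted; cleared after one select command.
    public bool AwaitingSelect { get; private set; }

    public CommandGate(string activationPhrase, IEnumerable<string> selectCommands,
                       IEnumerable<string> honeypots, double threshold)
    {
        this.activationPhrase = activationPhrase;
        this.selectCommands = new HashSet<string>(selectCommands, StringComparer.OrdinalIgnoreCase);
        this.honeypots = new HashSet<string>(honeypots, StringComparer.OrdinalIgnoreCase);
        this.threshold = threshold;
    }

    // Returns the select command to execute, or null if the result should be ignored.
    public string Process(string text, double confidence)
    {
        // Reject low-confidence results and honeypot words outright,
        // without changing state.
        if (confidence <= threshold || honeypots.Contains(text))
            return null;

        if (!AwaitingSelect)
        {
            // While idle, only the activation phrase is acted on.
            if (string.Equals(text, activationPhrase, StringComparison.OrdinalIgnoreCase))
                AwaitingSelect = true;
            return null;
        }

        if (selectCommands.Contains(text))
        {
            // Go back to listening for the activation phrase only.
            AwaitingSelect = false;
            return text;
        }
        return null;
    }
}
```

From the SpeechRecognized handler, you would call `gate.Process(e.Result.Text, e.Result.Confidence)` and act only on a non-null result.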
    Tuesday, April 3, 2012 6:44 PM

All replies

  • Hi,

    Thanks for your reply.

    I am not talking about the sound coming from the speakers. I am talking about the voice of a person who is speaking.

    Wednesday, December 14, 2011 9:24 AM