Advanced audio capabilities of Kinect and Speech Platform

  • Question

  • First, I will probably use the System.Speech namespace instead of the Microsoft.Speech namespace used in the Kinect examples. I started a separate thread about this decision: http://social.msdn.microsoft.com/Forums/en-AU/kinectsdkaudioapi/thread/e2801bcd-30b7-4982-ba56-2e1b244593b6. Either way, I have the following questions about pushing the limits of Kinect/.NET speech recognition.

    1)  Various videos indicate that noise cancellation must be disabled to use speech recognition. In a noisy environment, is there no way to use it together with speech recognition?

    2)  I have a situation where the PC is playing audio (music) while speech recognition is running. Does anyone know an advanced technique for filtering the music out of the incoming Kinect microphone audio? Basically, I am looking for a DSP filtering mechanism like [To Speech Recognition = Kinect Microphone - Music].

    3) Can I somehow make use of the provided Kinect Acoustic Model (Kinect for Windows Runtime Language Pack, version 0.9) in the System.Speech namespace?

    Thanks,
    aidesigner

    Thursday, July 21, 2011 7:34 PM


  • aidesigner,

    1) No, we recommend disabling Automatic Gain Control, but not noise suppression. You can see the WPFShapesGame doesn’t disable noise suppression. Can you send me a reference to the videos that recommend this? 

    2) This is what Acoustic Echo Cancellation (AEC) is for. You need to turn it on by setting KinectAudioSource.SystemMode to SystemMode.OptibeamArrayAndAec and then setting KinectAudioSource.SpeakerIndex to the desired speaker device index (likely 0, unless you have multiple speaker devices). The main gotcha is that AEC processing requires audio to be playing at all times. One way around this is to have the app continuously play silence when no music is playing. Let me know if you run into any issues with this.

    3) No, there is no Kinect Acoustic Model available for the System.Speech namespace. Sorry about that.

    Eddy


    I'm here to help
    Thursday, July 21, 2011 10:49 PM
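On the [Kinect Microphone - Music] idea from question 2: the music reaches the microphones through the room (delayed, attenuated and reflected), so subtracting the PC's output sample by sample does not cancel it. An echo canceller instead learns the speaker-to-microphone path with an adaptive filter and subtracts its estimate of the echo. The sketch below is a generic normalized LMS (NLMS) canceller on synthetic signals. It is not the algorithm inside the Kinect DSP, which this thread does not describe, and the echo path, signal length, filter length (16 taps) and step size (0.5) are made-up illustration values.

```csharp
// Generic NLMS echo canceller on synthetic signals (illustration only, NOT the Kinect DSP's algorithm).
// reference  = what the PC sends to its speakers (the "music")
// microphone = what the mic hears: that music after a hypothetical room echo path

static double[] NlmsCancel(double[] reference, double[] microphone, int taps, double mu)
{
    var w = new double[taps];                 // adaptive estimate of the speaker-to-mic echo path
    var output = new double[microphone.Length];
    for (int n = 0; n < microphone.Length; n++)
    {
        // Predict the echo from the most recent "taps" reference samples.
        double echoEstimate = 0, energy = 1e-6;   // small constant avoids division by zero
        for (int k = 0; k < taps && n - k >= 0; k++)
        {
            echoEstimate += w[k] * reference[n - k];
            energy += reference[n - k] * reference[n - k];
        }
        double error = microphone[n] - echoEstimate;  // mic minus estimated echo
        output[n] = error;
        // NLMS update: step size normalized by the recent reference energy.
        for (int k = 0; k < taps && n - k >= 0; k++)
            w[k] += mu * error * reference[n - k] / energy;
    }
    return output;
}

static double Energy(double[] x, int from)
{
    double e = 0;
    for (int i = from; i < x.Length; i++) e += x[i] * x[i];
    return e;
}

var rng = new System.Random(1);
const int Length = 20000;
var music = new double[Length];
for (int i = 0; i < Length; i++) music[i] = rng.NextDouble() * 2 - 1;   // white noise stands in for music

double[] echoPath = { 0.0, 0.6, 0.3, -0.2, 0.1 };   // hypothetical speaker-to-mic impulse response
var mic = new double[Length];
for (int n = 0; n < Length; n++)
    for (int k = 0; k < echoPath.Length && n - k >= 0; k++)
        mic[n] += echoPath[k] * music[n - k];

// Naive "mic - music" vs. adaptive cancellation, measured on the second half (after convergence).
var naive = new double[Length];
for (int n = 0; n < Length; n++) naive[n] = mic[n] - music[n];
var cancelled = NlmsCancel(music, mic, taps: 16, mu: 0.5);

double micEnergy = Energy(mic, Length / 2);
System.Console.WriteLine($"naive residual / echo: {Energy(naive, Length / 2) / micEnergy:F3}");
System.Console.WriteLine($"NLMS residual / echo:  {Energy(cancelled, Length / 2) / micEnergy:E2}");
```

With this echo path, plain subtraction leaves more energy than it removes, while the adaptive filter drives the residual echo toward zero once it converges. In a real recording the talker's voice is also in the microphone signal; because it is not correlated with the reference, it largely passes through to the output. Production cancellers also need double-talk handling and much longer filters, which this sketch omits.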
  • 1) I watched the quickstart videos again, and they only say to disable Automatic Gain Control.

    2) My understanding is that the Kinect microphone array itself has a DSP for noise reduction and echo cancellation (AEC). So your response seems to indicate that an audio stream (music) must be sent from the PC to Kinect for AEC to work?

    2a) What if the audio comes from an external source (i.e., a stereo) that is connected to the PC sound card input? I assume I can still direct that audio to AEC using KinectAudioSource.<?> somehow?

    Thanks,
    aidesigner
    Friday, July 22, 2011 3:57 PM
  • 2) So your response seems to indicate that an audio stream (music) must be sent from the PC to Kinect for AEC to work?

    [Eddy] Yes

    2a) What if the audio comes from an external source (i.e., a stereo) that is connected to the PC sound card input? I assume I can still direct that audio to AEC using KinectAudioSource.<?> somehow?

    [Eddy] Currently it is only possible to configure AEC from a speaker output, so you would have to route your sound card input to an output device, which the Kinect Audio SDK can then use for AEC. I can see how this would be inconvenient, so I'll record your feedback.

    Thanks for your feedback!
    Eddy

    • Marked as answer by aidesigner Saturday, July 23, 2011 11:56 AM
    Friday, July 22, 2011 8:44 PM
  • Eddy, would 2) work the same with audio from Media Center? Would we want to continuously play silence while something is playing in Media Center, or detect when Media Center is producing no audio? Do you have a code sample for playing back silence? Thanks!!
    Monday, August 29, 2011 3:15 PM
  • Sure, if audio from Media Center is coming out of the PC speakers, then you can use it as input to AEC. Why would you want to play silence, though, when audio is already playing? Or do you mean playing silence only when Media Center is producing no audio, to avoid problems in AEC processing?

    [Edit] I guess if you mean playing silence from your application, so that the silent signal gets mixed into the PC speaker output together with the intermittent Media Center audio, then that should work pretty well.

    I don't have a code sample for playing back silence, unfortunately. Sorry!

    Eddy

    Tuesday, August 30, 2011 8:49 PM
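Since no sample for playing back silence was available in the thread, here is a minimal sketch that builds a silent WAV file in memory: a standard 44-byte PCM header followed by all-zero 16-bit samples. The 44.1 kHz mono format, the 60-second length and the file name silence60s.wav are assumptions chosen for illustration; the resulting file could be looped with System.Media.SoundPlayer.PlayLooping, as the Program.cs posted later in this thread does with its own silent WAV.

```csharp
// Builds a silent 16-bit PCM mono WAV file (all-zero samples).
// Sample rate, length and file name are illustrative assumptions.
static byte[] BuildSilentWav(int sampleRate, int seconds)
{
    const short Channels = 1, BitsPerSample = 16;
    short blockAlign = (short)(Channels * BitsPerSample / 8);   // bytes per sample frame
    int byteRate = sampleRate * blockAlign;                     // bytes per second
    int dataSize = byteRate * seconds;

    var ms = new System.IO.MemoryStream();
    using (var w = new System.IO.BinaryWriter(ms))              // BinaryWriter is always little-endian
    {
        w.Write(System.Text.Encoding.ASCII.GetBytes("RIFF"));
        w.Write(36 + dataSize);                                 // RIFF chunk size = file size - 8
        w.Write(System.Text.Encoding.ASCII.GetBytes("WAVE"));
        w.Write(System.Text.Encoding.ASCII.GetBytes("fmt "));
        w.Write(16);                                            // fmt chunk size for PCM
        w.Write((short)1);                                      // audio format 1 = PCM
        w.Write(Channels);
        w.Write(sampleRate);
        w.Write(byteRate);
        w.Write(blockAlign);
        w.Write(BitsPerSample);
        w.Write(System.Text.Encoding.ASCII.GetBytes("data"));
        w.Write(dataSize);
        w.Write(new byte[dataSize]);                            // zero samples = digital silence
    }
    return ms.ToArray();                                        // ToArray still works on a closed MemoryStream
}

byte[] wav = BuildSilentWav(44100, 60);
string path = System.IO.Path.Combine(System.IO.Path.GetTempPath(), "silence60s.wav");
System.IO.File.WriteAllBytes(path, wav);
System.Console.WriteLine($"Wrote {wav.Length} bytes to {path}");

// On Windows the file can then be looped, for example:
//   new System.Media.SoundPlayer(path).PlayLooping();
```

A longer file means the loop restarts less often (once a minute here instead of every 10 seconds). Whether SoundPlayer leaves a gap at the loop point, and whether such a gap affects AEC, is not established in this thread.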
  • Thanks, Eddy. I'm going to give this a try tonight. I got AEC working last night, but the problem with moments of no audio cropped up.

    I recorded some samples of AEC with me talking over the music, and I was VERY impressed. I had my music well above normal listening (or even party) levels and spoke at the same volume I normally would to the Kinect. Playing back the recording, it sounded like the music was playing softly in the background, and my speech was clearly audible. Very impressive stuff.

    Tuesday, August 30, 2011 10:34 PM
  • I was able to find a 10-second silent WAV file, which I'm looping. But it crossed my mind: what about the millisecond between loops? Could that cause errors? Perhaps I should be looping two ten-second clips started five seconds apart?
    Thursday, September 1, 2011 7:35 PM
  • Are you playing a silent WAV because it is difficult to determine when Media Center is not producing sound and then disable AEC?

    Would you consider sending/posting the relevant code for operating AEC under your conditions? It would be helpful, as I need to perform similar tasks.


    aidesigner
    Thursday, September 1, 2011 9:12 PM
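On the detection question: deciding whether a buffer of 16-bit PCM audio is effectively silent is straightforward with an RMS level threshold. The harder part, which this sketch does not cover, is getting at the mixed speaker output (for example through a loopback capture API) in the first place. The function names and the -60 dBFS threshold are hypothetical choices for illustration, not part of the Kinect SDK.

```csharp
// RMS level of 16-bit PCM samples in dBFS (0 dBFS = full scale).
// Assumes a little-endian machine, matching the little-endian PCM sample layout.
static double RmsDbfs(byte[] pcm16, int byteCount)
{
    int samples = byteCount / 2;
    if (samples == 0) return double.NegativeInfinity;
    double sumSquares = 0;
    for (int i = 0; i < samples; i++)
    {
        short s = System.BitConverter.ToInt16(pcm16, 2 * i);
        double x = s / 32768.0;                     // normalize to [-1, 1)
        sumSquares += x * x;
    }
    double rms = System.Math.Sqrt(sumSquares / samples);
    return rms > 0 ? 20 * System.Math.Log10(rms) : double.NegativeInfinity;
}

// Hypothetical threshold: anything below -60 dBFS counts as silence.
static bool IsSilent(byte[] pcm16, int byteCount, double thresholdDbfs = -60.0) =>
    RmsDbfs(pcm16, byteCount) < thresholdDbfs;

// Example: 100 ms of 16 kHz mono silence vs. samples alternating between +16384 and -16384.
var quiet = new byte[3200];
var loud = new byte[3200];
for (int i = 0; i < loud.Length / 2; i++)
{
    short v = (short)(i % 2 == 0 ? 16384 : -16384);
    System.BitConverter.GetBytes(v).CopyTo(loud, 2 * i);
}
System.Console.WriteLine($"quiet: silent = {IsSilent(quiet, quiet.Length)}");
System.Console.WriteLine($"loud:  {RmsDbfs(loud, loud.Length):F1} dBFS, silent = {IsSilent(loud, loud.Length)}");
```

The threshold would need tuning against real output, and a detector like this only reports silence after a whole buffer has been measured, which is one reason looping a silent file, as done below, may be the simpler route.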
  • Are you playing a silent WAV because it is difficult to determine when Media Center is not producing sound and then disable AEC?

    Would you consider sending/posting the relevant code for operating AEC under your conditions? It would be helpful, as I need to perform similar tasks.

    Sure thing. This is the code from my Main method in Program.cs:

    using System;
    using System.IO;
    using System.Linq;
    using System.Media;
    using Microsoft.Research.Kinect.Audio;   // Kinect for Windows SDK beta audio API
    using Microsoft.Speech.AudioFormat;
    using Microsoft.Speech.Recognition;

    class Program
    {
        const string RecognizerId = "SR_MS_en-US_Kinect_10.0";

        static void Main(string[] args)
        {
            // AEC needs audio playing at all times, so loop a silent WAV on the PC speakers.
            SoundPlayer simpleSound = new SoundPlayer(@"resources\silence10s.wav");
            simpleSound.PlayLooping();

            using (var source = new KinectAudioSource())
            {
                source.FeatureMode = true;
                source.AutomaticGainControl = false;                  // AGC off, as recommended for speech recognition
                source.SystemMode = SystemMode.OptibeamArrayAndAec;   // beamforming plus acoustic echo cancellation
                source.MicArrayMode = MicArrayMode.MicArrayAdaptiveBeam;
                source.NoiseSuppression = true;
                source.AcousticEchoSuppression = 0;

                // Use the first capture device reported (stays -1 if none is found).
                short micIndex = -1;
                foreach (AudioDeviceInfo info in source.FindCaptureDevices())
                {
                    micIndex = (short)info.DeviceIndex;
                    break;
                }
                source.MicrophoneIndex = micIndex;
                source.SpeakerIndex = 0;   // speaker device whose output is used for AEC

                RecognizerInfo ri = SpeechRecognitionEngine.InstalledRecognizers()
                    .FirstOrDefault(r => r.Id == RecognizerId);

                if (ri == null)
                {
                    Console.WriteLine("Could not find speech recognizer: {0}. Please refer to the sample requirements.", RecognizerId);
                    return;
                }

                Console.WriteLine("Using: {0}", ri.Name);

                using (var sre = new SpeechRecognitionEngine(ri.Id))
                {
                    // BuildGrammar, BuildGrammarForSpecialCommands and the Sre* event handlers
                    // are defined elsewhere in Program.cs (not shown in this post).
                    sre.LoadGrammar(BuildGrammar());
                    sre.LoadGrammar(BuildGrammarForSpecialCommands());
                    sre.SpeechRecognized += SreSpeechRecognized;
                    sre.SpeechHypothesized += SreSpeechHypothesized;
                    sre.SpeechRecognitionRejected += SreSpeechRecognitionRejected;

                    using (Stream s = source.Start())
                    {
                        // 16 kHz, 16-bit, mono PCM from the Kinect audio source
                        sre.SetInputToAudioStream(s, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));

                        Console.WriteLine("Recognizing. Say command to control XXXX. Press ENTER to stop");

                        sre.RecognizeAsync(RecognizeMode.Multiple);
                        Console.ReadLine();
                        Console.WriteLine("Stopping recognizer ...");
                        sre.RecognizeAsyncStop();
                    }
                }
            }
        }
    }
    

    I got the AEC code from these forums. I chose the silent WAV because I was concerned about the practicality of detecting audio: if it were that easy, why isn't it built in from the start? Maybe it will be in the future, I'm not sure, but the WAV was the path of least resistance for now.

     

    Thursday, September 1, 2011 10:37 PM
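For reference, the integers passed to SpeechAudioFormatInfo in the Main method above describe 16 kHz, 16-bit, mono PCM (arguments in order: samples per second, bits per sample, channel count, average bytes per second, block align). The last two values follow from the first three:

```latex
\begin{aligned}
\text{blockAlign} &= \text{channels} \times \frac{\text{bitsPerSample}}{8} = 1 \times \frac{16}{8} = 2\ \text{bytes},\\
\text{averageBytesPerSecond} &= \text{samplesPerSecond} \times \text{blockAlign} = 16000 \times 2 = 32000\ \text{bytes/s}.
\end{aligned}
```

If any of the first three values were changed, the last two would have to be recomputed to match.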