SpeechRecognised Latency

  • Question

  • I'm using essentially the standard Kinect speech recognition pattern from the SDK samples:

    using (sre = new SpeechRecognitionEngine(ri.Id))
    {
      // Build a grammar containing only the phrases we want to detect.
      var choices = new Choices();
      foreach (string choice in phrasesToCheck)
      {
        choices.Add(choice);
      }

      var gb = new GrammarBuilder();
      gb.Culture = ri.Culture;
      gb.Append(choices);
      var g = new Grammar(gb);
      sre.LoadGrammar(g);
      sre.SpeechRecognized += SreSpeechRecognized;

      using (Stream s = sensor.AudioSource.Start())
      {
        // 16 kHz, 16-bit, mono PCM: 32000 bytes per second, block align 2.
        sre.SetInputToAudioStream(s,
                                  new SpeechAudioFormatInfo(
                                      EncodingFormat.Pcm, 16000, 16, 1,
                                      32000, 2, null));
        try
        {
          // Keep recognising until explicitly stopped.
          sre.RecognizeAsync(RecognizeMode.Multiple);
        }
        catch (Exception e)
        {
          return e.Message;
        }

    While this works, I've noticed that the delay before the SreSpeechRecognized event fires is much larger in Kinect for Windows 1.5.1 than in the Kinect for Xbox beta: between 1 and 1.5 seconds, compared with near-instant recognition in the beta.  Near-instant recognition is what we require; we continued development on that premise and on the success of the beta, but have now run into this issue.  The latency in the production version is too long.  Is there anything I can do about it?

    Do you need additional information?

    Windows 7 (x64), 8 GB RAM

    The above code runs in an IIS web service (localhost), and the line   sre.RecognizeAsync(RecognizeMode.Multiple);   is reached almost instantly when debugging.  The sensor's infrared light comes on almost immediately, but even when I issue voice commands straight away, the SpeechRecognized event only fires after a delay of a few seconds.

    I'm getting 100% word/phrase recognition, but only if I wait a few extra seconds after the Kinect is initialised.  What's needed is for the SpeechRecognized event to fire INSTANTLY once sre.RecognizeAsync(RecognizeMode.Multiple); has been executed.

    Any thoughts please?

    Please remember that the beta worked fine, but the 1.5.1 version has too long a latency.  I'm using exactly the same hardware, except that the Kinect sensor is now a commercial one (Kinect for Windows) and not the beta one from my kids' Xbox (at least they are happy again :-)



    Tuesday, July 17, 2012 5:13 AM

All replies

  • If you are truly getting 1-1.5 seconds on your recognized event, there is something in your code causing this. I routinely get recognition times well under 1 second with a very, very complicated grammar set, so it's not the SDK/runtime.
    • Edited by ChrisCicc Tuesday, August 14, 2012 1:59 AM
    Tuesday, August 14, 2012 1:58 AM
  • I'm seeing latency, too, even when the grammar is just a few words.  Note that this same latency is in the promo video from MS for the "turtle" example.  This would be unusable in an app, imo.  Certainly there's a simple newb setting somewhere I'm missing!

    Here's the video from MS that actually demonstrates the latency:

    http://www.microsoft.com/en-us/kinectforwindows/develop/tutorials/tutorialsdesc.aspx?tutorialid=Tutorial3


    • Edited by B F L Friday, September 21, 2012 3:30 PM
    Friday, September 21, 2012 3:06 PM
  • Are you referring to latency on startup, or latency after the application has been running for some time?
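
    One way to tell the two cases apart is to log timings around the recogniser. The following is only a rough sketch: RecognitionLatencyLog is a hypothetical helper (not part of the Kinect or Speech SDKs), and it assumes you can estimate when each utterance ended, for example from e.Result.Audio.StartTime plus e.Result.Audio.Duration, if those values reflect the wall-clock timing of the phrase in your setup. It keeps the first event separate from later ones, so a slow first recognition (startup) can be compared with the steady-state average.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper for illustration; not part of the Kinect or Speech SDKs.
public sealed class RecognitionLatencyLog
{
    private readonly Stopwatch sinceStart = new Stopwatch();
    private readonly List<TimeSpan> latencies = new List<TimeSpan>();

    // Time from MarkRecognitionStarted() to the first recorded event, if any.
    public TimeSpan? TimeToFirstEvent { get; private set; }

    // Number of recognition events recorded since the last start.
    public int Count { get { return latencies.Count; } }

    // Call immediately before sre.RecognizeAsync(RecognizeMode.Multiple).
    public void MarkRecognitionStarted()
    {
        latencies.Clear();
        TimeToFirstEvent = null;
        sinceStart.Restart();
    }

    // Call from the SpeechRecognized handler.
    // utteranceEnd: estimated wall-clock time at which the phrase ended.
    // handlerTime:  wall-clock time at which the handler ran.
    public void RecordEvent(DateTime utteranceEnd, DateTime handlerTime)
    {
        if (TimeToFirstEvent == null)
        {
            TimeToFirstEvent = sinceStart.Elapsed;
        }
        latencies.Add(handlerTime - utteranceEnd);
    }

    // Latency of the first recognition (includes any startup cost).
    public TimeSpan? FirstLatency
    {
        get { return latencies.Count > 0 ? latencies[0] : (TimeSpan?)null; }
    }

    // Mean latency of every recognition after the first (steady state).
    public TimeSpan? AverageLatencyAfterFirst
    {
        get
        {
            if (latencies.Count < 2) return null;
            long ticks = 0;
            for (int i = 1; i < latencies.Count; i++) ticks += latencies[i].Ticks;
            return TimeSpan.FromTicks(ticks / (latencies.Count - 1));
        }
    }
}

// Possible wiring (assumes e.Result.Audio is non-null for the recognition):
//
//   var log = new RecognitionLatencyLog();
//   sre.SpeechRecognized += (sender, e) =>
//   {
//       DateTime end = e.Result.Audio.StartTime + e.Result.Audio.Duration;
//       log.RecordEvent(end, DateTime.Now);
//   };
//   log.MarkRecognitionStarted();
//   sre.RecognizeAsync(RecognizeMode.Multiple);
```

    If FirstLatency is large but AverageLatencyAfterFirst is small, the delay is most likely a startup effect rather than per-phrase recognition latency.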

    Wednesday, September 26, 2012 6:30 AM