Speeding up speech recognition?

  • Question

  • I have been working with speech recognition for some time now and have noticed that the recognizer seems to be very slow. It seems to have a hard time when the user says multiple words in a row. Also, with some phrases, e.g. "Turn Off", it takes a while to work out what was said. This sometimes causes a two- or three-second delay before the "turn off" action is performed.

    I know the Xbox responds to commands much faster; this seems much slower than it should be. So I guess my question is: is there any way to speed up word recognition?

    I suppose that if I stopped printing "speech hypothesized" and "speech recognized" to a debug screen it would speed things up a little, but I doubt that slows it down that much.

    Any ideas?

    -Mike

    Friday, September 9, 2011 7:08 PM

Answers

  • Have you tried playing around with the "readStaleThreshold" parameter in KinectAudioSource.Start(TimeSpan readStaleThreshold)? The default is 500 ms. You could try making it smaller and see whether you get better results. The parameter is described as:

    "If there are no reads to the stream for longer than this threshold the DMO discards any buffered audio. This prevents stale data from being returned in scenarios such as speech recognition and dialog systems, when the consumption of audio samples may stop for a while. Pass TimeSpan.MaxValue to avoid hitting this threshold."

    I know this is not exactly what you're asking for, but I can't think of any other setting in the Kinect SDK Beta that would affect the delay you're seeing. Thanks for the feedback, though. I'll make a note of it.

    Eddy


    I'm here to help
    Saturday, September 10, 2011 1:29 AM
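To make the quoted readStaleThreshold rule concrete, here is a minimal Python sketch of it. It is a toy model under stated assumptions, not the Kinect DMO or its API: the class `StaleDiscardingBuffer` and its `write`/`read` methods are hypothetical, and the assumption is that once the reader has been idle longer than the threshold the backlog is dropped a single time, while audio captured afterwards is kept. Passing `float("inf")` plays the role of `TimeSpan.MaxValue` (never discard).

```python
import time
from collections import deque


class StaleDiscardingBuffer:
    """Hypothetical toy model of the readStaleThreshold rule quoted above.

    NOT the Kinect DMO: it only illustrates "if nobody reads for longer than
    the threshold, discard whatever audio is buffered".
    """

    def __init__(self, stale_threshold_s=0.5, clock=time.monotonic):
        self.stale_threshold_s = stale_threshold_s  # 0.5 s mirrors the 500 ms default
        self.clock = clock                          # injectable so the demo is deterministic
        self.chunks = deque()
        self.last_read = clock()
        self._discarded = False                     # backlog already dropped since last read?

    def _discard_if_stale(self):
        # Once the reader has been idle longer than the threshold, drop the
        # backlog a single time; audio written after that point is kept.
        idle = self.clock() - self.last_read
        if idle > self.stale_threshold_s and not self._discarded:
            self.chunks.clear()
            self._discarded = True

    def write(self, chunk):
        self._discard_if_stale()
        self.chunks.append(chunk)

    def read(self):
        self._discard_if_stale()
        self.last_read = self.clock()
        self._discarded = False
        data = list(self.chunks)
        self.chunks.clear()
        return data


if __name__ == "__main__":
    now = [0.0]  # fake clock in seconds
    buf = StaleDiscardingBuffer(0.5, clock=lambda: now[0])
    buf.write("a")
    now[0] = 0.2
    print(buf.read())  # ['a']: read within the threshold
    buf.write("b")     # written right after the read
    now[0] = 1.0
    buf.write("c")     # 0.8 s idle > 0.5 s, so 'b' is discarded first
    now[0] = 1.1
    buf.write("d")
    print(buf.read())  # ['c', 'd']
```

In this model a smaller threshold means a stalled reader gets old audio thrown away sooner, which is the intuition behind trying a value below 500 ms; whether that actually shortens the recognizer's delay has to be measured in the real application.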

All replies

  • Hi Mike,

    The first thing you need to make sure of is that the slowness is really caused by the SR (speech recognition) engine.

    Speech recognition speed depends on the complexity of your grammar.

    As you know, speech recognition uses Hidden Markov Models (a statistical model) to determine the recognition result, but even for very big grammars the engine's speed is quite good. So I think your problem might lie outside the SR process.

    Try writing some logs with timers to check how much time you're really spending in SR.

    Mário Vaz Henriques .NET Dev
    Sunday, September 11, 2011 9:39 PM
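The timing-log suggestion above can be sketched in Python as follows. The `LatencyLog` class and its `mark`/`report` methods are hypothetical helpers, not part of the Kinect SDK or any speech API; the assumption is that you call `mark()` from your own recognizer event handlers (speech detected, hypothesized, recognized) and once more right after the "turn off" action completes. Each event is stamped with `time.perf_counter()` relative to the first mark.

```python
import time


class LatencyLog:
    """Hypothetical helper: stamps events relative to the first one, so you can
    see whether the delay is inside recognition or after it."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.start = None
        self.events = []  # (event name, seconds since the first mark)

    def mark(self, event):
        now = self.clock()
        if self.start is None:
            self.start = now  # the first event (e.g. speech detected) is time zero
        self.events.append((event, now - self.start))

    def reset(self):
        # Call between utterances so each phrase is timed on its own.
        self.start = None
        self.events.clear()

    def report(self):
        return "\n".join(f"{name:<20} +{dt * 1000:8.1f} ms" for name, dt in self.events)


if __name__ == "__main__":
    # Made-up clock ticks (seconds) so the demo is deterministic; in a real
    # app leave the default clock and call mark() from each event handler.
    ticks = iter([0.0, 0.3, 0.8, 1.0])
    log = LatencyLog(clock=lambda: next(ticks))
    for event in ("speech detected", "hypothesized", "recognized", "action performed"):
        log.mark(event)
    print(log.report())
```

If most of the time falls between "recognized" and "action performed", the delay lies in your own code rather than in the SR engine; if it falls between "speech detected" and "recognized", the grammar or audio pipeline is the place to look.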