Speech Dictation with Kinect v2

  • Question

  • Dear Gurus of the Kinect,

    I would like to follow up on a question that was asked for Kinect 1: can I use speech recognition in dictation mode with Kinect v2? Here's the original answer:

    https://social.msdn.microsoft.com/Forums/windowsapps/en-US/b1d796fa-bd22-424e-9633-c6733aadc613/kinect-and-dictation?forum=kinectsdk

    I started off by modifying the Speech Basics sample, where I replaced

    hr = m_pSpeechGrammar->LoadCmdFromFile(GrammarFileName, SPLO_STATIC);

    with

    hr = m_pSpeechGrammar->LoadDictation(NULL, SPLO_STATIC);

    which seemed the most reasonable approach, given the scarce documentation on SAPI 5.4. However, that call fails (FAILED(hr) is true). So does the Kinect SDK still not support dictation mode? I wouldn't like to miss out on the sound localization capabilities, which I would have to give up when using System.Speech.
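    For completeness, here is how I'm inspecting the result. FAILED(hr) only tests the severity (sign) bit, so logging the raw HRESULT in hex and looking it up in sperror.h should narrow down which SAPI error comes back. A minimal sketch, with the standard Windows definitions reproduced so it compiles outside of windows.h:

```cpp
#include <cstdint>
#include <cstdio>

// Standard Windows definitions, reproduced for illustration:
// an HRESULT is a signed 32-bit value, and FAILED() simply
// tests whether the severity (sign) bit is set.
typedef int32_t HRESULT;
#define FAILED(hr) (((HRESULT)(hr)) < 0)

// Print the raw error code so it can be looked up in sperror.h.
void ReportHr(const char* call, HRESULT hr)
{
    if (FAILED(hr))
    {
        std::printf("%s failed: hr = 0x%08X\n",
                    call, static_cast<uint32_t>(hr));
    }
}
```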

    Thank you already, and have a wonderful Christmas!

    Wednesday, December 17, 2014 4:43 PM

Answers

  • Kinect support in the Windows speech pipeline (SAPI) is mainly just a source of audio data. The internals of how speech uses the Kinect are no different than with any other microphone. The main difference is that Kinect audio is very specific (a mono, 32-bit IEEE floating-point PCM stream sampled at 16 kHz, with typical sample values between -1 and +1), which may not be directly supported by SAPI or other audio pipelines expecting 44.1 kHz, 16-bit, 2-channel audio. The SpeechBasics-D2D sample provides a KinectAudioStream wrapper that makes the sensor work, so the failure is going to be specific to the SAPI/Speech Server technology.
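    To make the format mismatch concrete, here is a minimal sketch (the function name is illustrative) of converting the Kinect's 32-bit float samples to the 16-bit PCM most speech pipelines expect, which is essentially the conversion such a wrapper has to perform:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Convert the Kinect's mono 32-bit IEEE float samples
// (nominal range [-1, +1]) to 16-bit signed PCM,
// clamping any out-of-range values first.
std::vector<int16_t> FloatToPcm16(const std::vector<float>& in)
{
    std::vector<int16_t> out;
    out.reserve(in.size());
    for (float s : in)
    {
        float clamped = std::max(-1.0f, std::min(1.0f, s));
        out.push_back(static_cast<int16_t>(clamped * 32767.0f));
    }
    return out;
}
```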

    Based on the docs, there is a specific way to initialize for dictation mode:

    http://msdn.microsoft.com/en-us/library/ee125477(v=vs.85).aspx

    You may want to see if you can find the SAPI Dictation Pad sample and try modifying it with that KinectAudioStream wrapper:

    http://msdn.microsoft.com/en-us/library/ms720178(v=vs.85).aspx


    Carmine Sirignano - MSFT

    Wednesday, December 17, 2014 7:43 PM

All replies

  • That is good news. I have tried to find the Dictation Pad sample, but it seems to have disappeared from the current Windows SDK as well as from the Dev Center. Do you know where else I could find an up-to-date sample?
    Wednesday, December 17, 2014 8:43 PM
  • I still couldn't find an up-to-date sample of the Dictation Pad; however, I am now sure that

    a) Microsoft.Speech only offers the server capabilities, i.e. no dictation: http://msdn.microsoft.com/en-us/library/jj127858.aspx

    b) the recognizer returned by SpeechRecognitionEngine.InstalledRecognizers() under System.Speech is not supported in the Kinect sample. But why? As I understand it, the recognizers are initialized in the same way: http://msdn.microsoft.com/en-us/library/office/hh361636%28v=office.14%29.aspx

    I hope this helps someone to help me.

    Tuesday, January 6, 2015 4:17 PM
  • Any progress with this? I'm having a similar problem: I'm trying to get the Kinect v2 running with System.Speech because I need dictation, but it simply doesn't work, I guess ...
    Wednesday, September 2, 2015 3:47 PM
  • Hi Kurtkroemer,

    I'm able to use the Kinect as an audio input device just fine here with the System.Speech.Recognition.SpeechRecognitionEngine APIs. Did you make sure that the Kinect v2 is showing up in the recording devices (right-click the volume indicator in the taskbar -> Recording devices -> select the Microphone Array (the sensor) -> Set Default)?

    A small blob of code like this works for me:

        // 'engine' is a field so it outlives the click handler;
        // 'output' is a TextBox in the window.
        private SpeechRecognitionEngine engine;

        private void init_Click(object sender, RoutedEventArgs e)
        {
            engine = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));

            // List the installed recognizers, for diagnostics.
            foreach (var recognizer in SpeechRecognitionEngine.InstalledRecognizers())
            {
                System.Diagnostics.Debug.WriteLine(recognizer.Description);
            }

            // Free dictation instead of a fixed command grammar.
            engine.LoadGrammar(new DictationGrammar());
            engine.SpeechRecognized += Engine_SpeechRecognized;

            // Uses the default recording device, so the Kinect's
            // microphone array must be set as the default.
            engine.SetInputToDefaultAudioDevice();
            engine.RecognizeAsync(RecognizeMode.Multiple);
        }

        private void Engine_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            output.Text = "Recognized text: " + e.Result.Text;
        }

    Wednesday, September 2, 2015 7:52 PM
    Moderator
  • Nice! Works for me as well. Thanks!

    Did you try to use the AudioBeam to focus on a specific person?

    Thursday, September 3, 2015 5:52 PM