Using System.Speech with Kinect

    Question

  • I am developing a prototype speech-to-text captioning application for a university project. I am going to be using gesture recognition within my project later on, so I thought it would be a good idea to use the Kinect as the microphone source rather than an additional microphone. The idea of my application is to recognize spontaneous speech, such as long and complex sentences (I understand that the dictation will not be perfect, however). I have seen many Kinect speech samples that reference Microsoft.Speech, but not System.Speech. As I need to train the speech engine and load a DictationGrammar into the SpeechRecognitionEngine, System.Speech is the only option for me.

    I have managed to get it working while using the Kinect as the direct microphone audio source, but since I am loading the Kinect for the video preview and gesture recognition, I am unable to access it as a direct microphone.

    This is the code that accesses the microphone directly, without loading the Kinect hardware for gestures etc., and it works perfectly:

            private void InitializeSpeech()
            {
                var speechRecognitionEngine = new SpeechRecognitionEngine();
    
                speechRecognitionEngine.SetInputToDefaultAudioDevice();
    
                speechRecognitionEngine.LoadGrammar(new DictationGrammar());
    
                speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
    
                speechRecognitionEngine.SpeechRecognized += (s, args) => MessageBox.Show(args.Result.Text);
            }
    
    

    And this is where I need to get the audio source through the Kinect once the sensor has been loaded. This is what I want to do, but it isn't doing anything at all:

                using (var audioSource = new KinectAudioSource())
                {
                    audioSource.FeatureMode = true;
    
                    audioSource.AutomaticGainControl = false;
    
                    audioSource.SystemMode = SystemMode.OptibeamArrayOnly;
    
                    var recognizerInfo = GetKinectRecognizer();
    
                    var speechRecognitionEngine = new SpeechRecognitionEngine(recognizerInfo.Id);
    
                    speechRecognitionEngine.LoadGrammar(new DictationGrammar());
    
                    speechRecognitionEngine.SpeechRecognized += (s, args) => MessageBox.Show(args.Result.Text);
    
                    using (var s = audioSource.Start())
                    {
                        speechRecognitionEngine.SetInputToAudioStream(s, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
    
                        speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
                    }
                }
    
    

    So my questions are: is it even possible to use System.Speech instead of Microsoft.Speech with the current Kinect SDK, and what am I doing wrong in the second code sample?


    Dan
    Saturday, December 03, 2011 6:01 PM
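    A likely contributor to the silence in the second sample (an editorial observation, not something the thread confirms): RecognizeAsync starts recognition on a background thread and returns immediately, so execution leaves the inner `using (var s = audioSource.Start())` block, and then the outer `using (var audioSource ...)` block, straight away. Both the Kinect stream and the KinectAudioSource are disposed while the engine still expects to read from them. The accepted answer below avoids this by blocking on Console.ReadLine() inside the using block. The sketch reproduces the pitfall with a MemoryStream; ConsumeLater is a hypothetical stand-in for the recognizer, and the ManualResetEventSlim only makes the timing deterministic.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class StreamLifetimeDemo
{
    // Hypothetical stand-in for an asynchronous consumer such as RecognizeAsync:
    // it returns a Task at once and reads the stream later on a thread-pool thread,
    // after 'startReading' is signalled.
    public static Task<string> ConsumeLater(Stream stream, ManualResetEventSlim startReading)
    {
        return Task.Run(() =>
        {
            startReading.Wait();
            try
            {
                var buffer = new byte[4];
                int read = stream.Read(buffer, 0, buffer.Length);
                return "read " + read + " bytes";
            }
            catch (ObjectDisposedException)
            {
                return "stream already disposed";
            }
        });
    }

    // Same shape as the second question sample: the using block ends right after
    // the asynchronous start, so the stream is disposed before it is read.
    public static string DisposeTooEarly()
    {
        using (var gate = new ManualResetEventSlim(false))
        {
            Task<string> consumer;
            using (var s = new MemoryStream(new byte[] { 1, 2, 3, 4 }))
            {
                consumer = ConsumeLater(s, gate);
            } // s is disposed here, before the consumer has read anything
            gate.Set();
            return consumer.Result;
        }
    }

    // Same shape as the accepted answer: block inside the using block (the answer
    // waits on Console.ReadLine()) so the stream outlives the consumer.
    public static string KeepAliveUntilDone()
    {
        using (var gate = new ManualResetEventSlim(false))
        using (var s = new MemoryStream(new byte[] { 1, 2, 3, 4 }))
        {
            Task<string> consumer = ConsumeLater(s, gate);
            gate.Set();
            return consumer.Result; // wait for the consumer, then dispose the stream
        }
    }
}
```

    In the real code this means keeping the SpeechRecognitionEngine, the stream and the audio source alive (for example as fields, or by blocking as the answer does) until RecognizeAsyncStop() or RecognizeAsyncCancel() has been called.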

Answers

  • I finally got dictation working, using the Kinect as an audio source. It's a tad tricky because you have to make sure you use the System.Speech namespaces (fully qualified in the code) rather than the Microsoft.Speech ones. Maybe some of this code should be adapted (fixed) and added to the samples. Hope this helps someone else struggling with how to integrate System.Speech dictation with the Kinect mics.

        using System;
        using System.IO;
        using System.Linq;
        using Microsoft.Kinect;

        class Program
        {
            // Unused below: the recognizer is taken from System.Speech instead.
            const string RecognizerId = "SR_MS_en-US_Kinect_10.0";

            public static void Main(string[] args)
            {
                // Obtain a KinectSensor if any are available
                KinectSensor sensor = (from sensorToCheck in KinectSensor.KinectSensors where sensorToCheck.Status == KinectStatus.Connected select sensorToCheck).FirstOrDefault();
                if (sensor == null)
                {
                    Console.WriteLine(
                            "No Kinect sensors are attached to this computer or none of the ones that are\n" +
                            "attached are \"Connected\".\n" +
                            "Attach the KinectSensor and restart this application.\n" +
                            "If that doesn't work run SkeletonViewer-WPF to better understand the Status of\n" +
                            "the Kinect sensors.\n\n" +
                            "Press any key to continue.\n");

                    // Give a chance for user to see console output before it is dismissed
                    Console.ReadKey(true);
                    return;
                }

                sensor.Start();

                // Obtain the KinectAudioSource to do audio capture
                KinectAudioSource source = sensor.AudioSource;

                source.EchoCancellationMode = EchoCancellationMode.None; // No AEC for this sample
                source.AutomaticGainControlEnabled = false; // Important to turn this off for speech recognition

                // Use a System.Speech recognizer: DictationGrammar is not available in Microsoft.Speech.
                System.Speech.Recognition.RecognizerInfo ri = System.Speech.Recognition.SpeechRecognitionEngine.InstalledRecognizers().FirstOrDefault();
                if (ri == null)
                {
                    Console.WriteLine("No System.Speech recognizers are installed.");
                    sensor.Stop();
                    return;
                }

                using (var recoEngine = new System.Speech.Recognition.SpeechRecognitionEngine(ri.Id))
                {
                    // Create a dictation grammar.
                    System.Speech.Recognition.DictationGrammar customDictationGrammar = new System.Speech.Recognition.DictationGrammar();
                    customDictationGrammar.Name = "Dictation";
                    customDictationGrammar.Enabled = true;

                    // Load the grammar into the recognition engine.
                    recoEngine.LoadGrammar(customDictationGrammar);

                    recoEngine.SpeechRecognized += (s, sargs) => Console.WriteLine(sargs.Result.Text);

                    using (Stream s = source.Start())
                    {
                        // 16 kHz, 16-bit, mono PCM: 32000 average bytes per second, block align 2.
                        recoEngine.SetInputToAudioStream(s, new System.Speech.AudioFormat.SpeechAudioFormatInfo(System.Speech.AudioFormat.EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));

                        Console.WriteLine("Dictating. Press ENTER to stop");

                        recoEngine.RecognizeAsync(System.Speech.Recognition.RecognizeMode.Multiple);

                        // Block here so the stream stays open while recognition runs.
                        Console.ReadLine();
                        Console.WriteLine("Stopping recognizer ...");
                        recoEngine.RecognizeAsyncStop();
                    }
                }

                sensor.Stop();
            }
        }
    
          

    Monday, March 05, 2012 7:13 PM
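    The constants passed to SpeechAudioFormatInfo in both the question and the answer, (EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null), describe 16 kHz, 16-bit, mono PCM. Average bytes per second is samples per second × bytes per sample × channels = 16000 × 2 × 1 = 32000, and block align is channels × bytes per sample = 1 × 2 = 2. The sketch below derives these values instead of hard-coding them; PcmFormat is a hypothetical helper, not part of System.Speech or the Kinect SDK.

```csharp
using System;

// Hypothetical helper: derives the PCM figures passed to
// SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null).
public static class PcmFormat
{
    // Bytes in one sample frame (one sample for each channel).
    public static int BlockAlign(int bitsPerSample, int channels)
    {
        if (bitsPerSample % 8 != 0)
            throw new ArgumentException("bitsPerSample must be a whole number of bytes");
        return channels * (bitsPerSample / 8);
    }

    // Bytes of audio produced per second of capture.
    public static int AverageBytesPerSecond(int samplesPerSecond, int bitsPerSample, int channels)
    {
        return samplesPerSecond * BlockAlign(bitsPerSample, channels);
    }
}
```

    With System.Speech referenced, the call could then read `new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, PcmFormat.AverageBytesPerSecond(16000, 16, 1), PcmFormat.BlockAlign(16, 1), null)`, which keeps the last two arguments consistent if the sample rate or bit depth is ever changed.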

All replies

  • Any answers regarding this?

    Thanks


    Dan
    Tuesday, December 13, 2011 11:27 PM
  • I'm working on it. My program crashes after about 10 seconds with an unhandled exception (System.Reflection.TargetInvocationException).

    What do you have in "recognizerInfo.Id"?

    I have managed to use System.Speech and dictation with the Kinect, but, like you, only with SetInputToDefaultAudioDevice.


    • Edited by Mellange Thursday, December 22, 2011 5:32 PM
    Thursday, December 22, 2011 5:09 PM
  • Did you ever find a solution to this problem?

    I'm stuck in the same spot. My guess is that the Kinect audio source carries extra data, such as the beam angle, that the System.Speech API can't handle. Is there a way to strip that out before passing the stream to System.Speech as the source?

    Friday, March 02, 2012 12:05 AM
  • Hmm. The proposed answer doesn't work for me. When I look in the list of installed recognizers (from System.Speech), I don't see the Kinect one; I see a couple of others. I only see the Kinect recognizer in Microsoft.Speech.Recognition.SpeechRecognitionEngine.InstalledRecognizers(). But if I use that one, I can't use it with the System.Speech classes (which are needed to support, for example, dictation).

    Are you seeing the Kinect recognizer in System.Speech.Recognition.SpeechRecognitionEngine.InstalledRecognizers()?

    Sunday, May 27, 2012 5:11 AM
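    As written, the accepted answer does not appear to need the Kinect recognizer at all: it takes whatever System.Speech.Recognition.SpeechRecognitionEngine.InstalledRecognizers() lists first and only feeds that recognizer the Kinect microphone stream (its RecognizerId constant is never used). What FirstOrDefault() can do is return null, making `ri.Id` throw a NullReferenceException, or return a recognizer for an unintended language. The following is a sketch of choosing a recognizer by culture with an explicit failure. RecognizerEntry is a hypothetical stand-in for the Id and Culture properties of RecognizerInfo, so the selection logic runs without System.Speech installed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the two RecognizerInfo properties used here.
public sealed class RecognizerEntry
{
    public RecognizerEntry(string id, string cultureName)
    {
        Id = id;
        CultureName = cultureName;
    }

    public string Id { get; }
    public string CultureName { get; } // e.g. "en-US", as in CultureInfo.Name
}

public static class RecognizerPicker
{
    // Returns the first recognizer for the requested culture, or throws a descriptive
    // exception instead of letting a null result surface later as a NullReferenceException.
    public static RecognizerEntry PickForCulture(IEnumerable<RecognizerEntry> installed, string cultureName)
    {
        RecognizerEntry match = installed.FirstOrDefault(
            r => string.Equals(r.CultureName, cultureName, StringComparison.OrdinalIgnoreCase));
        if (match == null)
        {
            throw new InvalidOperationException("No installed recognizer for culture " + cultureName + ".");
        }
        return match;
    }
}
```

    With System.Speech, the equivalent query would be `SpeechRecognitionEngine.InstalledRecognizers().FirstOrDefault(r => r.Culture.Name == "en-US")`, followed by the same null check before `new SpeechRecognitionEngine(ri.Id)`.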