Speech recognition with headphone mic doesn't work when Kinect is plugged in?

  • Question

  • Hi,

    I wrote a simple WPF speech recognition application using the Microsoft.Speech namespace. It uses a normal headphone mic and works fine.

    But when I integrate this code into a Kinect application that only renders a video stream, the speech recognition (using the headphone mic) neither recognizes my speech nor throws any exception.

    Please note that I am calling both methods (video stream and speech processing) in the Window_Loaded event handler.

    Can anyone please tell me if I am missing anything here?

    Thanks,

    Bharat.

    Friday, September 9, 2011 6:05 AM

Answers

  • Sorry it took so long to get back to you, Bharat. I can confirm that I'm seeing the same thing you're seeing. The minimal code change that triggers it is:

    _nui.VideoFrameReady += NuiColorFrameReady; 

    So the problem is not in Nui initialization but in the event handling. My guess at this point is that Nui event handling consumes enough processor time that your application loses some audio samples from the microphone audio stream. I haven't been able to confirm this yet, but we'll definitely investigate further. Thanks for the feedback!
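
    One way to probe that guess is to do less work per frame and see whether recognition recovers. The sketch below is only an illustration: `FrameThrottle` is a hypothetical helper, not part of the Kinect SDK, and skipping frames is an assumption about what might help, not a confirmed fix. The event still fires for every frame; the helper only lets the handler return early.

    ```csharp
    using System;

    // Hypothetical helper (not part of the Kinect SDK): lets a frame handler
    // skip frames so that rendering work happens at most once per interval.
    public sealed class FrameThrottle
    {
        private readonly TimeSpan _minInterval;
        private DateTime? _lastAccepted;

        public FrameThrottle(TimeSpan minInterval)
        {
            _minInterval = minInterval;
        }

        // Returns true when the frame arriving at 'now' should be processed.
        public bool ShouldProcess(DateTime now)
        {
            if (_lastAccepted.HasValue && now - _lastAccepted.Value < _minInterval)
            {
                return false; // too soon after the last processed frame; skip it
            }

            _lastAccepted = now;
            return true;
        }
    }
    ```

    In NuiColorFrameReady this could be used as `if (!_throttle.ShouldProcess(DateTime.UtcNow)) return;`, where `_throttle` is an assumed field such as `new FrameThrottle(TimeSpan.FromMilliseconds(200))`. If speech recognition starts working at a lower frame rate, that would support the processor-load explanation.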

    In the meantime, you could try recording the raw audio stream you're getting from Kinect, using code similar to the RecordAudio sample, in two scenarios:

    1) With the Nui runtime not initialized
    2) With the Nui runtime initialized and events registered

    Then listen to the audio output in both cases. If samples are being dropped, the audio will sound choppier in the second case.
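
    Listening is subjective, so a rough numeric check can complement it. The sketch below assumes you record for a fixed wall-clock duration and count the bytes actually received; it does not apply if your recording loop reads a fixed number of bytes instead. `DroppedSampleCheck` is a hypothetical helper, and the 16 kHz, 16-bit, mono format in the usage note is an assumption you should replace with your stream's real format.

    ```csharp
    using System;

    // Hypothetical helper for estimating how much raw PCM audio went missing
    // during a recording of known duration.
    public static class DroppedSampleCheck
    {
        // Bytes expected for 'seconds' of uncompressed PCM audio.
        public static long ExpectedBytes(int sampleRateHz, int bitsPerSample, int channels, double seconds)
        {
            long bytesPerSecond = (long)sampleRateHz * (bitsPerSample / 8) * channels;
            return (long)Math.Round(bytesPerSecond * seconds);
        }

        // Fraction of the expected audio data that is missing (0.0 means none).
        public static double MissingFraction(long expectedBytes, long actualBytes)
        {
            if (expectedBytes <= 0)
            {
                throw new ArgumentOutOfRangeException("expectedBytes");
            }

            long missing = Math.Max(0, expectedBytes - actualBytes);
            return (double)missing / expectedBytes;
        }
    }
    ```

    For example, with the assumed format, `ExpectedBytes(16000, 16, 1, 10.0)` gives the byte count for a 10-second recording. Comparing `MissingFraction` between the two scenarios shows whether noticeably more data is lost when the Nui events are registered.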

    Eddy


    I'm here to help
    • Marked as answer by ykbharat Wednesday, February 29, 2012 12:49 PM
    Friday, September 16, 2011 11:34 PM

All replies

  • Have you set breakpoints in your application to debug what's going on with the speech? For example, which breakpoints get hit in your initial sample app but not once you integrate the Kinect video stream?

    Also, to get more exception information, you could add a handler for unhandled dispatcher exceptions to your WPF app. Add the following attribute to the "Application" tag in App.xaml:

    DispatcherUnhandledException="Application_DispatcherUnhandledException"

    Then add the following code to App.xaml.cs:

    private void Application_DispatcherUnhandledException(object sender, System.Windows.Threading.DispatcherUnhandledExceptionEventArgs e)
    {
        System.Console.Write(e.Exception);
    }

    That might give you additional information.

    Also, to clarify: you never tried to initialize KinectAudioSource in your application, since you're using a non-Kinect audio source, right?

    Eddy


    I'm here to help
    Friday, September 9, 2011 6:26 PM
  • Hi ykbharat,

    Try setting your input audio device to the default audio device.

    sr = new SpeechRecognitionEngine();

    sr.SetInputToDefaultAudioDevice();

     

    If you're using an Xbox with Kinect, your default audio device should be the microphone array built into the Kinect hardware.

    Check your audio settings on the Xbox to confirm your audio device.

     

    Regards


    Mário Vaz Henriques .NET Dev
    Sunday, September 11, 2011 9:51 PM
  • Eddy,

    • In the initial speech sample app, the sr_SpeechRecognized event is raised whenever a word is uttered.
    • Once the Kinect video stream part is integrated, that event is no longer raised.
    • The DispatcherUnhandledException handler did not display any exception message.
    • KinectAudioSource is not initialized in my app.
    • The default audio device is set to my headset microphone.

    Please find my code below.

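        using System;
        using System.IO;
        using System.Windows;
        using System.Windows.Media;
        using System.Windows.Media.Imaging;
        using Microsoft.Research.Kinect.Nui;   // Kinect SDK beta: Runtime, PlanarImage, ...
        using Microsoft.Speech.Recognition;    // SpeechRecognitionEngine, Choices, Grammar, ...

        public partial class MainWindow : Window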
        {
            private Runtime _nui;

            public MainWindow()
            {
                InitializeComponent();
            }

            private void Window_Loaded(object sender, RoutedEventArgs e)
            {
                KinectInitialization();
                SpeechHandler();
            }

            private void sr_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
            {
                // Append each recognized phrase to a log file.
                // The using block flushes and closes the writer on exit.
                using (StreamWriter w = File.AppendText(@"D:\Speech.txt"))
                {
                    w.WriteLine(e.Result.Text);
                }
            }

            private void KinectInitialization()
            {
                _nui = new Runtime();

                try
                {
                    _nui.Initialize(RuntimeOptions.UseColor);
                }
                catch (InvalidOperationException)
                {
                    System.Windows.MessageBox.Show("Runtime initialization failed. Please make sure Kinect device is plugged in.");
                    return;
                }

                try
                {
                    _nui.VideoStream.Open(ImageStreamType.Video, 2, ImageResolution.Resolution640x480, ImageType.Color);
                }
                catch (InvalidOperationException)
                {
                    System.Windows.MessageBox.Show(
                        "Failed to open stream. Please make sure to specify a supported image type and resolution.");
                    return;
                }

                _nui.VideoFrameReady += NuiColorFrameReady;
            }

            private void SpeechHandler()
            {
                // Create the recognizer for the en-US culture (use "en-GB" for a UK accent).
                SpeechRecognitionEngine sr = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));

                // Create a simple grammar that recognizes a fixed set of command phrases.
                Choices colors = new Choices();
                colors.Add("Top");
                colors.Add("Bottom");
                colors.Add("Move Up");
                colors.Add("Move Down");
                colors.Add("Zoom out");
                colors.Add("Zoom France Network Highlighted");
                colors.Add("Show Network for france");

                GrammarBuilder gb = new GrammarBuilder();
                gb.Append(colors);

                // Create the actual Grammar instance, and then load it into the speech recognizer.
                Grammar g = new Grammar(gb);
                sr.LoadGrammar(g);

                //SrgsDocument doc = new SrgsDocument("Grammar.xml");
                //sr.LoadGrammar(new Grammar(doc));

                // Register a handler for the SpeechRecognized event.
                sr.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(sr_SpeechRecognized);
                sr.SetInputToDefaultAudioDevice();
                sr.RecognizeAsync(RecognizeMode.Multiple);
            }

            private void NuiColorFrameReady(object sender, ImageFrameReadyEventArgs e)
            {
                // 32 bits per pixel, BGR byte order (matches PixelFormats.Bgr32 below).
                PlanarImage image = e.ImageFrame.Image;
                videoImage.Source = BitmapSource.Create(
                    image.Width, image.Height, 96, 96, PixelFormats.Bgr32, null, image.Bits, image.Width * image.BytesPerPixel);
            }

        }

     

    Thanks,

    Bharat.

     

    Monday, September 12, 2011 7:19 AM
  • Bharat,

    I haven't seen this problem before, and I haven't heard of Kinect video interfering with audio processing, so the next step is to narrow down the smallest piece of code that causes the problem. For example, if you comment out the KinectInitialization() call in Window_Loaded, do things go back to normal, or does speech still not work?

    If you step through the SpeechHandler() method line by line, does everything execute the way you expect, with the same return values as when things do work for you?
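
    Beyond stepping through, the engine's other events can show whether audio is reaching the recognizer at all. The following is a minimal sketch, assuming the `sr` engine created in SpeechHandler() and a project reference to Microsoft.Speech.dll. `SpeechDiagnostics` is a hypothetical helper name, and `Debug.WriteLine` only produces output in debug builds.

    ```csharp
    using System.Diagnostics;
    using Microsoft.Speech.Recognition; // requires a reference to Microsoft.Speech.dll

    // Hypothetical helper: logs recognizer events so you can see how far
    // the audio gets before recognition stops producing results.
    public static class SpeechDiagnostics
    {
        public static void Attach(SpeechRecognitionEngine sr)
        {
            // Fires when the audio input goes between stopped, silence and speech.
            sr.AudioStateChanged += (s, e) =>
                Debug.WriteLine("Audio state: " + e.AudioState);

            // Fires when the engine detects a problem such as noise or no signal.
            sr.AudioSignalProblemOccurred += (s, e) =>
                Debug.WriteLine("Audio problem: " + e.AudioSignalProblem);

            // Fires when the engine hears something it treats as speech.
            sr.SpeechDetected += (s, e) =>
                Debug.WriteLine("Speech detected at " + e.AudioPosition);

            // Fires when speech was heard but matched no loaded grammar well enough.
            sr.SpeechRecognitionRejected += (s, e) =>
                Debug.WriteLine("Rejected: " + e.Result.Text);

            // Fires when an asynchronous recognition operation ends.
            sr.RecognizeCompleted += (s, e) =>
                Debug.WriteLine("Recognize completed, error: " + e.Error);
        }
    }
    ```

    Calling `SpeechDiagnostics.Attach(sr);` before `sr.RecognizeAsync(...)` and comparing the log with and without KinectInitialization() should show whether the engine stops detecting audio entirely or detects speech and rejects it.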

    Hope this helps,
    Eddy


    I'm here to help
    Monday, September 12, 2011 6:09 PM
  • Eddy,

    If I comment out the KinectInitialization() call, then build and run the app, things are normal and speech works fine.

    Stepping through SpeechHandler() line by line gives the same results as before, with or without the Kinect code, and every line executes without error.

     

    Thanks,

    Bharat.

    Tuesday, September 13, 2011 5:33 AM