Kinect Array Microphone and Windows Speech Recognition Optimisation

  • Question

  • Hi Folks,

    Thanks for providing us with a great SDK!

    I have not had time yet to dig into the SDK in depth or create any programs with it but I hope to be able to do so in the near future.

    I would like to provide some feedback concerning my initial experience with the Kinect Array Microphone.

    I am experiencing some problems relating to Windows Speech Recognition, default audio devices, and the audio levels found within the microphone properties dialog.

    For Windows Speech Recognition to work accurately on a PC, fine control is required over the audio levels in the microphone properties dialog. If the levels are set too high, recognition accuracy deteriorates rapidly and speech recognition becomes unusable. It seems to me that the Kinect drivers are setting or adjusting the audio levels themselves. As soon as they set the level too high, speech recognition becomes unusable on my system. When the level is set correctly the accuracy is pretty good, but the experience so far has not been consistent at all.

    It seems like the Kinect drivers may currently be optimised for the gaming environment, where there are likely to be much higher levels of ambient noise, and for that type of environment it makes sense for the device to adjust the audio levels dynamically. For a PC-based productivity application using Kinect as an input device, the requirements are likely to be a little different. The environmental noise is more likely to be controllable, i.e. a single user in a quiet room, located a specific distance from the array microphone and speaking at a consistent volume.

    A further potential bug that I have observed is that the Kinect Array Microphone hijacks my default audio device at system startup and sets itself as the default audio input device. I assume that this is a bug and that it should respect the default audio settings specified in the system rather than making the Kinect the default audio device on reboot. I need to disconnect the Kinect before I can set a new default audio device, which is not ideal.

    I have been capturing screenshots of my initial installation experiences and of the issues I have described: the audio levels and the hijacking of the default audio device by the Kinect at system startup. I will be happy to share them if you would find them useful.

    It is my understanding that at the moment Windows Speech Recognition will only work with the default microphone. Is it likely that in future Windows Speech Recognition will be able to support a user-specified or programmatically set microphone? I can easily imagine scenarios where this would be very useful. It is currently possible to set the default communications device and the default microphone. Perhaps it should also be possible to set the default voice recognition microphone, or even to allow multiple voice recognition microphones as input streams?

    What about a multiple-user voice recognition scenario? Can Windows support multiple users' voice profiles simultaneously? The Kinect beam microphone capability could identify the speaker and use their personal voice model to optimise the recognition. This would require dynamically switching the user's voice profile based on who was talking. Presumably the voice profile could come from the user's roaming profile data located on a network share. Are you able to comment on such matters?

     

    Just my initial thoughts!

     

    I wish you good luck for the rest of the beta phase and official launch.

     

    Austin

    Tuesday, June 21, 2011 3:54 PM

Answers

  • It is currently possible to set a microphone other than the default microphone as the input for WSR.

    Go to the speech control panel. Click "Advanced speech options". In the microphone panel at the bottom, click "Advanced".

    It took me a very long time to discover this because I kept looking under "audio input" (I wonder why) and it would take me to the default input settings dialogue, not the one for speech input.

    • Marked as answer by Austin Dimmer Thursday, June 23, 2011 9:54 AM
    Wednesday, June 22, 2011 4:08 PM

All replies

  • Hi jitterjames,

    Thanks for pointing that out, that's a useful tip I had forgotten about. It's not obvious or easy to find! I imagine there must also be ways to set these settings programmatically. I'd be interested to hear how that could be done.

    This still leaves me with the problem of the Kinect hijacking the default audio device at system startup and then manipulating the audio levels. In my opinion this indicates a buggy driver.

    Has anyone else had a similar experience?

    Can anyone from MSR comment about the drivers and how they should be behaving when configured correctly?

    Thanks

    Austin
    Thursday, June 23, 2011 10:01 AM
  • But where is the speech control panel? Please reply.

    Sunday, June 26, 2011 3:54 PM
  • Haozspirit123,

     

    Click the Speech Recognition icon in the taskbar, select Configuration, then Open the Speech Recognition Control Panel. That should take you there.

     

    I hope that helps.

     

    If anyone from MSR is listening I'm still interested to hear your opinions about the issues I originally raised.

     

    Specifically:


    Can anyone from MSR comment about the drivers and how they should be behaving when configured correctly?

     

    Thanks

     

    Austin

    Monday, June 27, 2011 12:09 PM
  • Austin,

    Sorry it took so long to reply. We are in fact listening, but I was following up internally to find good answers to your questions. On the Kinect Array Microphone "hijacking" your default audio input device settings: this is the behavior that was intended at release time. It seemed convenient for beginners to have the new device set up automatically as the default for sound capture and speech recognition. This could be changed in the future if it's proving to be more inconvenient than convenient on average, and I'm confident your vote would be to change it, so I'll record the request to be evaluated for future releases.

    As for a programmatic way to change the microphone source setting: to set a different microphone for sound capture from Kinect, you can use the KinectAudioSource.MicrophoneIndex property, but I need to follow up a little more to see whether anything else needs to be done for this to work in your speech scenario.

    Feel free to ping me again if I take too long to reply back. I periodically look through all unread threads in these forums and try to answer/follow up on unresolved issues.

    Hope this helps,
    Eddy


    I'm here to help
    Tuesday, June 28, 2011 9:16 PM
  • Eddy,

     

    Thank you very much for your response and your diligence to find the correct answer.

    Perhaps it is the correct decision to make things more convenient for beginners by setting Kinect as the default audio device at system start-up. From a user experience perspective, that is not what I have a problem with. The core problem, in my opinion, is that it needs to be very easy for the user to set or change the microphone used for Windows Speech Recognition. Currently, if I change the microphone in use for speech recognition, the Windows speech recognition system needs to be restarted for the microphone settings to take effect. There also seems to be some confusion about how to set Windows speech recognition to use a non-default microphone. I truly believe there must be a simpler and more effective way to optimise the overall user experience with respect to these types of issues.

    If you have time to find out the answers to the other two issues I raised in my original post (concerning audio levels and multiple user profiles), I would be very much obliged. Perhaps the issues I have raised are too forward-thinking and Microsoft, for competitive reasons or otherwise, is not in a position to comment?

     

    Thanks again.

     

    Austin

     

    Wednesday, June 29, 2011 5:43 PM
  • Austin,

    As far as audio levels, I have followed up to see if there is anything we can do better in the future. As far as multiple user profiles, I can't comment. Hopefully you will be able to find more information in the documentation for the speech platform: http://msdn.microsoft.com/en-us/speech/default.aspx.

    Good luck!
    Eddy


    I'm here to help
    Thursday, June 30, 2011 4:08 PM
  • Hi Austin,

    The issues you are referring to with default microphone selection are in fact part of Windows 7 usability enhancements. Look at http://msdn.microsoft.com/en-us/windows/hardware/gg463052 for more complete documentation on this.

    As for the selection of the speech recognition input, you have the option of allowing the system to follow the default console device as it switches automatically (the default), or you can select a specific input device that never switches on you. It seems like all the bases are covered here, unless there truly is a bug where speech recognition doesn't reset itself when the default console capture device changes. I think the other part of the equation is that the input device selection for speech recognition is so well hidden that only someone who really knows what they are doing will even know that it can be changed independently of the default console device and the default communications device.
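
    To make the two options concrete, here is a minimal Python sketch of the selection rule described above. It only models the behaviour and does not touch real audio devices. The class, its method, and the "Headset Microphone" device name are hypothetical and are not part of any Windows or Kinect API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeechInputSelector:
    """Hypothetical model of the speech recognition input setting."""

    # None means "follow the default console capture device".
    pinned_device: Optional[str] = None

    def resolve(self, default_console_device: str) -> str:
        """Return the device the recogniser should listen to."""
        if self.pinned_device is not None:
            return self.pinned_device
        return default_console_device


following = SpeechInputSelector()
pinned = SpeechInputSelector(pinned_device="Headset Microphone")

# The default console device changes, e.g. when the Kinect is plugged in.
for default in ("Headset Microphone", "Kinect Array Microphone"):
    print(default, "->", following.resolve(default), "|", pinned.resolve(default))
```

    In this model the "following" selector switches to whatever becomes the default, while the pinned selector keeps its device regardless of the default.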

    David

    Saturday, July 2, 2011 9:52 PM
  • David,

     

    Thank you very much for your excellent link. I will review that material when I have time.

     

    In my experience the speech recognition system does not reset itself when the default audio device changes. If the speech recognition system did manage to detect that the default audio device had changed and adjusted itself accordingly I do not think I would have had such a poor user experience thus far, and I would not have been motivated to raise the issues here. From some of the responses to this thread it also appears that other users are finding these features difficult to find and use.

     

    To answer my own question about the default audio levels: it does seem that there is a mechanism to control these settings programmatically. Skype certainly seems to do so, automatically adjusting the input volume based on the signal levels it detects from the speaker. However, I still feel that the Kinect audio drivers are controlling these levels themselves. If so, there could be a conflict between my program setting the audio levels to what it considers appropriate and the Kinect audio drivers setting them to what the Kinect considers appropriate. This would be likely to impact performance to a small degree, but I have not yet written the code or run any tests to verify this.
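
    The concern can be illustrated with a toy Python model, assuming two independent controllers that each nudge one shared input gain toward their own target level. The function names, target levels, and step size are made up for illustration. This is a sketch of the idea only, not a description of how Skype or the Kinect drivers actually behave.

```python
def next_gain(gain, measured_level, target_level, step=0.1):
    """Move the gain toward the value that would hit target_level.

    The change is limited to `step` per block and the result stays in [0, 1].
    """
    if measured_level <= 0.0:
        return gain  # silence: nothing to measure, so leave the gain alone
    desired = gain * target_level / measured_level
    delta = max(-step, min(step, desired - gain))
    return max(0.0, min(1.0, gain + delta))


def simulate(raw_level, targets, blocks, gain=1.0):
    """Let each controller in `targets` adjust the shared gain in turn."""
    history = []
    for i in range(blocks):
        measured = raw_level * gain  # level seen after the gain is applied
        gain = next_gain(gain, measured, targets[i % len(targets)])
        history.append(gain)
    return history


# One controller on its own, then two controllers with different targets.
single = simulate(raw_level=0.5, targets=[0.1], blocks=20)
shared = simulate(raw_level=0.5, targets=[0.1, 0.2], blocks=20)

print("one controller, last gains: ", [round(g, 2) for g in single[-4:]])
print("two controllers, last gains:", [round(g, 2) for g in shared[-4:]])
```

    In this model a single controller settles on one gain, while two controllers with different targets keep pushing the gain back and forth, which is the kind of conflict described above.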

     

    I personally feel that speech recognition is one of the most powerful ways of interacting with machines, and I believe that for the power of this interaction mechanism to be fully realised, the usability of the system must be second to none. Based on my current experience, I have to conclude that the usability, despite the enhancements, is not yet up to par. Windows speech recognition is truly innovative and, combined with excellent hardware such as the Kinect, offers the Windows platform a unique opportunity to differentiate itself from the competition. The competition do not yet have the basics of effective speech recognition sorted out either, but it is only a matter of time. I truly hope that Microsoft can rise to the challenge and become the leader in this field.

    Cheers

     

    Austin

    Monday, July 4, 2011 10:06 PM