Native code for video/skeleton data and managed code for speech recognition in the same application

  • Question

  • Hi,

    I wrote a Kinect application in native C++. I don't want to use SAPI 5.4 because of its accuracy problems, so the only choice is to use the Microsoft Speech Platform SDK v11.0 for speech recognition. I still want to use native C++ for the main application (the UI, video, skeleton data, ...) and rewrite the C# "Speech sample" as a COM component. The Kinect for Windows SDK only supports one application using a device at a time, so having the main application call the COM component for speech recognition is the only way.

    Then a question came up: C# uses the KinectSensor class and C++ uses the INuiSensor interface to represent a Kinect device. How can I pass some kind of Kinect handle from native C++ to C#, so that the INuiSensor can be turned into a KinectSensor?


    I have read the similar posts below but didn't find my answer. For reference:

    ---------------------------------

    Speech Recognition with Kinect in C++

    http://social.msdn.microsoft.com/Forums/en-US/kinectsdkaudioapi/thread/7456b96d-f4d7-4d59-bace-2b8e492c6aae


    C++ Speech Recognition, need some confirmation of what a SAPI researcher told me

    http://social.msdn.microsoft.com/Forums/en-US/kinectsdkaudioapi/thread/4f9f3a50-5347-483c-8b1b-11875f85dc86

    Wednesday, February 29, 2012 8:10 AM

Answers

  • Since you can only create one instance of the KinectSensor, the recommendation would be to use native C++ to capture your audio in addition to everything else. The captured audio buffers can then be shared with C# through interop.

    Are you looking to do real-time processing, or can you just save a wave file and then process it from a C# worker process?
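    One way the "share captured audio buffers through interop" idea could look on the native side is a small thread-safe ring buffer that the capture callback pushes into and that the C# side drains via a thin exported C function. This is only a sketch; the class and method names (AudioRingBuffer, Push, Pop) are illustrative assumptions, not part of the Kinect SDK or .NET.

    ```cpp
    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <mutex>
    #include <vector>

    // Hypothetical handoff buffer between the native audio capture thread
    // and a managed consumer. A mutex keeps it simple; a lock-free queue
    // would reduce latency for real-time use.
    class AudioRingBuffer {
    public:
        explicit AudioRingBuffer(std::size_t capacityBytes)
            : buffer_(capacityBytes), head_(0), size_(0) {}

        // Called by the native capture callback; rejects the chunk if the
        // buffer is full rather than blocking the capture thread.
        bool Push(const std::uint8_t* data, std::size_t len) {
            std::lock_guard<std::mutex> lock(mutex_);
            if (len > buffer_.size() - size_) return false;  // not enough room
            for (std::size_t i = 0; i < len; ++i)
                buffer_[(head_ + size_ + i) % buffer_.size()] = data[i];
            size_ += len;
            return true;
        }

        // Called from the managed side (e.g. via a P/Invoke'd C export);
        // copies up to maxLen bytes into the caller's buffer.
        std::size_t Pop(std::uint8_t* out, std::size_t maxLen) {
            std::lock_guard<std::mutex> lock(mutex_);
            std::size_t n = std::min(maxLen, size_);
            for (std::size_t i = 0; i < n; ++i)
                out[i] = buffer_[(head_ + i) % buffer_.size()];
            head_ = (head_ + n) % buffer_.size();
            size_ -= n;
            return n;
        }

    private:
        std::vector<std::uint8_t> buffer_;
        std::size_t head_;
        std::size_t size_;
        std::mutex mutex_;
    };
    ```

    Because both sides only ever see plain byte copies, the managed recognizer never holds a pointer into native memory and vice versa.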

    • Marked as answer by Kris Yu Tuesday, March 20, 2012 4:59 AM
    Tuesday, March 6, 2012 1:06 AM
  • Hi,

    I had a similar problem and would suggest you the following:

    Switch all Kinect-related parts of your code to C++/CLI and access the streams and perform speech recognition through the .NET Framework. In Visual Studio you can do that by activating the /clr compile switch for each affected .cpp file. As a starting point you can take the C# code samples that ship with the Kinect SDK and translate the C# syntax to C++/CLI; this is pretty straightforward, and there are plenty of resources on the web covering C++/CLI. Since the "slow" managed code only needs to run at 30 FPS, and the algorithms that do the heavy lifting (image processing and so on) can still be implemented on top of that in native C/C++, there should be only a small performance penalty. This is how I integrated the Kinect skeleton + speech recognition into Virtools.

    Hope this helps..

    Tuesday, March 6, 2012 9:33 AM

All replies

  • Hi Carmine and Deggy,

    Sorry for the late reply. I need to do real-time processing, and I solved this problem by using SetInputToDefaultAudioDevice() in the C# COM component, so I don't need to spend too much time figuring out how to pass a parameter or audio buffer to the C# COM component. I think I will try Deggy's suggestion of writing native and managed code in the same project by just changing the compiler option when I need to do similar stuff.

    Thanks to both of you.

    By the way, will SetInputToDefaultAudioDevice() decrease accuracy?

    For reference.

    ---------------------------------

    How to call a managed DLL from native Visual C++ code in Visual Studio.NET or in Visual Studio 2005

    http://support.microsoft.com/kb/828736/en-us

    Monday, March 12, 2012 3:28 PM
  • I would just say: be careful about how you share data between managed and unmanaged code. You don't want the GC to run in the middle of something the unmanaged side is processing. If you are using managed C++/CLI, you are running under the .NET runtime.

    To get anywhere close to real-time operation, you have to use unmanaged C++. Some of the reasons why C# is not supported on the core media APIs are discussed here:

    http://blogs.msdn.com/b/mediasdkstuff/archive/2009/04/01/calling-the-format-sdk-directshow-media-foundation-or-the-wasapi-from-managed-code-c-vb-net.aspx
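    One simple way to stay out of the GC's way, as a hedged sketch: instead of letting native code hold a pointer into a managed (GC-movable) array, have the managed side call a thin export that copies each chunk into memory the native side owns, so the managed buffer only needs to be pinned for the duration of the call. The class below and its names are illustrative assumptions, not any real Kinect or .NET API.

    ```cpp
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // "Copy, don't share" pattern: the native side keeps its own copy of
    // every chunk, so the GC is free to move or collect the managed buffer
    // the moment the call returns.
    class NativeAudioStore {
    public:
        // Copies the chunk into native-owned storage.
        void Append(const std::uint8_t* data, std::size_t len) {
            samples_.insert(samples_.end(), data, data + len);
        }

        std::size_t TotalBytes() const { return samples_.size(); }
        const std::uint8_t* Data() const { return samples_.data(); }

    private:
        std::vector<std::uint8_t> samples_;  // native-owned; reallocates as it grows
    };
    ```

    On the C++/CLI side you would typically pin the managed array (e.g. with pin_ptr) only for the duration of the Append call, which keeps the pinned window short and avoids fragmenting the GC heap.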

    Tuesday, March 13, 2012 11:58 PM