What is the best solution to do Speech To Text in Kinect 1.5

  • Question

  • Hi,

    I'm currently using the speech recognition capabilities bundled in the Kinect SDK to perform some actions in my app. It's working great (almost; the only problems are with very short words).

    However, I need a small amount of "speech to text", i.e. I want to be able to recognize "go to X", where X is a number or a name, or "search X", where X is a name. For example: "go to Paris", "search pizza", etc.

    It's like what is available on Android devices today for making searches (I don't expect it to work as well as it does on Android).

    Is there a way to do this with the Kinect? What are the available options? I heard about using SAPI (with very bad performance?).

    Tuesday, July 17, 2012 2:24 PM

All replies

  • I am using the following approach in my code; it performs very well and works fine:

    Initialize the grammar when you initialize Kinect speech recognition,

    then load the search choices or phrases and append them to the grammar.

    after you finish search


    Thanks,
    MOHAMED A. SAKR | Technical Lead | Unified Communications | EgyptNetwork
    Please remember to click “Mark as Answer” on the post that helps you. This can be beneficial to other community members. Also try to Vote as Helpful

    Wednesday, July 18, 2012 10:27 AM
  • Thank you for your answer

    The problem is that I don't know in advance what the choices will be! Take the case where I would like to search for a city: I'm not going to enumerate every city in the country and add them all to the grammar.

    I would like to get the text of what has been understood by the Kinect (and the Speech API or whatever) and then do a text search.

    Wednesday, July 18, 2012 1:36 PM
  • If the cities being searched for are well-known English keywords, they will be recognized by the Speech API.

    Thanks,
    MOHAMED A. SAKR | Technical Lead | Unified Communications | EgyptNetwork

    Thursday, July 19, 2012 7:36 AM
  • You need to use grammars and semantics. "go to" is one semantic result, "paris" and other cities are another. Together they make a phrase/sentence. 

    However, if you are trying to get dictation, that is not possible. You need to pre-define all possible speech values.

    Tuesday, August 14, 2012 1:55 AM
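
The grammar-plus-semantics approach described in the reply above can be sketched roughly as follows. This is a minimal sketch, not code from the thread: it assumes the Microsoft.Speech assembly that ships alongside the Kinect SDK, and `CommandGrammar`, `Build`, and the "GOTO"/"SEARCH" tags are hypothetical names chosen for illustration. Creating the recognizer and wiring it to the Kinect audio stream are left out.

```csharp
using System.Globalization;
using Microsoft.Speech.Recognition;

// Hypothetical helper (not from the thread): builds "<command> <target>" phrases.
static class CommandGrammar
{
    // culture should match the recognizer's culture; targets must not be empty.
    public static Grammar Build(CultureInfo culture, string[] targets)
    {
        // First semantic slot: the command verb.
        var commands = new Choices();
        commands.Add(new SemanticResultValue("go to", "GOTO"));
        commands.Add(new SemanticResultValue("search", "SEARCH"));

        // Second semantic slot: every possible target has to be listed up front.
        var targetChoices = new Choices();
        foreach (string target in targets)
        {
            targetChoices.Add(new SemanticResultValue(target, target));
        }

        // Command followed by target, each exposed under its own semantic key.
        var builder = new GrammarBuilder { Culture = culture };
        builder.Append(new SemanticResultKey("command", commands));
        builder.Append(new SemanticResultKey("target", targetChoices));
        return new Grammar(builder);
    }
}
```

After loading the returned grammar with `LoadGrammar`, a `SpeechRecognized` handler could read `e.Result.Semantics["command"].Value` and `e.Result.Semantics["target"].Value`. The limitation raised in this thread still applies: only targets passed to `Build` can be recognized.
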
  • Yes, my grammar is already OK, but there is no way I can pre-define all possible values. That's a huge shortcoming of Kinect. A phone with a poor single mic and a slow processor can do better; the Microsoft Speech engine is way behind!
    Friday, August 17, 2012 10:00 AM
  • You have a lot to learn about Speech recognition engines :) If you need dictation, use System.Speech instead of Microsoft.Speech. 
    Friday, August 17, 2012 5:50 PM
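
For readers who want to try the System.Speech route suggested above, here is a minimal sketch of a fixed command followed by free dictation. It is an illustration under assumptions, not a tested Kinect setup: it assumes a desktop Windows recognizer with dictation support is installed, it listens on the default microphone rather than the Kinect array, and `DictationDemo` and the "GOTO"/"SEARCH" tags are hypothetical names.

```csharp
using System;
using System.Speech.Recognition;

// Hypothetical demo (not from the thread): "go to ..." / "search ..." plus dictation.
class DictationDemo
{
    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine())
        {
            // Fixed command verbs, tagged with a semantic value.
            var commands = new Choices();
            commands.Add(new SemanticResultValue("go to", "GOTO"));
            commands.Add(new SemanticResultValue("search", "SEARCH"));

            // Command first, then free-form dictation for the unknown part.
            var builder = new GrammarBuilder();
            builder.Append(new SemanticResultKey("command", commands));
            builder.AppendDictation();
            engine.LoadGrammar(new Grammar(builder));

            engine.SpeechRecognized += (sender, e) =>
            {
                // Print the command tag when present, then the full recognized text.
                if (e.Result.Semantics.ContainsKey("command"))
                {
                    Console.WriteLine("Command: " + e.Result.Semantics["command"].Value);
                }
                Console.WriteLine("Heard: " + e.Result.Text);
            };

            // Assumption: default microphone; a Kinect stream could be supplied
            // through SetInputToAudioStream instead.
            engine.SetInputToDefaultAudioDevice();
            engine.RecognizeAsync(RecognizeMode.Multiple);

            Console.WriteLine("Listening. Press Enter to quit.");
            Console.ReadLine();
        }
    }
}
```

The dictated part is not constrained by a word list, so accuracy will generally be lower than with a closed grammar, and results are worth checking against `e.Result.Confidence` before acting on them.
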