implementing barge-in in a speech reco application

Question
-
We have a speech recognition application (using the Microsoft.Speech SDK included with the UCMA SDK install). Our first shot at implementing barge-in was to start the recognizer and the synthesizer at the same time, and to stop the synthesizer when we get a speech detected event from the recognizer. That all works fine. The issue is that the initial silence timeout is set to 3 seconds, so three seconds after the prompt starts playing we get an end recognition event (the initialSilenceTimeout fired). The prompt is still playing (say it is 10 seconds long), so we cancel it. The caller never had a chance to say anything in this scenario.
When we wrote this application on OCS Speech Server, the API was 'smart enough' to somehow tie the synthesizer and recognizer together, so that even if you started them at the same time, the recognizer's initialSilenceTimeout timer would not start running until the prompt stopped playing. That's exactly the functionality we are trying to duplicate here. We need to start both at the same time, but we need the recognizer's initial silence timer to start only after the prompt has finished playing.
The first solution that comes to mind is to initially set the recognizer's initialSilenceTimeout to a very long time and then, if we get the synthesizer completed event, set the timeout to the normal 3-second value we want. This does not appear to work. Once the recognizer is running, I am pretty sure you cannot set the initialSilenceTimeout property.
The next possible solution is to start the recognizer and synthesizer like we do now, but with the initialSilenceTimeout set to a high value. Then, if the prompt ends and we haven't had any speech yet, we stop the recognizer and restart it with the 3-second timeout we want. This solution worries me because there might be a window while the recognizer is restarting where we miss some speech, and that might affect recognition.
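The control flow of this second approach can be sketched as a small event-driven controller. This is only a minimal illustration in Python, not Microsoft.Speech code: `FakeRecognizer`, `FakeSynthesizer`, `BargeInController`, the handler names, and the one-hour "long" timeout are all hypothetical stand-ins, and the handlers are assumed to be wired to the real speech detected and prompt completed events.

```python
from dataclasses import dataclass, field

LONG_TIMEOUT_S = 3600.0  # assumed "very long" value used while the prompt plays
NORMAL_TIMEOUT_S = 3.0   # the 3-second initial silence timeout from the post


@dataclass
class FakeRecognizer:
    """Hypothetical stand-in for a recognizer; records calls instead of listening."""
    calls: list = field(default_factory=list)

    def start(self, initial_silence_timeout_s: float) -> None:
        self.calls.append(("start", initial_silence_timeout_s))

    def stop(self) -> None:
        self.calls.append(("stop", None))


@dataclass
class FakeSynthesizer:
    """Hypothetical stand-in for a synthesizer playing a single prompt."""
    playing: bool = False
    cancelled: bool = False

    def speak(self) -> None:
        self.playing = True

    def cancel(self) -> None:
        self.playing = False
        self.cancelled = True


class BargeInController:
    """Starts prompt and recognizer together; restarts the recognizer after the prompt."""

    def __init__(self, recognizer: FakeRecognizer, synthesizer: FakeSynthesizer) -> None:
        self.recognizer = recognizer
        self.synthesizer = synthesizer
        self.speech_detected = False

    def begin(self) -> None:
        # Long timeout so the recognizer does not give up while the prompt plays.
        self.recognizer.start(LONG_TIMEOUT_S)
        self.synthesizer.speak()

    def on_speech_detected(self) -> None:
        # Barge-in: the caller started talking, so cut the prompt short.
        self.speech_detected = True
        if self.synthesizer.playing:
            self.synthesizer.cancel()

    def on_prompt_completed(self) -> None:
        self.synthesizer.playing = False
        if not self.speech_detected:
            # Prompt ended in silence: restart with the normal 3-second timeout.
            self.recognizer.stop()
            self.recognizer.start(NORMAL_TIMEOUT_S)
```

The sketch does nothing about the window between `stop()` and the second `start()`, which is exactly the gap this approach worries about; it only shows when the restart would happen.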
So the first solution probably just doesn't work, and the second is pretty ugly. I really think there must be a better way to do this, but searching through the class library reference, I just can't see how. Does anyone have any ideas or experience trying to do this?
When I search for barge-in in the Microsoft Speech SDK help doc, it talks about something that might apply, but it uses the Microsoft Speech Platform API, which is a C++ API that we aren't using. It seems like
Sorry if this is not so much a UCMA question as a Microsoft.Speech question. I couldn't find a forum for Microsoft.Speech (just for OCS Speech Server). If you know of another forum that might be more applicable, please let me know.
Thank you
mike castillo
Friday, March 11, 2011 6:10 PM
Answers
-
Have you taken a look at the Barge-In Sample application that comes with the UCMA Workflow Sample Applications?
This posting is provided "AS IS" with no warranties, and confers no rights.
- Proposed as answer by Kai Strandskov [Msft] Tuesday, March 22, 2011 10:23 PM
- Marked as answer by Kai Strandskov [Msft] Wednesday, March 23, 2011 8:35 PM
Monday, March 21, 2011 10:07 AM -
I will mark this as the answer for now. If you have any additional questions, don't hesitate to ask.
This posting is provided "AS IS" with no warranties, and confers no rights.
- Marked as answer by mjcasti Saturday, March 26, 2011 11:56 PM
Wednesday, March 23, 2011 8:34 PM
All replies
-
You can send your Microsoft.Speech API questions to e-mail listen@microsoft.com
-Ramesh
Ramesh Anantharaju
- Marked as answer by Kai Strandskov [Msft] Wednesday, March 23, 2011 8:34 PM
- Unmarked as answer by Kai Strandskov [Msft] Wednesday, March 23, 2011 8:35 PM
Friday, March 18, 2011 3:08 PM -
Yes, sorry, I was out for a while.
The barge-in sample, as you indicated, is workflow based, and I am not using workflow. In that sample the workflow runtime takes care of barge-in for you. I am trying to implement it myself for a Microsoft.Speech API based application. And the question is not really how to do barge-in; I got that. It's really about getting the recognizer's initial silence timer to work. Let's leave this marked as answered for this forum, as you have. I will use the speech email.
Thanks very much for the email address for speech questions!
mike castillo
Sunday, March 27, 2011 12:02 AM