Passing in a user created stream for the Speech Recognition Engine's SetInputToAudioStream() - how to prevent end of stream being reached?

  • Question

  • EDIT: Made the question more clear

    Hello,

    I am using the stream returned by kinectAudioSource.Start() in a thread to send audio data over the network, and I am also trying to run speech recognition on that same stream. In my tests, when I give the SpeechRecognitionEngine the stream that I am also reading from for the network, my outgoing audio has gaps and the audio that gets sent is poor. With the speech recognition part of my program disabled, the audio packets I read from kinectAudioSource.Start() were perfect, with no gaps. That makes me believe the audio reading thread has to be left completely uninterrupted, and that the speech recognition engine was interfering with it, as though the engine were also reading from the same stream.

    As a result, I am trying to get around the issue by creating my own stream object (a MemoryStream, to be precise) and copying data into it in my audio reading thread as follows:

    audioStream = runtime.AudioSource.Start();
    audioStreamMainCopy = new MemoryStream();
    
    public void AudioReadThread()
    {
    ...
    audioSamplesRead = audioStream.Read(audioPacket, audioBufferOffset, AudioSampleSize);
    // Copy only the bytes that Read() actually returned, which may be fewer than AudioSampleSize.
    audioStreamMainCopy.Write(audioPacket, audioBufferOffset, audioSamplesRead);
    ...
    }
    
    public Stream GetAudioStreamCopy()
    {
        return audioStreamMainCopy;
    }

    Then, I pass this memory stream to my Speech recognizer in a different part of my program:

    audioStream = audioStreamer.GetAudioStreamCopy(); //we are grabbing the audio stream from a valid audioStreamer (Kinect)
    audioStream.Position = 0;
    sre.SetInputToAudioStream(audioStream, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
                
    sre.RecognizeAsync(RecognizeMode.Multiple);
    
    public void DebugOutput()
    {
                
    Console.WriteLine(audioStream.Length + " " + audioStream.Position);
                
    Console.Write(sre.AudioPosition + " " + sre.AudioState + " " + sre.RecognizerAudioPosition + "\n");
    }

    From the DebugOutput() method above, my results display something like this:

    129600 129600
    00:00:00 Stopped 00:00:00
    
    131200 131200
    00:00:00 Stopped 00:00:00
    
    283200 283200
    00:00:00 Stopped 00:00:00

    As you can see, the stream copy that I pass to the SRE is certainly being written to: its length and position increase as expected.

    However, since the position is at the end of the stream, the SRE interprets this as the stream having ended, and it raises the RecognizeCompleted event with the InputStreamEnded property set to true.
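    The end-of-stream behavior described above can be reproduced without the Kinect or the recognizer. This is a minimal sketch that assumes only System.IO; the class name MemoryStreamPositionDemo, the method WriteThenRead and the 1600-byte packet size are made up for illustration. It shows that a Read() issued right after a Write() on the same MemoryStream returns 0 bytes, because both operations share one Position.

```csharp
using System;
using System.IO;

static class MemoryStreamPositionDemo
{
    // Writes one packet, optionally rewinds, then reads.
    // Returns the number of bytes the read produced.
    public static int WriteThenRead(bool rewind)
    {
        var stream = new MemoryStream();
        byte[] packet = new byte[1600];        // hypothetical packet size
        stream.Write(packet, 0, packet.Length); // leaves Position == Length

        if (rewind)
        {
            stream.Position = 0;
        }

        byte[] readBuffer = new byte[1600];
        return stream.Read(readBuffer, 0, readBuffer.Length);
    }

    static void Main()
    {
        Console.WriteLine(WriteThenRead(false)); // prints 0: the reader starts at the end
        Console.WriteLine(WriteThenRead(true));  // prints 1600: the reader starts at the beginning
    }
}
```

    A return value of 0 from Read() is the standard end-of-stream signal for a .NET Stream, which is consistent with the recognizer reporting InputStreamEnded.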

    I believe this happens because each call to MemoryStream.Write() in my copy code moves the position to the end of the stream. To remedy this, I changed the code in my read thread:

    public void AudioReadThread()
    {
    ...
    audioSamplesRead = audioStream.Read(audioPacket, audioBufferOffset, AudioSampleSize);
    // Append at the end of the copy, then rewind so a reader starts from the beginning.
    audioStreamMainCopy.Position = audioStreamMainCopy.Length;
    audioStreamMainCopy.Write(audioPacket, audioBufferOffset, audioSamplesRead);
    audioStreamMainCopy.Position = 0;
    ...
    }

    With this change, the SRE can recognize a word if I say it quickly and early, just after the audio reading thread starts running. Inevitably, though, audioStreamMainCopy's position ends up at the end of the stream at some point (even though I reset it to 0 in the read thread as shown above), and the SRE once again stops recognition, reporting that the end of the stream was reached. I am guessing this happens because the SpeechRecognitionEngine calls Read() on audioStreamMainCopy and thereby advances the stream's position in a way that I can neither see nor control. Essentially, I want the recognizer never to return InputStreamEnded.
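    The failure of the rewind workaround can be replayed on a single thread. This is a minimal sketch under assumptions: the class SharedPositionDemo, the buffer sizes, and the idea that the recognizer asks for more bytes than one packet holds are all hypothetical, and the real recognizer's read pattern is not known. It only shows one interleaving in which a reader sharing the MemoryStream's Position runs out of data before the next packet is written.

```csharp
using System;
using System.IO;

static class SharedPositionDemo
{
    // Replays one possible interleaving of the writer and the reader.
    // Returns the byte counts of the reader's two consecutive Read() calls.
    public static int[] Run()
    {
        var shared = new MemoryStream();
        byte[] packet = new byte[1600];      // hypothetical packet size
        byte[] readBuffer = new byte[4096];  // hypothetical reader request size

        // Writer: append one packet, then rewind (the workaround from the question).
        shared.Position = shared.Length;
        shared.Write(packet, 0, packet.Length);
        shared.Position = 0;

        // Reader (standing in for the recognizer) runs before the next packet arrives.
        int first = shared.Read(readBuffer, 0, readBuffer.Length);  // consumes everything written so far
        int second = shared.Read(readBuffer, 0, readBuffer.Length); // nothing left to read

        return new[] { first, second };
    }

    static void Main()
    {
        int[] counts = Run();
        Console.WriteLine(counts[0] + " " + counts[1]); // prints "1600 0"
    }
}
```

    Rewinding to 0 on every write has a second cost in this setup: a reader that shares the position would be sent back to audio it has already consumed.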

    Does anyone have any recommendations to deal with this?






    • Edited by Ohtrahddis Wednesday, June 27, 2012 6:21 PM
    Tuesday, June 26, 2012 10:01 PM

Answers

  • Hello, I came up against the same issue. I got around it by creating a new type of stream and writing into that. The problem with normal streams is that if they report a fixed length or return fewer bytes than requested, the speech recogniser assumes that the stream has finished. The class below sets up a circular buffer whose Read() waits for more data instead of returning early, so the stream never finishes and speech can be processed in real time.

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Threading;

    class SpeechStreamer : Stream
    {
        private AutoResetEvent _writeEvent;
        private List<byte> _buffer;
        private int _buffersize;
        private int _readposition;
        private int _writeposition;
        private bool _reset;
    
        public SpeechStreamer(int bufferSize)
        {
            _writeEvent = new AutoResetEvent(false);
            _buffersize = bufferSize;
            _buffer = new List<byte>(_buffersize);
            for (int i = 0; i < _buffersize; i++)
                _buffer.Add(new byte());
            _readposition = 0;
            _writeposition = 0;
        }
    
        public override bool CanRead
        {
            get { return true; }
        }
    
        public override bool CanSeek
        {
            get { return false; }
        }
    
        public override bool CanWrite
        {
            get { return true; }
        }
    
        public override long Length
        {
            get { return -1L; }
        }
    
        public override long Position
        {
            get { return 0L; }
            set {  }
        }
    
        public override long Seek(long offset, SeekOrigin origin)
        {
            return 0L;
        }
    
        public override void SetLength(long value)
        {
    
        }
    
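        public override int Read(byte[] buffer, int offset, int count)
        {
            // Keep waiting until 'count' bytes have been copied, so the caller
            // does not get a short read while the stream is still open.
            int i = 0;
            while (i<count && _writeEvent!=null)
            {
                if (!_reset && _readposition >= _writeposition)
                {
                    // No unread data yet: wait for the writer, then re-check.
                    _writeEvent.WaitOne(100, true);
                    continue;
                }
                // 'offset' applies to the caller's buffer, not to the ring buffer.
                buffer[offset + i] = _buffer[_readposition];
                _readposition++;
                if (_readposition == _buffersize)
                {
                    _readposition = 0;
                    _reset = false;
                }
                i++;
            }
    
            // Bytes actually copied; less than 'count' only after Close().
            return i;
        }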
    
        public override void Write(byte[] buffer, int offset, int count)
        {
            for (int i = offset; i < offset+count; i++)
            {
                _buffer[_writeposition] = buffer[i];
                _writeposition++;
                if (_writeposition == _buffersize)
                {
                    _writeposition = 0;
                    _reset = true;
                }
            }
            _writeEvent.Set();
    
        }
    
        public override void Close()
        {
            _writeEvent.Close();
            _writeEvent = null;
            base.Close();
        }
    
        public override void Flush()
        {
    
        }
    }
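    For comparison, the same "Read() waits instead of ending" idea can be sketched with a lock and Monitor.Wait/PulseAll in place of the fixed-size ring buffer and AutoResetEvent. This is a different technique from the class above, not a rewrite of it: BlockingAudioStream and BlockingAudioStreamDemo are hypothetical names, the queue is unbounded (so it grows if the reader falls behind), and the sketch has not been tried against a SpeechRecognitionEngine.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

// Hypothetical blocking stream: Read() waits for data until the stream is closed.
class BlockingAudioStream : Stream
{
    private readonly Queue<byte> _queue = new Queue<byte>();
    private readonly object _gate = new object();
    private bool _closed;

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    // The members below mirror the no-op overrides of SpeechStreamer.
    public override long Length { get { return -1L; } }
    public override long Position { get { return 0L; } set { } }
    public override long Seek(long offset, SeekOrigin origin) { return 0L; }
    public override void SetLength(long value) { }
    public override void Flush() { }

    public override void Write(byte[] buffer, int offset, int count)
    {
        lock (_gate)
        {
            for (int i = 0; i < count; i++) _queue.Enqueue(buffer[offset + i]);
            Monitor.PulseAll(_gate); // wake any reader waiting for data
        }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int copied = 0;
        lock (_gate)
        {
            while (copied < count)
            {
                // Monitor.Wait releases the lock while waiting, so Write() can proceed.
                while (_queue.Count == 0 && !_closed) Monitor.Wait(_gate);
                if (_queue.Count == 0) break; // closed and fully drained
                buffer[offset + copied] = _queue.Dequeue();
                copied++;
            }
        }
        return copied;
    }

    protected override void Dispose(bool disposing)
    {
        lock (_gate) { _closed = true; Monitor.PulseAll(_gate); }
        base.Dispose(disposing);
    }
}

static class BlockingAudioStreamDemo
{
    static void Main()
    {
        var stream = new BlockingAudioStream();
        var writer = new Thread(() =>
        {
            byte[] packet = new byte[160];
            for (int n = 0; n < 4; n++)
            {
                Thread.Sleep(50); // stand-in for waiting on the capture device
                stream.Write(packet, 0, packet.Length);
            }
        });
        writer.Start();

        byte[] readBuffer = new byte[640];
        // Blocks until all four packets have arrived.
        Console.WriteLine(stream.Read(readBuffer, 0, readBuffer.Length)); // prints 640
        writer.Join();
        stream.Close();
        Console.WriteLine(stream.Read(readBuffer, 0, readBuffer.Length)); // prints 0
    }
}
```

    In the scenario from the question, the audio reading thread would call Write() with each packet and the same instance would be handed to sre.SetInputToAudioStream(); that wiring is left out here because it needs the Kinect and System.Speech.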



    • Proposed as answer by sporn Wednesday, August 15, 2012 5:09 AM
    • Marked as answer by Chris Wojahn, Security IR Monday, October 29, 2012 5:45 PM
    • Edited by sporn Tuesday, October 30, 2012 12:30 AM
    Wednesday, August 15, 2012 5:08 AM