SpeechSynthesizer/MediaPlayer memory leak

  • Question

  • We have a UWP app that uses Microsoft voices to speak text and highlight it as it is spoken. I noticed that the app's memory usage increases with each piece of text that is spoken, and the app eventually runs out of memory. It does not matter which voice is used or what text is spoken.

    To highlight the text, I subscribe to events on the TimedMetadataTracks of the MediaPlaybackItem. When the text has finished speaking, I unsubscribe from each event and dispose of the MediaPlaybackItem.Source. The Visual Studio memory profiler does not show any leaks in managed memory, so I suspect something is not getting cleaned up in unmanaged memory.

    Edit: I commented on this in the code, but I'll call it out here: if I do not subscribe to the TimedMetadataTrack events, the leak goes away. I am also able to reproduce this using the Windows Sample App (Synthesize Text with Boundaries).


    Am I missing something that needs to be disposed, or is this a bug in SpeechSynthesizer/MediaPlayer?

    I don't see a way to attach a .zip of my sample app so I'll paste the code here:

    using System;
    using System.Diagnostics;
    using Windows.Media.Core;
    using Windows.Media.Playback;
    using Windows.Media.SpeechSynthesis;
    
    namespace WindowsTts
    {
        public class UwpNativeVoice : IDisposable
        {
            private readonly object _activeSpeechLock;
            private SpeechSynthesizer _synthesizer;
            private MediaPlayer _mediaPlayer;
            private SpeechCallback _activeSpeech;
    
            public UwpNativeVoice(VoiceInformation platformInfo)
            {
                _activeSpeechLock = new object();
    
                _synthesizer = new SpeechSynthesizer();
                _synthesizer.Options.IncludeWordBoundaryMetadata = true;
                _synthesizer.Voice = platformInfo;
    
                _mediaPlayer = new MediaPlayer
                {
                    RealTimePlayback = true,
                    AutoPlay = false,
                    Volume = 1.0f
                };
                _mediaPlayer.MediaOpened += OnMediaPlayerMediaOpened;
                _mediaPlayer.MediaEnded += OnMediaPlayerMediaEnded;
            }
    
            public void Dispose()
            {
                _mediaPlayer.MediaOpened -= OnMediaPlayerMediaOpened;
                _mediaPlayer.MediaEnded -= OnMediaPlayerMediaEnded;
                // Unsubscribe the cue handlers and dispose the source in one place.
                DestroyMediaPlaybackItem(_mediaPlayer.Source as MediaPlaybackItem);
                _mediaPlayer.Source = null;
                _mediaPlayer.Dispose();
                _mediaPlayer = null;
    
                _synthesizer?.Dispose();
                _synthesizer = null;
            }
    
            public async void Speak(string text, SpeechDelegate speechDelegate)
            {
                if ( string.IsNullOrEmpty(text) )
                {
                    // no-op; just fire events and bail
                    speechDelegate?.Invoke(text, ReadTextEvent.Start);
                    speechDelegate?.Invoke(text, ReadTextEvent.End);
                    return;
                }
    
                if (_activeSpeech != null)
                {
                    // something currently speaking; halt it, fire events and then start anew
                    Halt();
                }
    
                // get synth stream, and add markers for bookmarks & word boundaries
                var synthStream = await _synthesizer.SynthesizeTextToStreamAsync(text);
    
                lock (_activeSpeechLock)
                {
                    _activeSpeech = new SpeechCallback(text, speechDelegate);
    
                    try
                    {
                        var source = MediaSource.CreateFromStream(synthStream, synthStream.ContentType);
                        var playbackItem = new MediaPlaybackItem(source);
                        ConfigPlaybackEvents(playbackItem); //Comment this out and the leak goes away
                        _mediaPlayer.Source = playbackItem;
                        _mediaPlayer.Play();
                    }
                    catch (Exception e)
                    {
                        Debug.WriteLine(e);
                        _activeSpeech?.Invoke(ReadTextEvent.End);
                        _activeSpeech = null;
                    }
                }
            }
    
            public bool Halt()
            {
                lock (_activeSpeechLock)
                {
                    if (_activeSpeech == null)
                        return true;
                }
    
                _mediaPlayer.Pause();
                DestroyMediaPlaybackItem(_mediaPlayer.Source as MediaPlaybackItem);
                _mediaPlayer.Source = null;
    
                SpeechCallback callback;
                lock (_activeSpeechLock)
                {
                    callback = _activeSpeech;
                    _activeSpeech = null;
                }
                callback?.Invoke(ReadTextEvent.End);
    
                return true;
            }
    
            private void OnMediaPlayerMediaOpened(MediaPlayer sender, object args)
            {
                FireReadTextEvent(ReadTextEvent.Start);
            }
    
            private void OnTimedMetadataTrackEntered(TimedMetadataTrack track, MediaCueEventArgs args)
            {
                if ( track.TimedMetadataKind == TimedMetadataKind.Speech && args.Cue is SpeechCue speechCue )
                {
                    var startIdx = speechCue.StartPositionInInput ?? 0;
                    var endIdx = speechCue.EndPositionInInput ?? -1;
                    FireReadTextEvent(ReadTextEvent.WordEvent(startIdx, (endIdx - startIdx) + 1));
                }
            }
    
            private void OnMediaPlayerMediaEnded(MediaPlayer sender, object args)
            {
                SpeechCallback callback;
                lock ( _activeSpeechLock )
                {
                    callback = _activeSpeech;
                    _activeSpeech = null;
                }
                callback?.Invoke(ReadTextEvent.End);
    
                DestroyMediaPlaybackItem(sender.Source as MediaPlaybackItem);
                sender.Source = null;
            }
    
            private void FireReadTextEvent(ReadTextEvent evt)
            {
                SpeechCallback callback;
                lock ( _activeSpeechLock )
                    callback = _activeSpeech;
                callback?.Invoke(evt);
            }
    
            private void ConfigPlaybackEvents(MediaPlaybackItem playbackItem)
            {
                // see: https://docs.microsoft.com/en-us/uwp/api/windows.media.core.timedmetadatatrack
    
                // iterate through existing tracks, registering callbacks for them
                for ( int i = 0; i < playbackItem.TimedMetadataTracks.Count; i++ )
                    RegisterAction(playbackItem, i);
            }
    
            private void RegisterAction(MediaPlaybackItem item, int idx)
            {
                const string speechWordIdentifier = "SpeechWord";
    
                TimedMetadataTrack track = item.TimedMetadataTracks[idx];
                if (track.Id.Equals(speechWordIdentifier, StringComparison.Ordinal) || track.Label.Equals(speechWordIdentifier, StringComparison.Ordinal))
                {
                    track.CueEntered += OnTimedMetadataTrackEntered;
                    item.TimedMetadataTracks.SetPresentationMode((uint)idx, TimedMetadataTrackPresentationMode.ApplicationPresented);
                }
            }
    
            private void DestroyMediaPlaybackItem(MediaPlaybackItem item)
            {
                if ( item == null )
                    return;
    
                foreach ( var track in item.TimedMetadataTracks )
                {
                    track.CueEntered -= OnTimedMetadataTrackEntered;
                }
    
                item.Source?.Dispose();
            }
        }
    }
    namespace WindowsTts
    {
        /// <summary>Defines a trigger that caused the broadcasting of a ReadTextEvent.</summary>
        public enum ReadTextTrigger
        {
            Start,
            Bookmark,
            Word,
            End,
        }
    
        /// <summary>A ReadTextEvent encompasses the relevant information from the tts world and is passed to the api user as part of a ReadTextInfo's EventAction data. </summary>
        public class ReadTextEvent
        {
            public static ReadTextEvent Start { get; } = new ReadTextEvent()
            {
                Trigger = ReadTextTrigger.Start,
                BookmarkName = null,
                TextOffset = -1,
                TextLength = -1,
            };
    
            public static ReadTextEvent End { get; } = new ReadTextEvent()
            {
                Trigger = ReadTextTrigger.End,
                BookmarkName = null,
                TextOffset = -1,
                TextLength = -1,
            };
    
            public ReadTextTrigger Trigger { get; set; }
            public string BookmarkName { get; set; }
            public int TextOffset { get; set; }
            public int TextLength { get; set; }
    
            /// <summary>Utility methods to pre-initialize some fields of this object.</summary>
            public static ReadTextEvent Factory(ReadTextEvent src)
            {
                return new ReadTextEvent()
                {
                    Trigger = src.Trigger,
                    BookmarkName = src.BookmarkName,
                    TextOffset = src.TextOffset,
                    TextLength = src.TextLength,
                };
            }
    
            public static ReadTextEvent BookmarkEvent(string bookmark)
            {
                return new ReadTextEvent()
                {
                    Trigger = ReadTextTrigger.Bookmark,
                    BookmarkName = bookmark,
                    TextOffset = -1,
                    TextLength = -1,
                };
            }
    
            public static ReadTextEvent WordEvent(int textOffset, int textLength)
            {
                return new ReadTextEvent()
                {
                    Trigger = ReadTextTrigger.Word,
                    BookmarkName = null,
                    TextOffset = textOffset,
                    TextLength = textLength,
                };
            }
    
            private ReadTextEvent()
            {
            }
        }
    
        /// <summary>
        /// A SpeechDelegate is passed to the ITtsVoice.Speak() method, so that the caller may receive progress info as the text is being spoken.
        /// </summary>
        /// <param name="speechText"></param>
        /// <param name="readTextEvent"></param>
        public delegate void SpeechDelegate(string speechText, ReadTextEvent readTextEvent);
    
        /// <summary>
        /// This class encapsulates everything necessary to invoke a SpeechDelegate.
        /// A SpeechCallback instance may be created each time a new string is enqueued for speaking,
        /// and then invoked multiple times throughout the process, with an updated ReadTextEvent.
        /// </summary>
        public class SpeechCallback
        {
            private readonly SpeechDelegate _speechDelegate;
    
            public SpeechCallback(string text, SpeechDelegate speechDelegate)
            {
                Text = text;
                _speechDelegate = speechDelegate;
            }
    
            public string Text { get; }
    
            public void Invoke(ReadTextEvent readTextEvent) => _speechDelegate?.Invoke(Text, readTextEvent);
        }
    }



    • Edited by J Nelson Tobii Dynavox Tuesday, November 13, 2018 12:38 PM Added comment about TimedMetadataTrack events and windows sample
    Monday, November 12, 2018 2:53 PM

All replies

  • Hi,

    Could you please share a sample that reproduces the problem? I could not find the problem from your code snippet alone. You could upload your sample to OneDrive and share the link with me. A reproducible sample would help me take a look at the problem.

    Best regards,

    Roy


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Tuesday, November 13, 2018 6:02 AM
  • Sure thing, here you go: https://1drv.ms/u/s!Arq2PjAsyTrJhHQ9gXURWmb7JErw 
    Tuesday, November 13, 2018 12:42 PM
  • Hi,

    Sorry for the delay. I'm asking another engineer to help with this, so it may take some time.

    Best regards,

    Roy



    Friday, November 16, 2018 10:05 AM
  • Have you verified that your subscribes and unsubscribes match one-to-one? It appears that you are subscribing by index, so what might be happening is that the object at one index gets replaced and you never unsubscribe from the one that was replaced. Just a guess without running the code.

    So subscribe to item[0], item[0] gets replaced with a new value, and you try unsubscribing from item[0] but there really isn't a handler on that one and the old one got orphaned.
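
    One way to check the pairing is to remember the exact objects a handler was attached to, rather than looking them up again by index at teardown. The following is only a sketch: SubscriptionLedger is a hypothetical helper that is not part of the posted sample, and it is written generically so it can be tried outside a UWP project.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the original sample): records exactly which
// objects a handler was attached to, so that detaching never depends on a list
// index still referring to the same object.
public sealed class SubscriptionLedger<T> where T : class
{
    private readonly List<T> _attached = new List<T>();
    private readonly Action<T> _attach;
    private readonly Action<T> _detach;

    public SubscriptionLedger(Action<T> attach, Action<T> detach)
    {
        _attach = attach ?? throw new ArgumentNullException(nameof(attach));
        _detach = detach ?? throw new ArgumentNullException(nameof(detach));
    }

    // Running totals, useful for logging whether the counts ever diverge.
    public int AttachCount { get; private set; }
    public int DetachCount { get; private set; }

    // Number of targets that still have a handler attached.
    public int Outstanding => _attached.Count;

    public void Attach(T target)
    {
        _attach(target);
        _attached.Add(target);
        AttachCount++;
    }

    // Detaches from the same instances that were attached, in attach order.
    public void DetachAll()
    {
        foreach (var target in _attached)
        {
            _detach(target);
            DetachCount++;
        }
        _attached.Clear();
    }
}
```

    In the posted class this could be constructed as new SubscriptionLedger<TimedMetadataTrack>(t => t.CueEntered += OnTimedMetadataTrackEntered, t => t.CueEntered -= OnTimedMetadataTrackEntered), with Attach(track) called from RegisterAction and DetachAll() called from DestroyMediaPlaybackItem. This only confirms whether the subscribes and unsubscribes pair up; whether it has any effect on the leak is untested.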

    Just a thought.

    T.J.


    Thomas Mullen

    Saturday, November 17, 2018 4:02 AM