How do I create an IMFMediaSource from a PCM stream?

  • Question

  • Hi,

    I have configured a MediaSession with a custom topology and have gotten it to work for outputting sound files.  Now I want to take a stream from ISpVoice (SAPI) and run it through the same topology I have already created.  I am not sure how to create an IMFMediaSource from an IStream when the IStream is just plain PCM.

    Here is the code I have so far. (I have removed error checking and release calls to shorten the example.)

    ISpVoice* pVoice;   //SAPI voice

    HRESULT hr = CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL, IID_ISpVoice, (void **)&pVoice);

    ISpStreamFormat* voiceStream;

    hr = pVoice->GetOutputStream(&voiceStream);

    IMFByteStream* byteStream;

    hr = MFCreateMFByteStreamOnStream(voiceStream, &byteStream);

    IMFSourceResolver* pSourceResolver;

    IUnknown* pSource;

    hr = MFCreateSourceResolver(&pSourceResolver);

    // I think this is where the problem is.  How do I tell the source resolver

    // that it is a raw PCM stream?

    // This next call currently returns MF_E_UNSUPPORTED_BYTESTREAM_TYPE.

    MF_OBJECT_TYPE ObjectType = MF_OBJECT_INVALID;

    hr = pSourceResolver->CreateObjectFromByteStream(
                byteStream,         // Byte stream to resolve.
                NULL,               // URL (optional).
                MF_RESOLUTION_MEDIASOURCE, // Create a source object.
                NULL,               // Optional property store.
                &ObjectType,        // Receives the created object type.
                &pSource);          // Receives a pointer to the media source.

    IMFMediaSource* pMediaSource = NULL;

    hr = pSource->QueryInterface(IID_PPV_ARGS(&pMediaSource));

    So what am I doing wrong?  Is there a better way to approach this problem?

    Thanks for your help,


    • Edited by JeffGram Monday, October 26, 2015 5:08 PM
    Monday, October 26, 2015 4:57 PM

All replies

  • Hi Jeff

    I am not familiar with SAPI, but I can suggest a solution to that problem. You could write your own source implementation and register a byte-stream handler for it; that will definitely work. Media Foundation struggled with WAVE sources in the past, though it must be said that was at the very beginning of the API; nowadays it should be no problem.

    The error says the format is not supported, and you should take it at face value. If your stream really is just plain PCM and not some raw float data, then it looks to me like it is not supported. Can you give us the GUID and the WAVEFORMATEX from GetFormat? Is ISpStreamFormat an IStream-compatible class, i.e. does it inherit from IStream?

    If PCM is not supported from a stream, you have to implement your own custom media source. The best way of doing it is to modify the "WaveSource" sample, I guess. The "MPEG1Source" sample is far better and more efficient, but for your scenario the WaveSource sample should be sufficient. There are plenty of samples around PCM, for reading and writing files, because, as I said, in the early days standalone WAVE/PCM streams were not supported.
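    One workaround worth trying before committing to a full custom source (my suggestion, not something from the thread): prepend a minimal RIFF/WAVE header to the raw PCM so the resolver's built-in WAV handler can parse the stream. A sketch of building the canonical 44-byte header; it assumes a little-endian host, which matches the WAV on-disk layout:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Build a canonical 44-byte RIFF/WAVE header for raw PCM data, so a
// plain PCM stream can be presented to parsers that expect a WAV file.
std::vector<uint8_t> MakeWavHeader(uint16_t channels, uint32_t samplesPerSec,
                                   uint16_t bitsPerSample, uint32_t dataBytes)
{
    uint16_t blockAlign = static_cast<uint16_t>(channels * bitsPerSample / 8);
    uint32_t avgBytesPerSec = samplesPerSec * blockAlign;

    std::vector<uint8_t> h(44);
    auto put16 = [&](size_t off, uint16_t v) { std::memcpy(&h[off], &v, 2); };
    auto put32 = [&](size_t off, uint32_t v) { std::memcpy(&h[off], &v, 4); };

    std::memcpy(&h[0], "RIFF", 4);
    put32(4, 36 + dataBytes);          // RIFF chunk size
    std::memcpy(&h[8], "WAVE", 4);
    std::memcpy(&h[12], "fmt ", 4);
    put32(16, 16);                     // fmt chunk size (plain PCM)
    put16(20, 1);                      // wFormatTag = WAVE_FORMAT_PCM
    put16(22, channels);
    put32(24, samplesPerSec);
    put32(28, avgBytesPerSec);
    put16(32, blockAlign);
    put16(34, bitsPerSample);
    std::memcpy(&h[36], "data", 4);
    put32(40, dataBytes);              // data chunk size
    return h;
}
```

    A custom IMFByteStream could then serve this header followed by the SAPI stream's bytes. Note that for a live stream the total data size is not known up front, which is one reason a custom media source is the more robust route.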



    Tuesday, October 27, 2015 4:55 PM
  • Thanks for the reply,

    SAPI is Microsoft's text-to-speech API.  Yes, ISpStreamFormat inherits from the IStream interface.

    The GUID is equal to SPDFID_WaveFormatEx.  I have also included the information from the debugger below.

    + guid {C31ADBAE-527F-4FF5-A230-F62BB61FF70C} _GUID

    - format 0x018fbf20 {wFormatTag=1 nChannels=1 nSamplesPerSec=16000 ...} tWAVEFORMATEX *

    wFormatTag 1 unsigned short
    nChannels 1 unsigned short
    nSamplesPerSec 16000 unsigned long
    nAvgBytesPerSec 32000 unsigned long
    nBlockAlign 2 unsigned short
    wBitsPerSample 16 unsigned short
    cbSize 0 unsigned short
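    For reference (not part of the original post), the two derived WAVEFORMATEX fields above follow directly from the PCM parameters, which a quick consistency check confirms:

```cpp
#include <cstdint>

// For WAVE_FORMAT_PCM, nBlockAlign and nAvgBytesPerSec are derived fields:
//   nBlockAlign     = nChannels * wBitsPerSample / 8
//   nAvgBytesPerSec = nSamplesPerSec * nBlockAlign
// The dump above (1 channel, 16 kHz, 16-bit -> 2 and 32000) matches both.
struct PcmDerived { uint16_t blockAlign; uint32_t avgBytesPerSec; };

PcmDerived DerivePcmFields(uint16_t channels, uint32_t samplesPerSec,
                           uint16_t bitsPerSample)
{
    PcmDerived d;
    d.blockAlign = static_cast<uint16_t>(channels * bitsPerSample / 8);
    d.avgBytesPerSec = samplesPerSec * d.blockAlign;
    return d;
}
```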

    I have already started looking at the MPEG1Source example.  Thanks for pointing me to the simpler WaveSource example.  I'm just surprised that Microsoft doesn't provide a way to connect a stream from one of its APIs to another.  I could also write a WAV file from the text-to-speech API and then read it into the Media Foundation API, but that seems like a waste of resources and time.

    Thanks for your help,


    Wednesday, October 28, 2015 3:07 PM
    Since the format is WAVE_FORMAT_PCM, I think we can assume it's not supported. You could do a test with a 2-channel (stereo) format to make sure.

    Storing and reading PCM in/from a video container is widely supported in Media Foundation, but, as said, standalone WAVE streams/files are not. That's why the "WaveSink" and "WaveSource" samples exist.



    Saturday, October 31, 2015 2:31 PM
  • Hello,

    Thanks for the help.

    I implemented the WaveSource example and it works for outputting the sound.  The only problem I am having now is determining when the sound has finished playing.  Right now I am watching the MEEndOfPresentation and MESessionEnded events, and both of them occur before the sound starts playing.  Is there another event I should be waiting for to know when the sound is done playing, or is there something wrong in the WaveSource example I modified?  The MEEndOfPresentation event works just fine when I use the standard source used to play files.  It only seems to be a problem with the media source I wrote.


    Wednesday, November 4, 2015 3:18 PM
    Sources are categorized into two groups: file/stream/network sources and device sources. For the former there are the MFCreateSourceReaderFrom**** functions; for the latter there are functions like MFEnumDeviceSources and MFCreateDeviceSource, for example.

    Media sources can have different characteristics. A source can be static or live, for example. It can allow or disallow any of the implemented functions like Start, Stop, or Pause (or, within Start, disallow seeking). In a custom source implementation it is important to expose the correct flags in the IMFMediaSource::GetCharacteristics method. If you are writing a live source, you have to expose the MFMEDIASOURCE_IS_LIVE flag; see the last section of the "Writing a Custom Media Source" documentation. It's up to you which methods are allowed; it depends on what type of source you implement. An example: if you don't want to allow Pause, you simply omit the flag and return E_NOTIMPL from the Pause method.
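    A sketch of that flag logic (the typedefs and constants are stand-ins so it compiles outside the Windows SDK; the flag values mirror the MFMEDIASOURCE_CHARACTERISTICS enum, and real code would implement the full IMFMediaSource interface):

```cpp
#include <cstdint>

// Stand-in definitions; in real code these come from the Windows SDK
// (<winerror.h> and <mfidl.h>).
typedef int32_t HRESULT;
const HRESULT S_OK      = 0;
const HRESULT E_NOTIMPL = static_cast<HRESULT>(0x80004001);
const uint32_t MFMEDIASOURCE_IS_LIVE   = 0x1;
const uint32_t MFMEDIASOURCE_CAN_SEEK  = 0x2;
const uint32_t MFMEDIASOURCE_CAN_PAUSE = 0x4;

// A live source advertises IS_LIVE and omits CAN_PAUSE; Pause() then
// fails cleanly with E_NOTIMPL instead of silently misbehaving.
struct WaveSourceSketch {
    bool live;

    HRESULT GetCharacteristics(uint32_t* flags) const {
        *flags = live ? MFMEDIASOURCE_IS_LIVE
                      : (MFMEDIASOURCE_CAN_SEEK | MFMEDIASOURCE_CAN_PAUSE);
        return S_OK;
    }

    HRESULT Pause() const {
        // Omitting MFMEDIASOURCE_CAN_PAUSE means Pause is not implemented.
        return live ? E_NOTIMPL : S_OK; // real code would also change state
    }
};
```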

    On a file source you would normally queue MEEndOfStream in the stream's event queue when the stream has delivered its last sample. In a live source there is no MEEndOfStream. Well, you could send it on Stop or Shutdown; it's really up to you, but normally you don't need this event in a live source. It also would not work if you combined custom sources with non-custom sources (like device sources, for example) in a topology, because you cannot control what the non-custom sources send and what they don't. I opened a question a few posts before yours here, as I had just encountered exactly that problem. If your source is live and is the only source in the topology, you can send that event whenever you want, and it should work.



    Wednesday, November 4, 2015 5:56 PM
    In my case I am dealing with a static source.  The stream I am pulling data from has an end, and when I reach the end I send the MEEndOfStream event.

    I am not mixing my custom source with any other source.  My topology is "custom source" -> "resampler node" -> "speaker sink".  It looks like my custom media source sends an MEEndOfPresentation event when the end of stream is reached, and the session manager then sends an MESessionEnded event, all before the sound has played through the system.  Should I be triggering the MEEndOfPresentation event off of something else?

    If I switch to a normal topology without my custom source, then I get the MEEndOfPresentation and MESessionEnded events after the sound has finished playing.



    Wednesday, November 4, 2015 6:35 PM
    That sounds weird. Do I understand you correctly:

    You deliver samples on each request but hear nothing? Then, when all samples have been fed and the end of the stream has been reached, your session gets the MEEndOfPresentation, then you get an MESessionEnded, and only then does the sound play? Or do you mean the MESessionEnded comes while the sound is still playing?

    And what do you mean by triggering the MEEndOfPresentation event off of something else?

    It works this way:

    For each enabled stream in a presentation you send an MENewStream or MEUpdatedStream event. The source (aggregated or not) or the media session (since you pass each stream pointer with MENewStream) listens to all the streams' event queues. Once all streams have sent MEEndOfStream, MEEndOfPresentation is fired. Once all queued presentations/topologies have sent MEEndOfPresentation, you get an MESessionEnded.
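    That cascade can be shown as a small simulation (illustrative only, not Media Foundation code; it assumes a single queued presentation, as in the topology described above):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simulates the event sequence the session observes as each enabled
// stream reaches its end: one MEEndOfStream per stream, then
// MEEndOfPresentation once all streams are done, then MESessionEnded
// because only one presentation is queued.
std::vector<std::string> SimulateEndOfPlayback(size_t streamCount)
{
    std::vector<std::string> events;
    for (size_t ended = 1; ended <= streamCount; ++ended) {
        events.push_back("MEEndOfStream");           // this stream is done
        if (ended == streamCount) {
            events.push_back("MEEndOfPresentation"); // all streams are done
            events.push_back("MESessionEnded");      // sole presentation ended
        }
    }
    return events;
}
```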

    It sounds like there is still something wrong in your source.

    • Edited by Francis Grave Wednesday, November 4, 2015 8:24 PM
    Wednesday, November 4, 2015 8:14 PM