Is the Media Foundation Transform designed for real-time sound processing?

  • Question

  • Hi!

I have an MFT that computes an FFT (fast Fourier transform) on its input samples. I also have simple visualisation logic, and I noticed that the FFT data does not update as frequently as needed, only about every half second. Performance is not the problem. The problem is that the MFT almost always receives input samples with a duration of roughly half a second (for example, 21888 samples per channel for audio at a 44100 Hz sample rate). So is there a way to decrease the duration of the input samples (for example, to 1024 or 512 samples per channel)?

    Tuesday, August 5, 2014 2:19 PM

All replies

  • Hi Alexander,

I will find someone who is familiar with MFTs to help you with your issue. It may take a few days, so I hope you can be patient. Thanks for your understanding.


    MSDN Community Support

    Please remember to "Mark as Answer" the responses that resolved your issue. It is a common way to recognize those who have helped you, and makes it easier for other visitors to find the resolution later.

    Wednesday, August 6, 2014 8:34 AM
  • Hello Alexander,

Have you tried using the _MFT_OUTPUT_STREAM_INFO_FLAGS enumeration? I think this is what you are looking for.





    Windows SDK Technologies - Microsoft Developer Services - http://blogs.msdn.com/mediasdkstuff/

    Wednesday, August 6, 2014 9:02 PM
I have the opposite problem in my FFT: I get 23 ms every time, with an odd number of bytes (as in: n % 2 = 1), no matter what I set in IMFTransform::GetInputStreamInfo (which, according to the documentation, tells the application what the MFT expects in the input stream).

MP3 files produce very odd results: no MF_MT_AUDIO_BITS_PER_SAMPLE attribute, and MF_MT_AUDIO_BLOCK_ALIGNMENT is 1 (which makes no sense for a 16-bit stream).

    I am now wondering what we are both doing wrong. Maybe a combination of what you are doing and what I am doing will work. Here is my IMFTransform::GetInputStreamInfo:

    HRESULT FFT::GetInputStreamInfo(
        DWORD dwInputStreamID,
        MFT_INPUT_STREAM_INFO *pStreamInfo)
    {
        if (pStreamInfo == NULL)
            return E_POINTER;

        AutoLock lock(m_critSec);

        if (!IsValidInputStream(dwInputStreamID))
            return MF_E_INVALIDSTREAMNUMBER;

        pStreamInfo->hnsMaxLatency = 0;
        pStreamInfo->dwFlags = MFT_INPUT_STREAM_WHOLE_SAMPLES;

        // cbSize is the minimum input buffer size; 4 bytes = one frame
        // of 16-bit stereo PCM.
        if (m_pInputType == NULL)
            pStreamInfo->cbSize = 0;
        else
            pStreamInfo->cbSize = 4;

        pStreamInfo->cbMaxLookahead = 0;
        pStreamInfo->cbAlignment = 0;
        return S_OK;
    }

I want Media Foundation to send me groups of 4-byte samples to be processed immediately: one sample per channel, and enough of them to run a proper FFT.

    I have tried:

    1. setting cbAlignment to 4 (bytes)
    2. setting cbSize to: block align (which is always 1), 4096, 4096 * number of channels, 512, 1024, 2048
3. setting dwFlags to every combination of flags I could use without exceptions being thrown.

    The only thing I have successfully accomplished in this MFT is copying the input stream to the output stream one byte at a time.

    Wednesday, August 13, 2014 3:16 PM
  • Hello,

I agree, that makes no sense. What input and output media types are you supporting in your MFT? It sounds to me like you are getting added to the topology before the decoder and receiving the compressed data. Try supporting only PCM data for your input and output.

    I hope this helps,


    Windows SDK Technologies - Microsoft Developer Services - http://blogs.msdn.com/mediasdkstuff/

    Thursday, August 14, 2014 1:02 AM
My MFT supports only PCM input, namely 16-bit signed integer and 32-bit float.

I recently noticed that the GetInputStreamInfo method is never called. So does this mean that I can't tell the pipeline how much data I want to process?

You can check this by running the MediaExtensions sample with a test video.

    Test video

    Set breakpoints on the first line of the GetInputStreamInfo and GetOutputStreamInfo methods of the MuteEffect class in MediaExtensions\MFExtensions\SimpleAudioEffects\SimpleAudioEffects.Shared\MuteEffect.cpp, set the debugging type to "Mixed" or "Native" in the MediaExtensions.Windows app properties, then run the app, go to "Local decoder", click "Play video", and open the test video (the app will properly render only the sound, but that is enough). After this, only the GetOutputStreamInfo breakpoint will be hit. So please answer my question: does this mean that I can't tell how much data I want to process?

    Thursday, August 14, 2014 6:41 AM
  • @Alexander

    I will take a look and let you know what I find. A couple questions for you:

    In ProcessInput, what is the duration of your IMFSample? Mine is currently 182ms and I cannot get it to go any lower (like 32ms).

    What sample duration are you looking for?


I had accidentally left in support for AAC and MP3 while I was trying to track down the source of an exception. I removed all but PCM, and now I get a 182 ms IMFSample in ProcessInput. It's the correct data, but we both need shorter samples. I can break the data down and queue it, but it would solve both our duration issues if you know of any way to limit the sample duration.

    • Edited by Inclement Death Thursday, August 14, 2014 9:01 PM forgot to reply to @James post
    Thursday, August 14, 2014 8:47 PM
Currently, on the test video I am getting around 25 ms samples (about 8 KB for 2-channel float samples at a 48 kHz sample rate), but if I modify the Scenario1 file-picker filter to also accept .mp3 files and then test MuteEffect on various MP3s, I get 336 ms samples (about 118 KB for 2-channel float samples at a 48 kHz sample rate).

    What sample duration are you looking for?

I need a small enough duration, around 1024 samples; for 2-channel float at 48 kHz that is about 21 ms, or 8192 bytes. But of course I need to be sure that I will always get that duration.

    Friday, August 15, 2014 8:41 AM
The IMFSample parameter of IMFTransform::ProcessInput carries the timestamp and sample duration (time (ms), duration (ms)): "time" is the timestamp of the first sample, and "duration" is how many milliseconds of audio are contained in the IMFSample.

For example: the first call to my IMFTransform::ProcessInput gives me an IMFSample with a time[stamp] of 0.0 and a duration of 182.857101 ms. I'm trying to get a 32 ms duration rather than 182 ms, but if I can't figure out how, I may have to settle for breaking the IMFSample buffer into ~30 ms chunks and pushing them into a FIFO queue for async processing later.

    I took a look at MuteEffect and could not get it to call the GetInputStreamInfo method. I even went as far as rewriting it to match my IMFTransform (with your muting code instead of my FFT). I can't seem to make it work either.

    Are there ANY MFT experts in here that can assist us?

If we can't get an answer here, I'll create a generic pass-through IMFTransform plus a wrapped singleton (a singleton class referenced by both the IMFTransform and a public ref class consumable from WinRT) containing a FIFO queue and a "ProcessBuffer" method that can be called by the referencing project. That way I can set a "preferred" duration, and the IMFSample buffer can be divided into chunks of the proper duration that we grab from the singleton when we need them (probably via a dispatcher timer in the UI). I will post a link to the prototype when I complete it.

    I apologize for hijacking your thread.

    Friday, August 15, 2014 4:10 PM
  •   Up
    Tuesday, September 23, 2014 2:56 PM