How is the EVR's video mixer ( VMR9 wrapped in an MFT ) handling the frame rate of a substream ?

  • Question

  • Hi

    I would like to know the exact behavior of Microsoft's Direct3D 9 aware Video Mixer MFT ( which can be created using the MFCreateVideoMixer function ) when it comes to handling the frame rate of a substream. This question is directed at the developers at Microsoft, as only they know the source code. I saw in another question here in the forum that Becky Weiss was able to contact Vlad, who seems to be the person who wrote the EVR. That gave me hope that I might get an answer by asking here.

    You might ask yourself why I would need to know the mixer in such depth. Well, I have developed a lot of Direct3D 11 aware MFTs and sources over the years and was just about to finish a Direct3D 11 Video Mixer which is 100 % compatible with the EVR. To design a video mixer which behaves exactly the same way as Microsoft's own, I developed a test bench. That test app alone is 20k-30k lines of code. I would say nobody has ever tested Microsoft's Video Mixer that deeply. Even though I was able to find out almost everything the mixer does, there is an important detail which cannot be resolved without reverse engineering. And since that is illegal, and because it is a DXVA / Direct3D 11 Video API matter anyway, I will ask the question here in the forum.


    My observations of Microsoft's EVR mixer, about handling the frame rate of a substream, are the following :


    If a substream has a higher frame rate than the reference stream, then the frame rate gets downsampled with DXVA. That means the OutputFrameFreq member of the DXVA2_VideoDesc structure is set to the frame rate of the reference stream. This is done in the SetInputType method of the MFT when setting a media type for a substream. If the video processing device does not offer frame rate conversion, then the method will fail.
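    The downsampling case above can be sketched as follows. This is only a model of what I observed, not Microsoft's implementation. To keep it compilable without the Windows SDK, Frequency and VideoDesc are stand-in types that mirror just the Numerator / Denominator and InputSampleFreq / OutputFrameFreq fields of DXVA2_Frequency and DXVA2_VideoDesc. IsHigherRate and DescribeSubstream are hypothetical helper names. Leaving OutputFrameFreq equal to the input rate for a slower substream is an assumption based on the observation that no DXVA upsampling seems to take place.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for DXVA2_Frequency (only the fields used here).
struct Frequency {
    uint32_t Numerator;
    uint32_t Denominator;
};

// Stand-in for DXVA2_VideoDesc (only the two frequency fields).
struct VideoDesc {
    Frequency InputSampleFreq;
    Frequency OutputFrameFreq;
};

// Hypothetical helper: true if rate a is strictly higher than rate b.
// Cross-multiplication avoids floating point; 64-bit avoids overflow.
inline bool IsHigherRate(const Frequency& a, const Frequency& b) {
    return uint64_t(a.Numerator) * b.Denominator >
           uint64_t(b.Numerator) * a.Denominator;
}

// Hypothetical helper: fills the frequency fields for a substream the way
// the observed mixer appears to do it when the substream type is set.
inline VideoDesc DescribeSubstream(const Frequency& sub, const Frequency& ref) {
    VideoDesc desc{};
    desc.InputSampleFreq = sub;
    // Faster substream: ask the device to convert down to the reference rate.
    // Slower or equal substream: leave the rate unchanged (assumption).
    desc.OutputFrameFreq = IsHigherRate(sub, ref) ? ref : sub;
    return desc;
}
```

    In a real mixer the resulting description would then be handed to the video processing device, and a device without frame rate conversion would cause SetInputType to fail as described above.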


    If a substream has a lower frame rate than the reference stream, then the frames are buffered in a list. The size of that list is equal to the reference stream frame rate divided by the substream frame rate ( rounded value ). A frame is taken from that list for rendering according to its sample time and duration, measured against the previous substream sample and the current reference stream sample. That way the frames are passed to the video processing device correctly, as if the frame rate had been upsampled.
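    The buffering case above can be sketched like this. Again, it only models the observed behavior. Frame, QueueDepth and SelectFrame are hypothetical names, times are in 100 ns units as with Media Foundation sample times, and the selection rule ( show the newest substream frame whose start time is not later than the reference sample time ) is my assumption about how the sample times are compared.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>

// Hypothetical substream sample: start time and duration in 100 ns units.
struct Frame {
    int64_t time;
    int64_t duration;
};

// List size as observed: reference rate / substream rate, rounded to nearest.
// Both rates are passed as numerator / denominator pairs (frames per second).
inline uint32_t QueueDepth(uint32_t refNum, uint32_t refDen,
                           uint32_t subNum, uint32_t subDen) {
    const uint64_t n = uint64_t(refNum) * subDen;  // numerator of ref / sub
    const uint64_t d = uint64_t(refDen) * subNum;  // denominator of ref / sub
    return static_cast<uint32_t>((n + d / 2) / d);
}

// Picks the substream frame to mix with a reference sample at refTime.
// Older frames are dropped once a newer frame has already started.
// Returns no frame if the list is empty or its first frame is still
// in the future relative to refTime.
inline std::optional<Frame> SelectFrame(std::deque<Frame>& queue,
                                        int64_t refTime) {
    while (queue.size() > 1 && queue[1].time <= refTime) {
        queue.pop_front();
    }
    if (queue.empty() || queue.front().time > refTime) {
        return std::nullopt;
    }
    return queue.front();
}
```

    With a 60 fps reference stream and a 15 fps substream, QueueDepth gives 4, and SelectFrame returns the same substream frame for four consecutive reference samples before moving on to the next one.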


    There is plenty of DXVA documentation on MSDN, even for frame rate conversion. But it is nowhere mentioned whether downsampling and upsampling should be used the same way in DXVA / Direct3D 11 Video API. From my observation, Microsoft's Video Mixer does not use upsampling in DXVA at all. It just buffers the substream samples and uses them according to the reference stream sample time.

    The reason might be that the VMR9 mixer comes from an era when media coding was not very mature and most drivers either lacked frame rate conversion or did not offer a wide spectrum of it. Of course this is just a guess which I cannot verify, and there is nothing about it on MSDN either.

    My hope is that a developer from Microsoft could either give me some insight into the Video Mixer MFT or clarify how to handle the frame rate of a substream in DXVA / Direct3D 11 Video API.

    Kind regards,


    Monday, June 4, 2018 2:05 PM