incorrect video frame size returned by IMFSourceReader

  • Question

  • Hi, I'm using IMFSourceReader to read an H.264 encoded MP4 video file. 

    First I set the output media type to IYUV so that the decoder is loaded and I get uncompressed video frames.

    The resolution of the test video is 640x360, and from the media type returned by GetCurrentMediaType I can see the stride is 640 and pvih->bmiHeader.biSizeImage = 345,600. That's correct, because IYUV uses 1.5 bytes per pixel, so 640 x 360 x 1.5 == 345,600.

    But when I check the size of the video samples, the buffer size is somewhat larger: 353,280. Does anyone know why it's not 345,600?

    My code is below; I've removed all error-checking code for clarity.

    #include "stdafx.h"
    #include <Winerror.h>
    #include <Objbase.h>
    #include <Mfidl.h>
    #include <Mfapi.h>
    #include <Mfreadwrite.h>
    #include <Dvdmedia.h>
    #include <Strmif.h>
    #include <uuids.h>
    #include <Dshow.h>
    int _tmain(int argc, _TCHAR* argv[]) {
        const WCHAR *szFilePath = L"C:\\Users\\hxuan\\Downloads\\big44100.mp4";   
        HRESULT hr = CoInitializeEx(0, COINIT_MULTITHREADED);  
        do {
            hr = MFStartup(MF_VERSION, 0);
            // Get reader      
            IMFSourceReader *pReader;      
            hr = MFCreateSourceReaderFromURL(szFilePath, NULL, &pReader);      
            // Enum video media types       
            int i = 0;      
            do {         
                IMFMediaType *pType = NULL;
                hr = pReader->GetNativeMediaType(MF_SOURCE_READER_FIRST_VIDEO_STREAM, i++, &pType);         
                if (SUCCEEDED(hr)) {
                    AM_MEDIA_TYPE* pmt;            
                    hr = pType->GetRepresentation(FORMAT_VideoInfo2, (void**)(&pmt));      
                    if (SUCCEEDED(hr)) {              
                        VIDEOINFOHEADER2* pvih = (VIDEOINFOHEADER2*)(pmt->pbFormat); 
                        if (pmt->subtype == MEDIASUBTYPE_H264) {    
                            printf("it's H.264 video, resolution = %dx%d\n", pvih->bmiHeader.biWidth, pvih->bmiHeader.biHeight); 
                        } 
                   }            
                   // Set output to yuv so that decoder can be inserted            
                   IMFMediaType* desiredType = NULL;            
                   hr = MFCreateMediaType(&desiredType);
                   // major type video            
                   hr = desiredType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);            
                   // sub type IYUV            
                   hr = desiredType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_IYUV);            
                   // Set current media type            
                   hr = pReader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_VIDEO_STREAM, NULL, desiredType);            
    
                   // Check output type            
                   IMFMediaType* actualType = NULL;
                   hr = pReader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_VIDEO_STREAM, &actualType);            
    
                   // Get stride            
                   LONG lStride = 0;            
                   hr = actualType->GetUINT32(MF_MT_DEFAULT_STRIDE, (UINT32*)&lStride);            
                   printf("default stride = %d\n", lStride);    
    
                   hr = actualType->GetRepresentation(FORMAT_VideoInfo2, (void**)(&pmt));      
                   VIDEOINFOHEADER2* pvih = (VIDEOINFOHEADER2*)(pmt->pbFormat);            
                   printf("width = %d, height = %d, image size = %d\n", pvih->bmiHeader.biWidth, pvih->bmiHeader.biHeight, pvih->bmiHeader.biSizeImage);     
           
               // Now to read a frame. ReadSample allocates the output
               // sample itself, so calling MFCreateSample here is
               // unnecessary (the pre-created sample would just leak).
               IMFSample* pSample = NULL;
               
                   DWORD streamIndex, flags;
                   LONGLONG llTimeStamp;            
                   hr = pReader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM,    // Stream index.               
                       0,                              // Flags.
                       &streamIndex,                   // Receives the actual stream index.                
                       &flags,                         // Receives status flags.               
                       &llTimeStamp,                   // Receives the time stamp.               
                       &pSample                        // Receives the sample or NULL.               
                       );            
    
                   IMFMediaBuffer* pBuf = NULL;            
                   hr = pSample->ConvertToContiguousBuffer(&pBuf);            
    
                   DWORD bufLength;            
                   pBuf->GetCurrentLength(&bufLength); // Get the length (in Bytes) of output buffer            
                   printf("current buffer length = %d\n", bufLength);            
    
                   BYTE* pByteBuffer;            
                   DWORD buffCurrLen = 0;            
                   DWORD buffMaxLen = 0;            
    
                   hr = pBuf->Lock(&pByteBuffer, &buffMaxLen, &buffCurrLen);            
                   printf("current length = %d, max length = %d\n", buffCurrLen, buffMaxLen);         
               }
           } while (SUCCEEDED(hr));   
        } while (false);   
    
        printf("exit -- fail-----------------\n");	
        return 0;
    }

    When the above program runs, the output reports a default stride of 640, an image size of 345,600, and a buffer length of 353,280 (output screenshot omitted).

    The test video is here https://www.dropbox.com/s/ueaqpojzbjxo2ij/big44100.mp4?dl=0

    Thanks for the help.

    Thursday, July 6, 2017 3:24 AM

All replies

  • The buffer is presumably extended to 640x368, but this does not necessarily mean the data inside is incorrect. It might be that the buffer was extended for alignment, the 640x360 data is intact inside, but the Video Processing MFT wrote an incorrect data size. Another possible cause is that the H.264 stream itself is 640x368 with the bottom lines cropped, and the cropping information was somehow dropped on the way to you.
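
    Just to sanity-check the arithmetic behind that guess: 353,280 is exactly what an IYUV frame at the suspected padded height of 368 would occupy.

    ```cpp
    #include <cassert>

    int main() {
        // IYUV/I420 carries 12 bits per pixel: a full-resolution Y plane
        // plus quarter-resolution U and V planes (1.5 bytes per pixel).
        auto iyuvSize = [](int w, int h) { return w * h * 3 / 2; };
        assert(iyuvSize(640, 360) == 345600); // size the media type advertised
        assert(iyuvSize(640, 368) == 353280); // size the samples actually carry
        return 0;
    }
    ```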

    http://alax.info/blog/tag/directshow

    Thursday, July 6, 2017 6:05 AM
  • Hi Roman,

    I dumped out the raw data to a file and opened it with a yuv viewer.

    dump code:

                ofstream dump;
                // Open in binary mode; the default text mode would translate
                // bytes that look like newlines and corrupt the raw YUV data.
                dump.open("c:\\temp\\dump\\fulldump.yuv", ios::binary);
                dump.write((const char*)pByteBuffer, buffCurrLen);
                dump.close();

                dump.open("c:\\temp\\dump\\correctsize.yuv", ios::binary);
                dump.write((const char*)(pByteBuffer + (buffCurrLen - pvih->bmiHeader.biSizeImage)), pvih->bmiHeader.biSizeImage);
                dump.close();

    Note that the 2nd file starts from pByteBuffer + offset, where offset = (actual size - expected size).

    Here is the comparison of 2 images.

    It's a full-image comparison; you can see that not only the top lines but also the colors in the left image are incorrect.

    Y component:

    The Y component looks good in both.

    The U on the left is somewhat incorrect.

    The V on the left is also somewhat incorrect.

    Look at the incorrect image (size 353,280 > 640x360x1.5).

    Look at the edges of the color blocks around the root of the tree. It seems U and V are shifted down somewhat.

    Remember that the correct-size image is correctly decoded (right side of the 1st comparison), but the top part is obviously cut off (you can see that from the position of "Big Buck").


    Thursday, July 6, 2017 6:26 AM
  • Hi guys, I found that if I open the larger-size dump as 640x368, then it is decoded correctly.

    That means the incorrect U is actually caused by the extra 8 lines of Y, and the incorrect V is caused by the extra U.

    So, to get a 640x360 video frame, I need to remove the extra Y, U, and V data from the contiguous buffer myself.

    Thursday, July 6, 2017 8:44 AM
  • So I fed your file to one of my tools, and here is what I see.

    1. You set a partial media type:

        Key MF_MT_MAJOR_TYPE, vValue {73646976-0000-0010-8000-00AA00389B71} (Type VT_CLSID, MFMediaType_Video, FourCC vids)
        Key MF_MT_SUBTYPE, vValue {56555949-0000-0010-8000-00AA00389B71} (Type VT_CLSID, MFVideoFormat_IYUV, FourCC IYUV)

    2. The Source Reader expanded it into a full media type compatible with your request:

        Key MF_MT_MAJOR_TYPE, vValue {73646976-0000-0010-8000-00AA00389B71} (Type VT_CLSID, MFMediaType_Video, FourCC vids)
        Key MF_MT_SUBTYPE, vValue {56555949-0000-0010-8000-00AA00389B71} (Type VT_CLSID, MFVideoFormat_IYUV, FourCC IYUV)
        Key MF_MT_ALL_SAMPLES_INDEPENDENT, vValue 1 (Type VT_UI4)
        Key MF_MT_COMPRESSED, vValue 0 (Type VT_UI4)
        Key MF_MT_FRAME_SIZE, vValue 2748779069800 (Type VT_UI8, 0x00000280 0x00000168, 640 360)
        Key MF_MT_DEFAULT_STRIDE, vValue 640 (Type VT_UI4)
        Key MF_MT_PIXEL_ASPECT_RATIO, vValue 4294967297 (Type VT_UI8, 0x00000001 0x00000001, 1 1)
        Key MF_MT_INTERLACE_MODE, vValue 7 (Type VT_UI4)
        Key MF_MT_FRAME_RATE, vValue 42949672960417083 (Type VT_UI8, 0x00989680 0x00065D3B, 10000000 417083)
        Key MF_MT_FIXED_SIZE_SAMPLES, vValue 1 (Type VT_UI4)
        Key MF_MT_SAMPLE_SIZE, vValue 345600 (Type VT_UI4)
        Key MF_MT_AVG_BITRATE, vValue 511850 (Type VT_UI4)
        Key MF_MT_AVG_BIT_ERROR_RATE, vValue 0 (Type VT_UI4)
        Key MF_MT_VIDEO_ROTATION, vValue 0 (Type VT_UI4)

    3. Once you started reading, the first media sample came with the MF_SOURCE_READERF_CURRENTMEDIATYPECHANGED flag. The Source Reader changed the format to:

        MF_MT_MAJOR_TYPE, vValue {73646976-0000-0010-8000-00AA00389B71} (Type VT_CLSID, MFMediaType_Video, FourCC vids)
        MF_MT_SUBTYPE, vValue {56555949-0000-0010-8000-00AA00389B71} (Type VT_CLSID, MFVideoFormat_IYUV, FourCC IYUV)
        MF_MT_ALL_SAMPLES_INDEPENDENT, vValue 1 (Type VT_UI4)
        MF_MT_COMPRESSED, vValue 0 (Type VT_UI4)
        MF_MT_FRAME_SIZE, vValue 2748779069808 (Type VT_UI8, 0x00000280 0x00000170, 640 368)
        MF_MT_DEFAULT_STRIDE, vValue 640 (Type VT_UI4)
        MF_MT_PIXEL_ASPECT_RATIO, vValue 4294967297 (Type VT_UI8, 0x00000001 0x00000001, 1 1)
        MF_MT_INTERLACE_MODE, vValue 7 (Type VT_UI4)
        MF_MT_FRAME_RATE, vValue 42949672960417083 (Type VT_UI8, 0x00989680 0x00065D3B, 10000000 417083)
        MF_MT_FIXED_SIZE_SAMPLES, vValue 1 (Type VT_UI4)
        MF_MT_SAMPLE_SIZE, vValue 353280 (Type VT_UI4)
        MF_MT_AVG_BITRATE, vValue 511850 (Type VT_UI4)
        MF_MT_AVG_BIT_ERROR_RATE, vValue 0 (Type VT_UI4)
        MF_MT_GEOMETRIC_APERTURE, vValue 00 00 00 00 00 00 00 00 80 02 00 00 68 01 00 00 (Type VT_VECTOR | VT_UI1)
        MF_MT_MINIMUM_DISPLAY_APERTURE, vValue 00 00 00 00 00 00 00 00 80 02 00 00 68 01 00 00 (Type VT_VECTOR | VT_UI1)
        MF_MT_PAN_SCAN_APERTURE, vValue 00 00 00 00 00 00 00 00 80 02 00 00 68 01 00 00 (Type VT_VECTOR | VT_UI1)
        MF_MT_VIDEO_NOMINAL_RANGE, vValue 2 (Type VT_UI4)
        MF_MT_VIDEO_ROTATION, vValue 0 (Type VT_UI4)

    You ignored it, and this is where you and the Source Reader lost their common understanding of the format.

    Apparently it changed the output to 640x368. The data is valid, but it has this resolution from now on.
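
    For reference, MF_MT_FRAME_SIZE packs both dimensions into a single UINT64, width in the high 32 bits and height in the low 32 bits. A small portable sketch of the unpacking (UnpackFrameSize is a stand-in for the SDK's Unpack2UINT32AsUINT64 helper), using the raw values from the dumps above:

    ```cpp
    #include <cassert>
    #include <cstdint>

    // MF_MT_FRAME_SIZE stores width in the high 32 bits and height in
    // the low 32 bits of one UINT64 attribute value.
    void UnpackFrameSize(uint64_t packed, uint32_t* width, uint32_t* height) {
        *width  = (uint32_t)(packed >> 32);
        *height = (uint32_t)(packed & 0xFFFFFFFFu);
    }

    int main() {
        uint32_t w = 0, h = 0;
        // Value advertised before reading started.
        UnpackFrameSize(2748779069800ull, &w, &h);
        assert(w == 640 && h == 360);
        // Value delivered with MF_SOURCE_READERF_CURRENTMEDIATYPECHANGED.
        UnpackFrameSize(2748779069808ull, &w, &h);
        assert(w == 640 && h == 368);
        return 0;
    }
    ```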

    As I suspected, the H.264 Video Decoder MFT behind the scenes aligned the height up to 16-px granularity, and the cropping information was lost. That is, it operates in the following mode:

    • Input MFVideoFormat_H264, 640 x 360
    • Output MFVideoFormat_IYUV, 640 x 368

    You have to take this into consideration.

    Also, I would expect that if you dump the buffer as a 640x368 IYUV image, you will find a correct picture with 8 extra pixel rows at the bottom.


    http://alax.info/blog/tag/directshow


    Thursday, July 6, 2017 9:54 AM
  • OK, this happens with RGB as well. Note the bottom 8 rows:


    http://alax.info/blog/tag/directshow

    Thursday, July 6, 2017 10:14 AM
  • Hi Roman, 

    Yes, you are correct; dumping as 640x368 gives the same picture as you got for the RGB format.

    If, as you said, 360 -> 368 is due to alignment, does that mean that whenever there is a format change, the new buffer size will always be larger than the old one?

    Does that mean the cropping should be done on my side? Because how to crop depends on the pixel format, on whether the data is planar or packed. Actually, I've never used an API that requires users to crop the image buffer contents themselves.

    Friday, July 7, 2017 12:52 AM
    H.264 encoding itself operates on 16x16 macroblocks, and hence the image is aligned accordingly. The format has special fields that indicate the necessary cropping, which work like this: "the resolution is MxN macroblocks, but you will have to crop L, R, T, B pixels from the respective sides after the image is decoded".

    The decoder does not do the cropping itself, to minimize processing overhead. If the receiver can crop, why do it in the decoder? Especially since pipeline parts such as the presenter are capable of displaying a fragment of the buffer without doing any cropping at all.

    Note the decoder added several additional attributes:

    • MF_MT_GEOMETRIC_APERTURE, vValue 00 00 00 00 00 00 00 00 80 02 00 00 68 01 00 00 (Type VT_VECTOR | VT_UI1)
    • MF_MT_MINIMUM_DISPLAY_APERTURE, vValue 00 00 00 00 00 00 00 00 80 02 00 00 68 01 00 00 (Type VT_VECTOR | VT_UI1)
    • MF_MT_PAN_SCAN_APERTURE, vValue 00 00 00 00 00 00 00 00 80 02 00 00 68 01 00 00 (Type VT_VECTOR | VT_UI1)

    These are blobs with four 32-bit values: 0, 0, 640, 360 (0x168). This is how the decoder indicates that the payload occupies only part of the entire buffer, and you are expected to do the cropping yourself as needed.
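
    A portable sketch of decoding such a blob; the 16-byte layout mirrors the SDK's MFVideoArea (two fixed-point MFOffset values followed by a SIZE), and ParseAperture is a hypothetical helper that assumes the little-endian byte order shown in the dump above:

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Integer parts of the decoded aperture rectangle.
    struct VideoArea {
        int32_t x, y, width, height;
    };

    // Decode the 16-byte aperture blob: MFOffset packs a 16-bit fraction
    // first and the 16-bit integer part second; the SIZE that follows is
    // two plain 32-bit extents.
    VideoArea ParseAperture(const uint8_t blob[16]) {
        auto rd16 = [&](int off) {
            return (int16_t)(blob[off] | (blob[off + 1] << 8));
        };
        auto rd32 = [&](int off) {
            return (int32_t)(blob[off] | (blob[off + 1] << 8) |
                             (blob[off + 2] << 16) |
                             ((uint32_t)blob[off + 3] << 24));
        };
        return { rd16(2), rd16(6), rd32(8), rd32(12) };
    }

    int main() {
        // The bytes dumped in the thread: 0, 0, 0x280 (640), 0x168 (360).
        const uint8_t blob[16] = { 0,0,0,0, 0,0,0,0,
                                   0x80,0x02,0,0, 0x68,0x01,0,0 };
        VideoArea a = ParseAperture(blob);
        assert(a.x == 0 && a.y == 0);
        assert(a.width == 640 && a.height == 360);
        return 0;
    }
    ```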

    The cropping method is indeed format-dependent; however, if you can request IYUV, then you can also find a way to crop it. You have a few options here. If this is a texture, you could copy a subrect into another [esp. staging] texture. Or you could use the Video Processing MFT for this. Or, if you get your hands on the bits of the data, you could copy the 360 valid rows plane by plane: copy 640x360 bytes and skip 640x8 bytes; then copy (640/2)x(360/2) bytes and skip (640/2)x(8/2) bytes; then once more copy (640/2)x(360/2) bytes and skip (640/2)x(8/2) bytes. Yet another option is to use libswscale to do the same. Or, if you are dealing with RGB, you could use one of the non-MF Windows APIs.
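
    To illustrate the plane-by-plane copy described above, here is a minimal sketch; CropIyuvHeight is a hypothetical helper (not an MF API) that assumes the stride equals the width and that the valid rows sit at the top of each plane, as in this thread:

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <vector>

    // Copy the valid width x displayH region out of an IYUV buffer whose
    // planes were decoded at an aligned height codedH. IYUV is planar:
    // a full-resolution Y plane, then quarter-resolution U and V planes.
    std::vector<uint8_t> CropIyuvHeight(const uint8_t* src,
                                        int width, int codedH, int displayH) {
        std::vector<uint8_t> dst;
        dst.reserve(width * displayH * 3 / 2);
        // Y plane: keep displayH rows, drop the (codedH - displayH) padding rows.
        dst.insert(dst.end(), src, src + width * displayH);
        const uint8_t* u = src + width * codedH;           // U starts after padded Y
        dst.insert(dst.end(), u, u + (width / 2) * (displayH / 2));
        const uint8_t* v = u + (width / 2) * (codedH / 2); // V starts after padded U
        dst.insert(dst.end(), v, v + (width / 2) * (displayH / 2));
        return dst;
    }

    int main() {
        // Synthetic frame coded at 4x4 with a 4x2 display area,
        // one marker byte value per plane.
        const int w = 4, codedH = 4, displayH = 2;
        std::vector<uint8_t> frame(w * codedH * 3 / 2);
        for (int i = 0; i < w * codedH; ++i) frame[i] = 1;                   // Y
        for (int i = 0; i < (w/2)*(codedH/2); ++i) frame[w*codedH + i] = 2;  // U
        for (int i = 0; i < (w/2)*(codedH/2); ++i)
            frame[w*codedH + (w/2)*(codedH/2) + i] = 3;                      // V
        std::vector<uint8_t> out = CropIyuvHeight(frame.data(), w, codedH, displayH);
        assert((int)out.size() == w * displayH * 3 / 2);
        assert(out[0] == 1 && out[w * displayH] == 2 && out.back() == 3);
        return 0;
    }
    ```

    For the frame in this thread, the call would be CropIyuvHeight(pByteBuffer, 640, 368, 360), shrinking 353,280 bytes down to 345,600.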

    MF_MT_MINIMUM_DISPLAY_APERTURE attribute:

    Defines the display aperture, which is the region of a video frame that contains valid image data.


    http://alax.info/blog/tag/directshow


    Friday, July 7, 2017 5:10 AM