RGBA ( source ) to BGRA ( WMF / DSHOW )

  • Question

  • Hi,

    is there a reason why the uncompressed RGB32 formats in WMF and DSHOW are BGRA in memory, while elsewhere there are a lot of commonly used RGBA formats, like in DXGI?

    I managed to swap channels with some memcpy, _rotr and bit-shift operations. However, my live encoding of a back buffer is now dropping 20 frames instead of my usual 5.

    The problem is that I have an RGBA format on the IDXGISurface/ID3D11Texture2D I want to encode with the sink writer.

    In D3D9 I am using MFCreateDXSurfaceBuffer and then just add this buffer to a sample and send it to the sink writer. For D3D9 you don't have to do any format conversions; all formats are handled automatically by Microsoft's code internally.

    Now, when using MFCreateDXGISurfaceBuffer it seems you have to write all the conversions yourself. The finalized video is complete, but only the audio exists in it; the video frames are all black. I didn't test all DXGI_FORMATs, but it looks like only very few of them are supported when encoding from a buffer created by MFCreateDXGISurfaceBuffer.

    Why is there such well-rounded support for Direct3D 9, while for DXGI it is practically nonexistent? And if Microsoft has format conversions in their source code, why does the documentation not mention anywhere which formats are supported and which are not?

    MFCreateDXGISurfaceBuffer has 2 main problems:

    1) the minimum supported client for this function is Windows 8

    2) as said, it also fails on most DXGI_FORMATs, even on Windows 8!

    Is there any other way to encode DXGI_FORMATs without manually shifting the bytes?



    • Edited by Francis Grave Thursday, October 31, 2013 2:48 PM
    Thursday, June 27, 2013 5:22 PM

All replies

  • The reason is history.  Let's say you have ARGB in a 32-bit register.  When a little-endian processor stores that in memory, the bytes become B G R A.  So, that was the format established by IBM for OS/2's bitmaps, which is what Microsoft used as the basis for the DIB.  The graphics manufacturers followed that.

    Graphics chips for Unix workstations sometimes used A R G B ordering, because those processors were mostly big-endian.
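
    The byte-order point above can be checked with a small sketch. The helper names packARGB and storePixel are made up for illustration, and the expected byte order assumes a little-endian host such as x86:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Pack channels the way a 32-bit "ARGB" value is usually written: 0xAARRGGBB.
constexpr std::uint32_t packARGB(std::uint8_t a, std::uint8_t r,
                                 std::uint8_t g, std::uint8_t b) {
    return (std::uint32_t{a} << 24) | (std::uint32_t{r} << 16) |
           (std::uint32_t{g} << 8) | std::uint32_t{b};
}

// Store the 32-bit value to memory exactly as the CPU would.
// On a little-endian CPU the least significant byte (B) lands first,
// so the bytes in memory read B, G, R, A.
void storePixel(std::uint32_t pixel, std::uint8_t* out) {
    assert(out != nullptr);
    std::memcpy(out, &pixel, sizeof pixel);
}
```

    On a big-endian machine the same store would produce A, R, G, B, which matches the workstation ordering mentioned above.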

    It shouldn't cost you 15 frames per second to swap this.

    for( y = 0; y < height; y++ )
    {
        for( x = 0; x < width; x++ )
        {
            // Swap R and B; leave G and A in place.
            *pDst++ = pSrc[2];
            *pDst++ = pSrc[1];
            *pDst++ = pSrc[0];
            *pDst++ = pSrc[3];
            pSrc += 4;
        }
    }

    Tim Roberts, VC++ MVP Providenza & Boekelheide, Inc.

    • Marked as answer by Francis Grave Thursday, June 27, 2013 10:35 PM
    • Unmarked as answer by Francis Grave Thursday, October 31, 2013 12:43 PM
    Thursday, June 27, 2013 5:53 PM
  • > You have to see, i am not capturing the backBuffer from a self
    > written app, rather i have a multihack/hook into games and
    > capturing their backbuffer in Direct3D 9, 10 and 11.

    Ah, so you are reading from the frame buffer.  Although the memory path from system-memory-to-device-memory is highly optimized, the path from device-memory-to-system-memory is not commonly used, so it does not get as much attention, and is often quite a bit slower, especially for byte access.  It can sometimes be quicker to copy the whole buffer to system memory, manipulate there, then copy back.
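
    A rough sketch of the "copy first, then manipulate" idea, in portable C++ so it can be tried without a GPU. The function name copyThenSwizzle and its parameters are hypothetical; `mapped` stands in for the pointer obtained from a mapped surface, and `rowPitch` is the number of bytes per source row including any padding:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copy an RGBA image out of `mapped` in one sequential pass, then swap
// R and B in ordinary system memory, writing tightly packed BGRA to `dst`.
void copyThenSwizzle(const std::uint8_t* mapped, std::size_t rowPitch,
                     std::size_t width, std::size_t height,
                     std::vector<std::uint8_t>& scratch, std::uint8_t* dst) {
    assert(mapped != nullptr && dst != nullptr);
    assert(rowPitch >= width * 4);
    if (width == 0 || height == 0) return;

    // Step 1: one large sequential read from the (possibly slow) source.
    // The last row is copied without its padding.
    const std::size_t bytes = (height - 1) * rowPitch + width * 4;
    scratch.resize(bytes);
    std::memcpy(scratch.data(), mapped, bytes);

    // Step 2: per-byte work happens only on the cached system-memory copy.
    for (std::size_t y = 0; y < height; ++y) {
        const std::uint8_t* s = scratch.data() + y * rowPitch;
        for (std::size_t x = 0; x < width; ++x) {
            dst[0] = s[2];  // B
            dst[1] = s[1];  // G
            dst[2] = s[0];  // R
            dst[3] = s[3];  // A
            s += 4;
            dst += 4;
        }
    }
}
```

    Passing the scratch vector in from the caller lets it be reused across frames, so it is not reallocated every time. Whether this beats a direct loop depends on how the source memory is mapped, so it is worth measuring.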

    Tim Roberts, VC++ MVP Providenza & Boekelheide, Inc.

    Monday, July 1, 2013 6:23 PM
  • I am sorry Tim,

    but I have to unmark your answer now. Of course your byte shoveling works, but it is unacceptably slow.

    I finished the basic work on my app, but I forgot to switch between Direct3D modes when testing it lately. Basically I had been working only on Direct3D 11 for the past 3 months and thought the encoding performance was good enough, but after I tested the encoding of a Direct3DSurface9 again I saw a huge difference between the two.

    My app live-encodes the back buffer of games plus the system audio. For the latest testing I am using an older game which fully supports DirectX 9, 10 and 11. These are the frame rates with the resolution set to 1080p and all quality settings at maximum ( note that this is not encoding with a hardware MFT; it is software/CPU only ) :

    Direct3D 9   :    80 FPS without recording, and 65-70 with my app recording/encoding

    Direct3D 11 :    80 FPS without recording, and 20 with my app recording/encoding

    So 20 FPS is unacceptable, I thought, and went back to testing and refining. After half a day I traced it down to this code block :

    const unsigned char* pSrc = static_cast<const unsigned char*>(subresource.pData);
    for(unsigned int y = 0; y < VIDEO_HEIGHT; y++)
    {
        for(unsigned int x = 0; x < VIDEO_WIDTH; x++)
        {
            *pDest++ = pSrc[2];
            *pDest++ = pSrc[1];
            *pDest++ = pSrc[0];
            *pDest++ = pSrc[3];
            pSrc += 4;
        }
    }

    When I replace that code block with my formerly used function

    hr = MFCopyImage(pDest, cbWidth, (BYTE*)pSubresource.pData, cbWidth, cbWidth, VIDEO_HEIGHT);

    the framerate result is :

    Direct3D 11 :    80 FPS without recording, and 65-70 with my app recording/encoding

    So it is basically identical performance to Direct3D 9 then, but of course the colors in the video are wrong. I searched through the web and found that a lot of people are having performance issues with this kind of byte shoveling.

    Now I have to find another solution to get the channel order right. As I stated in my post above, for Direct3D 9 you don't need to write any conversion functions; you just call MFCreateDXSurfaceBuffer and then add the received buffer to an IMFSample, which you can then send to the sink writer. In Direct3D 10 and 11, however, MFCreateDXGISurfaceBuffer is not working ( using Windows 8 ), and you have to work around this by copying the resource and mapping a subresource, from which you can then copy to an IMFMediaBuffer.

    I wish Microsoft would shed some light on why there is such good support for encoding a Direct3DSurface9, while for encoding an ID3D11Texture2D it is practically nonexistent.

    If anyone has an idea how to do fast format conversions when encoding a Direct3D 10 or 11 texture with RGBA format in Media Foundation, please let me know. Of course I am working on it right now with priority one, and if I find out I will post it.



    • Edited by Francis Grave Tuesday, November 12, 2013 2:30 PM
    Thursday, October 31, 2013 1:50 PM
  • I got a bit of additional speed in D3D11 by using bit-shift operations and optimizing the loop, but somehow the FPS increase only happens in game menus, not in the actual gameplay. I think it's because during gameplay you have motion and different data from frame to frame, while in the menu it's like a still image.
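
    The exact bit-shift code is not shown in this post, so the following is only one plausible sketch of the approach: treat each pixel as a 32-bit word and swap R and B with masks and shifts. The names swapRedBlue and swizzleWords are invented for illustration, and the masks assume a little-endian CPU, where RGBA bytes load as 0xAABBGGRR:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Swap the R and B channels of one pixel held in a 32-bit word.
// Input 0xAABBGGRR (RGBA in memory) becomes 0xAARRGGBB (BGRA in memory).
inline std::uint32_t swapRedBlue(std::uint32_t p) {
    return (p & 0xFF00FF00u)            // keep A and G where they are
         | ((p & 0x00FF0000u) >> 16)    // move B down to the low byte
         | ((p & 0x000000FFu) << 16);   // move R up to byte 2
}

// Convert `pixelCount` pixels, one 32-bit load and one 32-bit store each,
// instead of four byte reads and four byte writes.
void swizzleWords(const std::uint32_t* src, std::uint32_t* dst,
                  std::size_t pixelCount) {
    assert(src != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < pixelCount; ++i) {
        dst[i] = swapRedBlue(src[i]);
    }
}
```

    For example, swapRedBlue(0xAA332211) yields 0xAA112233.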

    After I went over the MFCopyImage documentation again I noticed that it describes an SSE2-optimized "non-temporal" copy function. When I read deeper into that I found out about the MOVNT* instructions in assembly, and that bypassing the cache can be significantly faster for specific write operations.

    I am reading into the MMX, SSE, and SSE2 intrinsics at the moment, and it seems that I can easily write the appropriate function myself.



    Tuesday, November 12, 2013 7:35 AM
  • I am now able to write my own non-temporal store functions using SSE2 instructions.

    I doubled the FPS, but that is still not much: 20 FPS before and now 40-45 FPS with my new function. Why is that not much, you might ask? Well, if I use MFCreateDXGISurfaceBuffer or MFCreateDXSurfaceBuffer with other formats I get around 75 FPS. Without any encoding the game runs at 95 FPS at 1080p maxed out, so you can clearly see an advantage when using the prebuilt functions from Media Foundation. When using MFCopyImage instead of MFCreateDXGISurfaceBuffer or MFCreateDXSurfaceBuffer it is also around 65-70.
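
    The function itself is not posted here, so this is only a guess at its general shape, not the code used for the numbers above. It swaps R and B for four pixels at a time with SSE2 intrinsics and writes the result with non-temporal stores ( _mm_stream_si128 ). The name swizzleStreamSSE2 is hypothetical, and the sketch assumes an x86 target, a 16-byte-aligned destination, and a pixel count that is a multiple of 4:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics (x86 / x86-64 only)

// Convert RGBA to BGRA, 4 pixels per iteration, bypassing the cache on writes.
void swizzleStreamSSE2(const std::uint8_t* src, std::uint8_t* dst,
                       std::size_t pixelCount) {
    assert(src != nullptr && dst != nullptr);
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);  // required by _mm_stream_si128
    assert(pixelCount % 4 == 0);

    const __m128i keepAG  = _mm_set1_epi32(static_cast<int>(0xFF00FF00u));
    const __m128i lowByte = _mm_set1_epi32(0x000000FF);

    for (std::size_t i = 0; i < pixelCount; i += 4) {
        // Unaligned load, since a mapped source may not be 16-byte aligned.
        const __m128i p  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i * 4));
        const __m128i ag = _mm_and_si128(p, keepAG);                          // A and G stay put
        const __m128i b  = _mm_and_si128(_mm_srli_epi32(p, 16), lowByte);     // B to the low byte
        const __m128i r  = _mm_slli_epi32(_mm_and_si128(p, lowByte), 16);     // R to byte 2
        // Non-temporal store: written data does not displace cache lines.
        _mm_stream_si128(reinterpret_cast<__m128i*>(dst + i * 4),
                         _mm_or_si128(ag, _mm_or_si128(b, r)));
    }
    _mm_sfence();  // make the streamed writes visible before the buffer is handed on
}
```

    A real frame would also need per-row handling when the row pitch is larger than width * 4, and a scalar tail loop for widths that are not a multiple of 4.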

    I am refining my conversion loop and investigating other approaches.



    Monday, November 18, 2013 2:03 PM