Best way to capture video from a webcam device to a texture

    Question

  • I'm in the process of porting an existing application to Metro and I'm a bit stuck on the best way to capture video frames from a webcam to a texture.

    Our current code uses the Media Foundation APIs, but critically the MFEnumDeviceSources call is not available in Metro style apps, and I haven't found an alternative way of creating an IMFMediaSource from the webcam device in Metro.

    So instead I've been looking at the Windows.Media.Capture API. The only way I can see that might allow me to capture the webcam video to a texture is to use the MediaCapture.CapturePhotoToStreamAsync call and convert the PNG data to the appropriate texture format. This seems kind of inefficient though.

    What would be the best way of solving this problem under Metro, using either Media Foundation or Windows.Media?

    Thursday, May 17, 2012 11:15 PM

Answers

  • Hello Cameron,

    Unfortunately, unlike the media playback engine, we do not expose the capture engine in WinRT. We ask that you use the MediaCapture APIs to do your media capture. You should be able to stream the incoming video data to a D3D texture by creating a custom C++/CX MFT to plug into the MediaCapture pipeline. This custom MFT can then siphon off the video stream and blit it directly to a D3D texture. This will certainly be much more efficient than using the CapturePhotoToStreamAsync call. The best MFT sample we have is the "greyscale" sample that can be found here:

    Media capture using webcam sample
    http://code.msdn.microsoft.com/windowsapps/Media-Capture-Sample-adf87622
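
    Once the MFT DLL is declared as an activatable class in the app manifest, attaching it to the capture pipeline is a single call. A minimal C++/CX sketch, assuming the activatable class ID "GrayscaleTransform.GrayscaleEffect" from the sample (substitute your own effect's ID):

```cpp
#include <ppltasks.h>

using namespace Windows::Media::Capture;
using namespace concurrency;

void AddCaptureEffect(MediaCapture^ mediaCapture)
{
    // Attach the custom MFT to the preview stream by its activatable
    // class ID; pass MediaStreamType::VideoRecord for the record stream.
    // The third argument is an optional IPropertySet of effect settings.
    create_task(mediaCapture->AddEffectAsync(
        MediaStreamType::VideoPreview,
        L"GrayscaleTransform.GrayscaleEffect",
        nullptr))
    .then([]()
    {
        // The effect is now in the pipeline; every frame on this stream
        // flows through the MFT's ProcessInput/ProcessOutput.
    });
}
```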

    I hope this helps,

    James


    Windows Media SDK Technologies - Microsoft Developer Services - http://blogs.msdn.com/mediasdkstuff/

    Friday, May 18, 2012 12:51 AM
    Moderator

All replies

  • Thanks, that should help. In the sample, the "grayscale transform" is in a separate DLL and the effect is registered by name in the appx package. Is this necessary? Reading the API docs, I haven't found any other way of registering the effect.
    Friday, May 18, 2012 2:43 AM
  • Hello Cameron,

    I'm not sure I understand your question. Are you saying that you don't want to register the effect? I honestly can't think of a reason why you would not want to do this. Remember that your MFT will only exist within the context of your application and will not be available to other applications. Because of this there really isn't a reason to avoid registering the effect. The MFT must be in a separate DLL because this binary may be loaded into an external process.

    I hope this helps,

    James


    Windows Media SDK Technologies - Microsoft Developer Services - http://blogs.msdn.com/mediasdkstuff/

    Friday, May 18, 2012 9:40 PM
    Moderator
  • I would rather avoid creating a DLL because I use CMake for creating cross-platform builds, and it doesn't properly support Metro yet, so adding new projects is inconvenient. I also have no intention of reusing the effect outside of the application I'm working on. Thus I'd rather just add the code for the write-to-texture effect to my existing project, if possible.
    Sunday, May 20, 2012 9:04 PM
  • I have a few more follow up questions regarding this solution with regards to siphoning off the texture data. I'm not sure how to best tell the MFT plugin what texture to write to. As far as I can tell, the MFT instance is really hidden from the main application code. It is instanced through a factory that I don't have any control of, and I can't see any way in the media capture API that would allow me to get a pointer to the effect instance that I could use to control where the texture data could go.

    The only way I can really think of that I could access the texture written to by the MFT plugin would be to have the MFT class add itself to a static list on creation that I could then access from my application code.

    Is there a better way of doing this? The solution I've come up with here seems a bit kludgey.
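
    For reference, the static-registry workaround described above might look like the following sketch. TextureWriterMFT, SetTargetTexture, and the registry itself are all hypothetical names; a real MFT would additionally implement IMFTransform and the texture plumbing:

```cpp
#include <algorithm>
#include <mutex>
#include <vector>

// Each MFT instance registers itself in a process-wide list on creation,
// so application code can discover live instances and hand them a render
// target, even though the MFT is instantiated by MF's own factory.
class TextureWriterMFT {
public:
    TextureWriterMFT()  { Registry().Add(this); }
    ~TextureWriterMFT() { Registry().Remove(this); }

    // Application code calls this on a discovered instance to say where
    // frames should be written. void* stands in for ID3D11Texture2D*.
    void SetTargetTexture(void* texture) { m_target = texture; }
    void* TargetTexture() const { return m_target; }

    // Process-wide registry guarded by a mutex, since MF delivers
    // samples on its own worker threads.
    class InstanceRegistry {
    public:
        void Add(TextureWriterMFT* p) {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_instances.push_back(p);
        }
        void Remove(TextureWriterMFT* p) {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_instances.erase(
                std::remove(m_instances.begin(), m_instances.end(), p),
                m_instances.end());
        }
        std::vector<TextureWriterMFT*> Snapshot() {
            std::lock_guard<std::mutex> lock(m_mutex);
            return m_instances;  // copy, safe to iterate without the lock
        }
    private:
        std::mutex m_mutex;
        std::vector<TextureWriterMFT*> m_instances;
    };

    static InstanceRegistry& Registry() {
        static InstanceRegistry r;
        return r;
    }

private:
    void* m_target = nullptr;
};
```

    It is kludgey, but with the lifetime handled in the constructor/destructor and the list behind a mutex it is at least safe against MF's threading.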

    Tuesday, May 22, 2012 2:15 AM
  • Hello Cameron,

    Do you still need help with this?

    -James


    Windows Media SDK Technologies - Microsoft Developer Services - http://blogs.msdn.com/mediasdkstuff/

    Tuesday, June 05, 2012 10:01 PM
    Moderator
  • Yes, we still need help with this.
    Monday, July 16, 2012 2:06 PM
  • Yes, I have the exact same question.

    Also, I was looking at the MediaCapture API, and MediaCapture.StartRecordToCustomSinkAsync looks appealing because you can actually pass it a pointer to an interface for the sink, which you could then theoretically use to set up a callback to get data back. However, it looks like StartRecordToCustomSinkAsync only supports compressed media formats, which isn't helpful if I'm interested in raw frames.

    Thursday, August 30, 2012 12:54 AM
  • Same +1

    What about raw frames? If the WinRT APIs are not ready for this yet, can I use the old COM capture interfaces?


    Win8 Developer QQ Group 95331609

    Sunday, September 16, 2012 5:54 PM
  • Implement the following class:

    class CGetSamples WrlSealed
        : public Microsoft::WRL::RuntimeClass<
              Microsoft::WRL::RuntimeClassFlags<Microsoft::WRL::RuntimeClassType::WinRtClassicComMix>,
              ABI::Windows::Media::IMediaExtension,
              IMFMediaSink,
              IMFClockStateSink>
    {
    ...
    };

    Sample data will be delivered to the custom IMFStreamSink you write, which is part of the IMFMediaSink. The IMFSample data will be passed to the IMFStreamSink::ProcessSample method.

    Then create an instance of this to pass to the StartRecordToCustomSinkAsync method:

    IInspectable* pInspectable = nullptr;
    HRESULT hr = MakeAndInitialize<CGetSamples>(&pInspectable);
    IMediaExtension^ me = reinterpret_cast<IMediaExtension^>(pInspectable);
    MediaEncodingProfile^ mep = MediaEncodingProfile::CreateMp4(VideoEncodingQuality::Vga);
    // m_MC is an instance of MediaCapture
    m_MC->StartRecordToCustomSinkAsync(mep, me);

    The samples will not come out as RGB; they will likely be YUY2 or some similar format. Since you can't set the IMFMediaType on the capture device or the source reader in order to get MF to do the conversion automatically, you are left to do it yourself. To get RGB data you will have to create an ID3D11VideoProcessor with input views matching the YUY2 format and an output view that is RGB, then call the ID3D11VideoContext::VideoProcessorBlt method.

    Get the ID3D11Resource interface by QIing for it off of the IMFMediaBuffer on each sample. From this you can get the ID3D11Device. QI for the ID3D11VideoDevice, then create the ID3D11VideoProcessorEnumerator, then the ID3D11VideoProcessor. An ID3D11Texture2D in the RGB format will have to be created as the destination, with an ID3D11VideoProcessorOutputView wrapping it. Create an ID3D11VideoProcessorInputView on the sample buffer. Then blit the sample to the surface using the video context and video processor.
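
    The steps above can be sketched as follows, assuming you already have the source texture from the sample's buffer and a matching-size DXGI_FORMAT_B8G8R8A8_UNORM destination texture. This is a minimal sketch; most error handling and the DXGI_RATIONAL frame-rate fields are elided:

```cpp
#include <d3d11.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

HRESULT ConvertToRgb(ID3D11Texture2D* src, ID3D11Texture2D* dstRgb,
                     UINT width, UINT height)
{
    ComPtr<ID3D11Device> device;
    src->GetDevice(&device);

    // QI the device and immediate context for their video interfaces.
    ComPtr<ID3D11VideoDevice> videoDevice;
    HRESULT hr = device.As(&videoDevice);
    if (FAILED(hr)) return hr;

    ComPtr<ID3D11DeviceContext> context;
    device->GetImmediateContext(&context);
    ComPtr<ID3D11VideoContext> videoContext;
    hr = context.As(&videoContext);
    if (FAILED(hr)) return hr;

    // Describe the conversion, then create the enumerator and processor.
    D3D11_VIDEO_PROCESSOR_CONTENT_DESC desc = {};
    desc.InputFrameFormat = D3D11_VIDEO_FRAME_FORMAT_PROGRESSIVE;
    desc.InputWidth = width;   desc.InputHeight = height;
    desc.OutputWidth = width;  desc.OutputHeight = height;
    desc.Usage = D3D11_VIDEO_USAGE_PLAYBACK_NORMAL;

    ComPtr<ID3D11VideoProcessorEnumerator> enumerator;
    hr = videoDevice->CreateVideoProcessorEnumerator(&desc, &enumerator);
    if (FAILED(hr)) return hr;

    ComPtr<ID3D11VideoProcessor> processor;
    hr = videoDevice->CreateVideoProcessor(enumerator.Get(), 0, &processor);
    if (FAILED(hr)) return hr;

    // Input view over the YUY2 sample, output view over the RGB target.
    D3D11_VIDEO_PROCESSOR_INPUT_VIEW_DESC inDesc = {};
    inDesc.ViewDimension = D3D11_VPIV_DIMENSION_TEXTURE2D;
    ComPtr<ID3D11VideoProcessorInputView> inView;
    hr = videoDevice->CreateVideoProcessorInputView(
        src, enumerator.Get(), &inDesc, &inView);
    if (FAILED(hr)) return hr;

    D3D11_VIDEO_PROCESSOR_OUTPUT_VIEW_DESC outDesc = {};
    outDesc.ViewDimension = D3D11_VPOV_DIMENSION_TEXTURE2D;
    ComPtr<ID3D11VideoProcessorOutputView> outView;
    hr = videoDevice->CreateVideoProcessorOutputView(
        dstRgb, enumerator.Get(), &outDesc, &outView);
    if (FAILED(hr)) return hr;

    // The blit performs the color-space conversion on the GPU.
    D3D11_VIDEO_PROCESSOR_STREAM stream = {};
    stream.Enable = TRUE;
    stream.pInputSurface = inView.Get();
    return videoContext->VideoProcessorBlt(processor.Get(), outView.Get(),
                                           0, 1, &stream);
}
```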

    Tuesday, April 15, 2014 4:04 AM
  • Thank you Dieter for the directions. I was able to implement the custom media sink object and successfully used it in the MediaCapture.StartPreviewToCustomSinkAsync() call. Everything works fine except one weird issue.

    In my setup, I use the Lumia 1020 back camera preview stream set to 1280x720 @ 30 fps. In IMFStreamSink.ProcessSample() I grab the NV12 frame via IMF2DBuffer.Lock2D(), copy it to an ID3D11Texture2D and render it to the swap chain. This works fine as long as I point the camera at dark scenes. Once I point the camera at bright objects, the preview begins to lag by up to 0.5 sec. When I point the camera back at dark objects, the delay decreases back to zero.

    A microbenchmark showed that between requesting a new frame with MEStreamSinkRequestSample and receiving the call in IMFStreamSink.ProcessSample(), it takes 2 ms in bright scenes and 30 ms in dark ones. The Lock2D() call takes about 29 ms every time. Custom rendering takes about 5 ms. The achieved framerate was 30 fps in bright scenes and 15 fps in dark ones.

    The question is simple: what can cause such a big video lag for bright scenes, and how can it be solved? I have a couple of guesses:

    1. In bright scenes, at 30 fps, the MF thread (which is, BTW, priority 20) spends 29/(29+2+5) = 80% of cpu in the Lock2D() call, causing the capture engine to fail to deliver recent frames. In dark scenes, at 15 fps, Lock2D() takes only 29/(29+30+5) = 45% of cpu, which is enough for the capture engine.
    2. Maybe MF stores captured frames internally in a queue, so if I cannot keep up with pulling new frames fast enough, I always get the tail of the queue, which is 0.5 sec late?

    I tried to ignore processing of every second frame - it helped and the video did not lag, but rendering at 15 fps in bright scenes looks pretty bad. Please help.
    Friday, March 27, 2015 5:04 PM