Raw H.264 Data and MF

  • Question

  • To Matt in particular, who has followed a few of my threads,

    The current model of camera I'm working with is pretty much perfect. My current stage of development is an application that reads a unicast stream of H.264 data from the camera. With the configuration I've set on the camera, the stream runs at 30 FPS and has a keyframe once every 30 frames (I'm assuming this because every 30th frame is 100 packets long, while the frames in between vary from 9 to 20 packets).

    The question is: will I need to provide a full-blown custom media source if I already have the raw H.264 data readily accessible in byte arrays?

    Secondly, I'm a bit confused about the media buffer aspect of custom media sources. Since I have each frame of the H.264 stream stored separately, would they be considered uncompressed frames, requiring me to follow this guide (http://msdn.microsoft.com/en-us/library/aa473821(VS.85).aspx)? My assumption is no, since the frames are H.264 encoded, and that approach would mean decoding them myself instead of taking advantage of MF under Windows 7.
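    For context, this is roughly how I'm picturing wrapping one already-compressed frame in an IMFSample (just a sketch; the function name and the timestamp parameters are mine, not from any SDK sample):

        #include <mfapi.h>
        #include <cstring>

        // Wrap one compressed H.264 frame (already in a byte array) in an
        // IMFSample. The frame stays compressed; the decoder handles it later.
        HRESULT CreateSampleFromFrame(const BYTE* frameData, DWORD frameSize,
                                      LONGLONG time100ns, LONGLONG duration100ns,
                                      IMFSample** ppSample)
        {
            IMFMediaBuffer* pBuffer = NULL;
            IMFSample* pSample = NULL;
            BYTE* pDest = NULL;

            HRESULT hr = MFCreateMemoryBuffer(frameSize, &pBuffer);
            if (SUCCEEDED(hr)) hr = pBuffer->Lock(&pDest, NULL, NULL);
            if (SUCCEEDED(hr))
            {
                memcpy(pDest, frameData, frameSize);
                pBuffer->Unlock();
                hr = pBuffer->SetCurrentLength(frameSize);
            }
            if (SUCCEEDED(hr)) hr = MFCreateSample(&pSample);
            if (SUCCEEDED(hr)) hr = pSample->AddBuffer(pBuffer);
            if (SUCCEEDED(hr)) hr = pSample->SetSampleTime(time100ns);
            if (SUCCEEDED(hr)) hr = pSample->SetSampleDuration(duration100ns);

            if (pBuffer) pBuffer->Release();
            if (SUCCEEDED(hr)) *ppSample = pSample;
            else if (pSample) pSample->Release();
            return hr;
        }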

    Anything that would correct my understanding would be great, thanks.
    Friday, September 18, 2009 6:48 PM

Answers

  • It sounds like you do need to write a media source to provide samples and advertise the right media type. There is no 'raw' source in MF that will process a raw bitstream.

    It is generally expected that for video streams, frames correspond 1:1 with samples, even for compressed video. You do not need to treat the frames as uncompressed video. One thing that may be important is that the MF H.264 decoder only supports Annex B H.264 streams -- see http://msdn.microsoft.com/en-us/library/dd797815(VS.85).aspx on the H.264 decoder. My understanding is that the conversion from a regular NALU stream without start codes to an Annex B stream is not particularly difficult, but it does require using the bitstream specifications as a reference.
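    A minimal sketch of that conversion, assuming the camera emits 4-byte big-endian length-prefixed NALUs (the prefix size is an assumption -- check what the camera actually sends):

        #include <cstdint>
        #include <vector>

        // Replace each 4-byte NALU length prefix with the Annex B start code
        // (00 00 00 01) that the MF H.264 decoder expects.
        std::vector<uint8_t> ToAnnexB(const uint8_t* data, size_t size)
        {
            static const uint8_t startCode[4] = { 0x00, 0x00, 0x00, 0x01 };
            std::vector<uint8_t> out;
            size_t pos = 0;
            while (pos + 4 <= size)
            {
                // Read the 4-byte big-endian NALU length.
                uint32_t naluLen = (uint32_t(data[pos])     << 24) |
                                   (uint32_t(data[pos + 1]) << 16) |
                                   (uint32_t(data[pos + 2]) << 8)  |
                                    uint32_t(data[pos + 3]);
                pos += 4;
                if (pos + naluLen > size) break; // truncated input, stop
                out.insert(out.end(), startCode, startCode + 4);
                out.insert(out.end(), data + pos, data + pos + naluLen);
                pos += naluLen;
            }
            return out;
        }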
    Monday, September 21, 2009 7:11 PM

All replies

  • Thanks for the tips, Matt; I'll look into the specifics of the H.264 stream.

    I'm struggling a bit, though, with making a custom media source. I'm finding the MPEG-1 demo project hard to follow because the MPEG logic and the MF logic are difficult to separate when you have a shallow understanding of both, and its scope feels too abstract because it serves a very generalised purpose. The media source I'm trying to create is compiled into the project itself, to simplify things and remove the need for COM registration.

    I have a suggestion for the SDK documentation online. The section for Media Sources should have a flow diagram similar to the one presented at http://msdn.microsoft.com/en-us/library/aa371866(VS.85).aspx. Although I've managed to understand the purpose of each interface and its requirements, without a flow representation I'm having trouble piecing them together and applying the theory to the data I have.

    With regards to sampling: I read somewhere in the SDK documentation (I can't find it again at the moment) something about initial frame samples. If the pipeline tries to pull the first sample from the source (i.e. a Play command token is put in the queue) and the current frame data I have from the IP camera is somewhere between keyframes, does my media source have to refrain from dispatching a sample until it has a full frame (a keyframe)?

    Thanks again Matt.

    --Edit: inspecting the initial request header from the camera, the hardware tells me it operates on Baseline Profile at Level 1, which is supported by the transform decoder in the link you provided. So essentially, all I need to do is plug that data into the transform. I believe the trouble I'm having at the moment is implementing a presentation descriptor and the subtleties of the media stream interface; the rough shape of the former is sketched below.
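    This is the shape I have so far (a sketch only; the single stream with identifier 0 is my own choice, and error handling is trimmed):

        #include <mfapi.h>
        #include <mfidl.h>

        // Build a presentation descriptor for a single H.264 video stream.
        // pVideoType is an IMFMediaType already configured for the stream.
        HRESULT CreatePresentation(IMFMediaType* pVideoType,
                                   IMFPresentationDescriptor** ppPD)
        {
            IMFStreamDescriptor* pSD = NULL;
            IMFMediaTypeHandler* pHandler = NULL;

            // One stream, identifier 0, offering one media type.
            HRESULT hr = MFCreateStreamDescriptor(0, 1, &pVideoType, &pSD);
            if (SUCCEEDED(hr)) hr = pSD->GetMediaTypeHandler(&pHandler);
            if (SUCCEEDED(hr)) hr = pHandler->SetCurrentMediaType(pVideoType);
            if (SUCCEEDED(hr)) hr = MFCreatePresentationDescriptor(1, &pSD, ppPD);
            if (SUCCEEDED(hr)) hr = (*ppPD)->SelectStream(0);

            if (pHandler) pHandler->Release();
            if (pSD) pSD->Release();
            return hr;
        }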
    Tuesday, September 22, 2009 6:36 PM
  • Thanks for the suggestion on the documentation, that's a good idea.

    There is some more information about the MPEG-1 sample here: http://msdn.microsoft.com/en-us/library/ee318417(VS.85).aspx

    Also the WavSource sample is designed to be a simpler example of a media source. In particular, all of the methods are synchronous. (And WAVE files are just easier to parse.)

    - Mike

    Mike Wasson (SDK Documentation)
    Tuesday, September 22, 2009 7:57 PM
  • Thanks for that, Mike; the WavSource project was a lot easier to follow. From what I gathered, the source is designed to interoperate with the source resolver via its MIME type/extension registration, and it made sense why the media source is constructed through the byte stream handler and not the other way round.

    The only fundamental question I have left about custom media sources is this: if I create an instance of my media source directly, without it being created via a source resolver, do I run the risk of certain interface methods not being called by the MF API? Or does everything, including the media stream descriptor, get processed via the playback topology creator?

    With the media stream descriptor in particular, I noticed that the media type in the WavSource project is initialised with MFInitMediaTypeFromWaveFormatEx(). The video counterpart I looked at, MFInitMediaTypeFromMFVideoFormat, requires a lot of information that I either know nothing about or can't obtain, despite already having signed an NDA with the manufacturer.

    With http://msdn.microsoft.com/en-us/library/aa473804(VS.85).aspx, I can fill in a fair few of the fields from the header data I get from the IP camera (video dimensions, frame rate, H.264 profile and level, etc.), but the rest I have no idea about. That page also says not to use the structure manually in an application, for exactly this reason. It also says I should be able to get by with just the major and minor type attributes (i.e. MFVideoFormat_H264), but I can't find references on actually creating media types from existing major and minor types. The closest I've got is sketched below.
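    This is that sketch (the dimensions, frame rate, and profile values are placeholders from my camera config, and I'm not certain the profile attribute is even required):

        #include <mfapi.h>
        #include <mfidl.h>
        #include <codecapi.h>

        // Build an H.264 media type from the major/minor types plus the
        // attributes the camera header supplies.
        HRESULT CreateH264MediaType(IMFMediaType** ppType)
        {
            IMFMediaType* pType = NULL;
            HRESULT hr = MFCreateMediaType(&pType);
            if (SUCCEEDED(hr)) hr = pType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
            if (SUCCEEDED(hr)) hr = pType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264);
            if (SUCCEEDED(hr)) hr = MFSetAttributeSize(pType, MF_MT_FRAME_SIZE, 640, 480);
            if (SUCCEEDED(hr)) hr = MFSetAttributeRatio(pType, MF_MT_FRAME_RATE, 30, 1);
            if (SUCCEEDED(hr)) hr = MFSetAttributeRatio(pType, MF_MT_PIXEL_ASPECT_RATIO, 1, 1);
            if (SUCCEEDED(hr)) hr = pType->SetUINT32(MF_MT_INTERLACE_MODE,
                                                     MFVideoInterlace_Progressive);
            if (SUCCEEDED(hr)) hr = pType->SetUINT32(MF_MT_MPEG2_PROFILE,
                                                     eAVEncH264VProfile_Base);

            if (SUCCEEDED(hr)) *ppType = pType;
            else if (pType) pType->Release();
            return hr;
        }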

    Wednesday, September 23, 2009 10:54 PM
  • That is correct -- a video source should drop all frames before the first keyframe after a Start call.  The decoder will not be able to decode these frames without the previous keyframe anyway.
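    A minimal sketch of that gating, assuming an Annex B stream where a keyframe is an IDR NALU (nal_unit_type 5):

        #include <cstddef>
        #include <cstdint>

        // True if the Annex B buffer contains an IDR NALU (type 5), i.e. a
        // keyframe for a Baseline-profile stream.
        bool ContainsIdrNalu(const uint8_t* data, size_t size)
        {
            for (size_t i = 0; i + 3 < size; ++i)
            {
                // A 00 00 01 scan also matches 00 00 00 01 start codes.
                if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1)
                {
                    if ((data[i + 3] & 0x1F) == 5) return true;
                    i += 2;
                }
            }
            return false;
        }

        // After a Start call, drop frames until the first keyframe arrives,
        // then deliver everything. Reset seenKeyframe to false on each Start.
        bool ShouldDeliver(bool& seenKeyframe, const uint8_t* frame, size_t size)
        {
            if (!seenKeyframe && !ContainsIdrNalu(frame, size))
                return false; // decoder cannot use this frame yet
            seenKeyframe = true;
            return true;
        }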

    Do not hesitate to ask questions if you are hitting difficulties.  If we know where people are having difficulties it will help us identify API holes or documentation holes.
    Wednesday, September 23, 2009 11:18 PM