Push Source: Still Image to AviMux
-
Wednesday, May 16, 2012 11:30 AM
Hello
Here's what I am trying to accomplish:
I made a Live Source Filter that pushes out a bitmap stored on disk.
Besides that, I developed a transform filter that has a first input pin and dynamically adds more input pins for overlays as other filters are connected. The transform filter outputs the stream from the first pin, with the other input pins drawn on top of it as overlays in certain regions. In other words, the first pin is the background onto which the other inputs are drawn. The first pin will be connected to the Live Source Filter I created, so the background of the transform filter is a static image.
If I render this transform filter to a Video Renderer, all is fine. However, when I try to output it to a file, things go haywire.
EDIT: In the output file, the video plays much faster than the audio; the audio is fine.
The problem must be located in the source filter, because whenever I use another source filter (webcam, ...) as the first pin of the overlay filter, I can write to a file without trouble. So the problem only occurs with my source filter.
I think the timestamps of the samples are the problem, but how can I make sure the Live Source Filter outputs at a constant frame rate of 30 FPS?
Should I implement some synchronization mechanism (IAMPushSource, my own reference clock, ...)? Anyway, thanks for helping me out.
Steven
- Edited by StevenDeRoover Wednesday, May 16, 2012 11:32 AM
All Replies
-
Friday, May 18, 2012 3:50 AM
StevenDeRoover wrote:
>
>Here's what I am trying to accomplish:
>
>I made a Live Source Filter which pushes out a bitmap (which is stored in the diskdrive).

Are you setting the timestamps in the IMediaSamples that you are sending along?

>EDIT: in the output file: the video is much faster than the audio, the
>audio is just fine

That will happen if you send out non-timestamped frames.
--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.
Tim Roberts, VC++ MVP Providenza & Boekelheide, Inc. -
Friday, May 18, 2012 7:47 AM
Hi Tim
Thanks for your reply. For the StillImageSource, I started from the SDK's PushSource sample, specifically PushSourceBitmap (the sample also contains PushSourceBitmapSet and PushSourceDesktop).
As far as I understand it, this filter creates a thread that runs an endless loop calling FillBuffer. That would mean there is no real "frames per second", just as many frames as the processor can produce in the loop, right?
The timestamps are set on the samples the same way the PushSourceBitmap example does: a frame counter is kept and incremented by 1 on every FillBuffer call. The start time of the sample is framecount * FPS_30, where FPS_30 = UNITS / 30, i.e. one frame period at 30 frames per second. The end time of the sample is then start time + FPS_30.
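To make the arithmetic of this scheme concrete, here is a small standalone C++ sketch. It redeclares REFERENCE_TIME and UNITS locally (as the assumed names RefTime and kUnits) so it compiles without the DirectShow headers. It shows that the integer FPS_30 drops a fraction of a unit per frame, and one alternative (an assumption, not taken from the SDK sample) that multiplies before dividing. The drift works out to only a few milliseconds per hour, so on its own it cannot explain a video track running several times too fast.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for DirectShow's REFERENCE_TIME (signed 100-ns units) and UNITS
// (one second), so this sketch compiles without the SDK headers.
using RefTime = std::int64_t;
constexpr RefTime kUnits = 10000000;
constexpr RefTime kFps30 = kUnits / 30;  // 333333, truncated from 333333.33...

// PushSourceBitmap-style scheme: start = frameNumber * FPS_30.
RefTime StartFromFixedDuration(std::int64_t frame) {
    return frame * kFps30;
}

// Alternative: scale first and divide last, so whole seconds land exactly.
// The rate is given as a fraction fpsNum / fpsDen (30 / 1 here).
RefTime StartFromRate(std::int64_t frame, std::int64_t fpsNum, std::int64_t fpsDen) {
    assert(fpsNum > 0 && fpsDen > 0);
    return frame * kUnits * fpsDen / fpsNum;
}
```

After one hour (108000 frames) the first scheme is 36000 units, i.e. 3.6 ms, behind wall-clock time.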
I also tried implementing IAMPushSource on the output pin and IAMFilterMiscFlags on the filter itself, to support rate matching, and then adding the offset to the timestamps. With that, I did not get any image at all in the output file.
I also tried setting AvgTimePerFrame in GetMediaType, multiplying it by 2.4, 2.5, ... and experimenting with that. The output file was then approximately right, but I already realised that this could not be the answer, because on another PC the right value would be totally different. Indeed, I tested it on two PCs and the output was totally different. So this is not the solution to the problem.
I have another DirectShow filter, a trial (and expensive) one from another SDK. This filter does exactly the same thing and works, so I know it's possible. But I would rather develop it on my own, because my company would like to have all the cards (read: source code) in its own hands.
I'm actually quite desperate at the moment. All I wanted was a filter that outputs a simple bitmap at 30 FPS...
-
Friday, May 18, 2012 8:35 AM
The frame rate is enforced by the video renderer that terminates the pipeline: it waits if incoming samples reach it faster than their time stamps call for. This eventually ties up all the allocator buffers, so on your filter's side you are unable to deliver new frames: there is no free buffer to fill, and FillBuffer is not called.
So your responsibility is to fill buffers as fast as you can and put time stamps on them at 30 fps granularity. The rest is supposed to be done outside of your filter.
-
Friday, May 18, 2012 11:07 AM
It's true that a video renderer displays everything fine, but when I output everything to an AviMux and then to a FileWriter, things go bad.
Suppose I add only the StillImageSource and connect it to an AVI Mux and then to a File Writer. If I start the graph and stop it after 10 seconds, the output file is only 3 seconds long. That doesn't seem right.
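Assuming the AVI Mux writes one AVI frame per sample it receives and the file plays back at AvgTimePerFrame, the file length is simply frames written times frame duration, so the numbers above can be read backwards. The helper names below are hypothetical, and REFERENCE_TIME is redeclared locally as RefTime.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for DirectShow's REFERENCE_TIME and UNITS (100-ns units).
using RefTime = std::int64_t;
constexpr RefTime kUnits = 10000000;

// Playback length of a fixed-frame-rate AVI stream, assuming one AVI frame
// per sample the mux received.
RefTime AviDuration(std::int64_t framesWritten, RefTime avgTimePerFrame) {
    return framesWritten * avgTimePerFrame;
}

// Frame rate actually delivered during a run lasting wallSeconds.
double DeliveredFps(std::int64_t framesWritten, double wallSeconds) {
    assert(wallSeconds > 0.0);
    return static_cast<double>(framesWritten) / wallSeconds;
}
```

Under that assumption, a 3-second file at 30 fps holds about 90 frames (29,999,970 units), which means only about 9 samples per second reached the mux during the 10-second run.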
-
Sunday, May 20, 2012 5:30 AM
StevenDeRoover wrote:
>
>The timestamps are set on the samples, the same way the PushSourceBitmap
>example does: the framecount is remembered and added with 1 on every
>FillBuffer call. Then, the starttime of the sample is "framecount *
>FPS_30" where FPS_30 = UNITS / 30, which should equal to 30 frames
>per second. The end time of the sample will then be starttime + FPS_30.

Are you using the "Smart Tee"? The Preview pin of the "Smart Tee" strips out timestamps.
--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.
-
Sunday, May 20, 2012 7:09 AM
No, just the plain old AVI Mux filter, combined with the File Writer.
If I understand correctly, my push source does not determine the FPS I get; the renderer does. But then, what about my ScreenCapture filter? It sends out samples at a constant frame rate of 30 FPS, so it is possible, isn't it?
Maybe I shouldn't be using the CSource and CSourceStream base classes, and should write my own implementation instead?
PS: I'm sorry if I ask stupid questions, but I only started writing DirectShow filters two weeks ago, so I'm quite a newbie :)
-
Sunday, May 20, 2012 10:32 AM
IIRC, AVI Mux checks the AvgTimePerFrame value in the media type and won't accept zero there, because AVI itself is a fixed-frame-rate container. Then you have to provide correct time stamps, or again the mux will fail to write the data. I would say that this is most likely the cause of your problems, because other than that it operates in a regular way.
http://alax.info/blog/tag/directshow
- Edited by Roman RyltsovMVP Sunday, May 20, 2012 10:33 AM
-
Sunday, May 20, 2012 7:12 PM
Hi Roman
That is already set to the frame length, UNITS / 30 (i.e. 30 FPS).
-
Tuesday, May 22, 2012 7:02 AM
Okay.
This is what I have right now: in GetMediaType, I set AvgTimePerFrame to FPS_30. Then, in the FillBuffer method, I keep a private reference time, m_rtLastTime, which I update on every call.
In every FillBuffer call, I do this:
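    // In FillBuffer; m_mt is the pin's media type, whose AvgTimePerFrame
    // was set to FPS_30 in GetMediaType.
    VIDEOINFOHEADER *pVih = (VIDEOINFOHEADER*)m_mt.pbFormat;
    REFERENCE_TIME avgFrameTime = pVih->AvgTimePerFrame;
    REFERENCE_TIME rtStart = m_rtLastTime;
    REFERENCE_TIME rtStop = rtStart + avgFrameTime;  // same duration as the clock step
    m_rtLastTime += avgFrameTime;
    pSample->SetTime(&rtStart, &rtStop);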
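The bookkeeping in that snippet can be isolated in a small class so that a sample's stop time and the running clock always use the same duration. This is a sketch with a hypothetical FrameClock class and a local RefTime alias standing in for REFERENCE_TIME; in a real CSourceStream-derived pin, the returned pair would be passed to IMediaSample::SetTime, and when to call Reset is an assumed policy, not something from the SDK sample.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for DirectShow's REFERENCE_TIME (signed 100-ns units).
using RefTime = std::int64_t;

// Hypothetical helper, not a DirectShow class: hands out contiguous
// [start, stop) pairs from one frame duration, so a sample's stop time and
// the next sample's start time can never disagree.
class FrameClock {
public:
    explicit FrameClock(RefTime avgTimePerFrame) : duration_(avgTimePerFrame) {
        assert(duration_ > 0);  // AVI needs a nonzero frame duration
    }

    // Mirrors the FillBuffer bookkeeping: returns this frame's times and
    // advances the clock by exactly one frame. In a real pin the pair would
    // go to IMediaSample::SetTime(&start, &stop).
    void Next(RefTime& start, RefTime& stop) {
        start = next_;
        stop = start + duration_;
        next_ = stop;
    }

    // Restart from zero, e.g. when streaming starts again (assumed policy).
    void Reset() { next_ = 0; }

private:
    RefTime duration_;
    RefTime next_ = 0;
};
```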
At first this didn't work, but I found that my VideoMixerFilter (the transform filter after the StillImageSource) was slowing everything down. So I commented out some things in this filter, and then the FPS in my output file was correct.
The focus now is on the transform filter. This filter has an input pin and an output pin. When I connect a filter to the input pin, I add another "AdditionalPin", which is a derived pin class.
The only thing these AdditionalPins do is keep a bitmap of each sample they receive.
In the filter's Transform method (which is called for every sample received on the first pin), I draw the AdditionalPins' bitmaps onto the first pin's sample.
In other words: the first pin is my StillImageSource (a background), the other input pins are AdditionalPins, and their images are drawn onto the background in certain regions. I use GDI+ for this.
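One way to structure the AdditionalPin idea described above is a "latest frame" holder shared between two streaming threads: each additional pin's Receive copies the sample bytes in, and the background pin's Transform copies them out. LatestFrame is a hypothetical name, and the sketch uses only the C++ standard library, no DirectShow types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Keeps only the most recent frame an additional pin received. Store() and
// CopyTo() run on different streaming threads, hence the mutex.
class LatestFrame {
public:
    // Called from the additional pin's Receive(): copy the sample bytes.
    void Store(const std::uint8_t* data, std::size_t size) {
        assert(data != nullptr || size == 0);
        std::lock_guard<std::mutex> lock(mutex_);
        frame_.assign(data, data + size);
        ++generation_;
    }

    // Called from Transform() on the background pin's thread. Copies the
    // latest frame into 'out'; returns false if nothing has arrived yet.
    bool CopyTo(std::vector<std::uint8_t>& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (generation_ == 0) return false;
        out = frame_;
        return true;
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::uint8_t> frame_;
    std::uint64_t generation_ = 0;
};
```

Copying under the lock costs one memcpy per overlay per frame, but it guarantees Transform never draws a half-written frame, and neither pin ever waits for the other to finish drawing.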
So the question is: is my concept of the additional pins flawed, or is using GDI+ in my Transform method too slow?
PS: I know there's some queuing class in DirectShow. Using that, my source filter might not be delayed, but would it solve my problem?
-
Wednesday, May 23, 2012 9:04 AM
Apparently, GDI+ is rather slow at drawing images?
-
Wednesday, May 23, 2012 9:13 AM
Computers are fast nowadays, so it might be fast enough. However, yes, it is slow.
-
Wednesday, May 23, 2012 9:45 AM
The question is: can I bypass GDI+ or replace it with something else? Can I use Direct2D, for instance, to achieve the same result? Will it be faster?
- Edited by StevenDeRoover Wednesday, May 23, 2012 9:45 AM
-
Wednesday, May 23, 2012 9:49 AM
For streaming you typically manipulate image data:
- as byte buffer
- YUV pixel formats
- possibly non-standard strides if requested by renderers and video hardware
GDI and GDI+ are not good fits here. Specialized image processing libraries, such as Intel Media SDK, might be good: their data representation is close to or compatible with DirectShow's, or their objects, classes and helpers target DirectShow directly. Unless you are using a specific third-party solution, I would say that you will have to deal with byte copying of data in order to prepare it for streaming through the DirectShow pipeline.
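As an illustration of treating frames as byte buffers with arbitrary strides, here is a sketch that copies an overlay rectangle into a background frame for a packed format such as RGB32, clipping at the edges. ImageView and CopyRegion are hypothetical names; the code assumes both images share the same pixel format and that data points at the first row in memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// A packed image in a byte buffer. 'stride' is the byte distance from one row
// to the next and may exceed width * bytesPerPixel when rows are padded.
struct ImageView {
    std::uint8_t* data;
    int width;
    int height;
    std::ptrdiff_t stride;
    int bytesPerPixel;
};

// Copies 'src' into 'dst' with src's top-left corner at (x, y) in dst,
// clipped to dst's bounds. Both views must use the same pixel format.
void CopyRegion(const ImageView& src, const ImageView& dst, int x, int y) {
    assert(src.bytesPerPixel == dst.bytesPerPixel);
    const int bpp = dst.bytesPerPixel;

    // Destination rectangle after clipping, as [x0, x1) x [y0, y1).
    const int x0 = std::max(x, 0);
    const int y0 = std::max(y, 0);
    const int x1 = std::min(x + src.width, dst.width);
    const int y1 = std::min(y + src.height, dst.height);
    if (x0 >= x1 || y0 >= y1) return;  // nothing visible

    const std::size_t rowBytes = static_cast<std::size_t>(x1 - x0) * bpp;
    for (int row = y0; row < y1; ++row) {
        // Step through each buffer with its own stride, never width * bpp.
        const std::uint8_t* s = src.data + (row - y) * src.stride + (x0 - x) * bpp;
        std::uint8_t* d = dst.data + row * dst.stride + x0 * bpp;
        std::memcpy(d, s, rowBytes);
    }
}
```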
-
Wednesday, May 23, 2012 9:58 AM
I'm afraid I'm not completely following here. If I understand correctly, in my transform filter I shouldn't use GDI+ to draw the samples of the additional input pins onto the sample of the first (background) input pin?
-
Wednesday, May 23, 2012 10:13 AM
In your filter you receive buffers and you deliver buffers. You are free to choose the tools to modify the data: GDI+ can work out, and maybe it is even a good solution; for faster options you are likely to need a third-party library or API, unless you are going to do the byte/bit manipulation yourself.
-
Wednesday, May 23, 2012 10:51 AM
What kind of transformation are you going to do on your incoming buffers? If you are going to do fairly simple things, like scaling the incoming rectangular buffer to a different rectangle (which you might do for a picture-in-picture application), you will probably find that GDI is fast enough. I've experimented (albeit very superficially) with using Direct3D calls inside a filter. It's possible, but it's a bit of work.
I would suggest you isolate your image-processing to a separate routine called by your filter (pretty much standard practice anyway, I think), and start with GDI. If that's fast enough, great! If not, you can improve it later. Remember Jackson's advice on optimization.
Stevens Miller http://www.withoutsupervision.com
-
Wednesday, May 23, 2012 12:00 PM
It is indeed a picture-in-picture concept that I'm trying to achieve. My aim is to have multiple sources on top of a background (a screen capture + IP camera sources + ...). Additionally, I want to draw text (for instance, the name of the person on the webcam below the webcam image) and external images loaded from a file (for instance, a logo).
On my development machine, I can reach 25 FPS if I only have my background (the first pin) and one overlay (for instance, a webcam source). But on a less powerful PC, 10 FPS is the maximum.
Since my development machine is quite powerful, I thought reaching 30 would be a piece of cake, but clearly it isn't.
-
Wednesday, May 23, 2012 1:21 PM
I think you should rather be looking into VMR/EVR mixing and custom allocator/presenters for rendering multiple videos simultaneously into a Direct3D device. There is no way for GDI or GDI+ to be fast enough if you have multiple heavy streams. The SDK has a few samples showing VMR modes of operation; earlier SDKs had even more to check out.
-
Wednesday, May 23, 2012 2:26 PM
Isn't VMR/EVR only for displaying? I need to be able to write the output to a file, or maybe later stream it over the network...
-
Wednesday, May 23, 2012 3:08 PM
As it is, the VMR9 is only for display; it has no output pin. What Roman is suggesting is a way around that. At run-time, you can load your own code to replace some parts of the VMR9 (this may apply to the EVR too; I have never used it). In theory, you can replace the built-in allocator/presenter with one of your own, which would have total control over where the final image ends up (such as a file, or perhaps even sent downstream to another filter). A good resource for reading up on it is this chapter of "Fundamentals of Audio and Video Programming for Games."
Now, if that's too much to start with and you want to continue with your filter approach, a lot will depend on how you down-sample your images. There's quite a bit online about that. If you want something broadcast-quality, then, yes, that's going to tax your CPU. It certainly is possible to make Direct3D calls in a filter, but it's more complicated than simple GDI-style stuff. Before getting into that, you might look at your down-sampling code and see if there are any improvements to be made there. How are you filtering your image before copying it onto your background? (Sad fact of DirectShow life is that the word "filter" means something totally different in the context of a graph than it does in the context of image-processing; in DirectShow, "filter" is mostly a noun, while in image-processing, it is mostly a verb.)
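For the down-sampling question, the simplest filter that improves on plain decimation (taking every other pixel, which aliases) is a 2x2 box average. This sketch uses a hypothetical Downsample2x function that works on one 8-bit plane, such as a Y plane or a single RGB channel, and assumes even dimensions and tightly packed rows; higher-quality scalers use wider kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Halves an 8-bit single-channel plane by averaging each 2x2 block:
// a box filter followed by decimation.
std::vector<std::uint8_t> Downsample2x(const std::vector<std::uint8_t>& src,
                                       int width, int height) {
    assert(width % 2 == 0 && height % 2 == 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    const int w2 = width / 2;
    const int h2 = height / 2;
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(w2) * h2);
    for (int y = 0; y < h2; ++y) {
        const std::uint8_t* r0 = &src[static_cast<std::size_t>(2 * y) * width];
        const std::uint8_t* r1 = r0 + width;  // the row below
        for (int x = 0; x < w2; ++x) {
            const int sum = r0[2 * x] + r0[2 * x + 1] + r1[2 * x] + r1[2 * x + 1];
            // Rounded mean of the four source pixels.
            dst[static_cast<std::size_t>(y) * w2 + x] = static_cast<std::uint8_t>((sum + 2) / 4);
        }
    }
    return dst;
}
```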
Stevens Miller http://www.withoutsupervision.com
-
Wednesday, May 23, 2012 7:36 PM
Well yes, this is presentation only. This is not going to work out if you need to write to a file. Both methods make sense for different situations. Hardware can mix video feeds of different formats and combine them (fast) as textures. Reading back from video memory is typically painfully slow, so you don't want to read back what the hardware combined for you. So, in order to write to a file, you will have to do all the mixing yourself in a transform filter. And having done that, you can defer a D3D implementation, because even if it would offer some advantages, you already have one working code path (that is, mixing in the transform filter).
-
Wednesday, May 23, 2012 7:39 PM
The main problem with GDI is that you have to feed it with RGB images. And this is almost never the thing you want to do: all video is YUV and more compact (12-16 bpp) so you need to expand your buffers and toss more data in RAM and into video adapter. This is a performance killer.
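The bits-per-pixel figures above translate directly into buffer sizes. This sketch computes bytes per frame for RGB32, RGB24, YUY2 (16 bpp) and I420/YV12 (12 bpp), ignoring row padding; the Format enum and function names are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

enum class Format { RGB32, RGB24, YUY2, I420 };

// Bytes in one frame, ignoring any row padding.
std::int64_t FrameBytes(Format f, std::int64_t width, std::int64_t height) {
    const std::int64_t pixels = width * height;
    switch (f) {
        case Format::RGB32: return pixels * 4;  // 32 bpp
        case Format::RGB24: return pixels * 3;  // 24 bpp
        case Format::YUY2:                      // 16 bpp: 2 pixels per 4 bytes
            assert(width % 2 == 0);
            return pixels * 2;
        case Format::I420:                      // 12 bpp: Y plus quarter-size U and V
            assert(width % 2 == 0 && height % 2 == 0);
            return pixels * 3 / 2;
    }
    return 0;
}

// Sustained data rate in bytes per second at a given frame rate.
std::int64_t BytesPerSecond(Format f, std::int64_t width, std::int64_t height, std::int64_t fps) {
    return FrameBytes(f, width, height) * fps;
}
```

At 640x480 and 30 fps, RGB32 moves 36,864,000 bytes per second, twice as much as YUY2 and about 2.7 times as much as I420.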
-
Wednesday, May 23, 2012 9:49 PM
> The main problem with GDI is that you have to feed it with RGB images. And this is almost never the thing you want to do: all video is YUV and more compact (12-16 bpp) so you need to expand your buffers and toss more data in RAM and into video adapter. This is a performance killer.
That's a little harsh on RGB. It's easy to work with and, at some point, you do have to convert to it, don't you?
As for video, my Microsoft LifeCam HD-5001 puts out RGB and YUY2. My Logitech C600 puts out RGB and I420, as does my DC-6120. YUY2 is a packed format that's easy to convert back and forth from and to RGB. Every 32 packed YUY2 bits decodes into 48 RGB bits (that is, every complete YUY2 sample yields two RGB pixels). I420 is a bit more complicated as it separates its components into arrays instead of packed "macropixels," though processing the arrays individually might actually be easier than processing unpacked values. I haven't worked much with non-RGB yet, so I'm not a good judge of that.
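The YUY2 layout described here (four bytes Y0 U Y1 V giving two pixels that share chroma) can be decoded as sketched below. It uses the 8-bit integer BT.601 approximation that Microsoft documents for converting studio-range YUV to RGB; the function names are hypothetical, and real code would loop over rows using the frame's stride.

```cpp
#include <cassert>
#include <cstdint>

struct Rgb { std::uint8_t r, g, b; };

// Converts a fixed-point (x256) intermediate to an 8-bit channel, clamping.
static std::uint8_t Clip(int x) {
    if (x < 0) return 0;
    x >>= 8;
    return static_cast<std::uint8_t>(x > 255 ? 255 : x);
}

// BT.601 studio-range YUV to RGB, integer approximation.
static Rgb YuvToRgb(int y, int u, int v) {
    const int c = y - 16, d = u - 128, e = v - 128;
    return { Clip(298 * c + 409 * e + 128),
             Clip(298 * c - 100 * d - 208 * e + 128),
             Clip(298 * c + 516 * d + 128) };
}

// One YUY2 macropixel: bytes Y0 U Y1 V -> two RGB pixels sharing U and V.
void Yuy2MacropixelToRgb(const std::uint8_t in[4], Rgb out[2]) {
    assert(in != nullptr && out != nullptr);
    out[0] = YuvToRgb(in[0], in[1], in[3]);
    out[1] = YuvToRgb(in[2], in[1], in[3]);
}
```

With neutral chroma (U = V = 128), luma 235 decodes to white and luma 16 to black, the ends of the studio range.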
Now, I do think YUV is going to be a fact of life, as the EVR won't take anything else on its input pins (other than Pin 0). But, while developing something, RGB (with the VMR9) is not a bad place to start, I think. Gory details can be found online.
Stevens Miller http://www.withoutsupervision.com
- Edited by Stevens Miller Wednesday, May 23, 2012 9:59 PM
-
Thursday, May 24, 2012 6:00 AM
> That's a little harsh on RGB. It's easy to work with and, at some point, you do have to convert to it, don't you?
No, you can keep video in YUV all the way up to presentation, it is a good idea to avoid RGB completely (performance-wise).
Reading:
-
Thursday, May 24, 2012 8:23 AM
It seems that having a transparent image slows down the whole process (from 30FPS to 12FPS).
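A plausible reason transparency costs so much (an assumption, not something measured in this thread): an opaque overlay can be copied row by row, while an overlay with an alpha channel needs arithmetic on every channel of every pixel. The sketch below shows a straight-alpha "source over" blend of BGRA pixels onto an opaque background; BlendOver is a hypothetical name, and libraries such as GDI+ may implement blending differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// "Source over" blend of 'count' straight-alpha BGRA overlay pixels onto an
// opaque BGRA background, in place. Opaque and fully transparent pixels take
// fast paths; everything else costs two multiplies and a divide per channel.
void BlendOver(const std::uint8_t* src, std::uint8_t* dst, std::size_t count) {
    assert(src != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < count; ++i, src += 4, dst += 4) {
        const unsigned a = src[3];
        if (a == 0) continue;  // fully transparent: background unchanged
        if (a == 255) {        // fully opaque: plain copy of B, G, R
            dst[0] = src[0];
            dst[1] = src[1];
            dst[2] = src[2];
            continue;
        }
        for (int c = 0; c < 3; ++c) {
            // Rounded weighted average of overlay and background.
            dst[c] = static_cast<std::uint8_t>((src[c] * a + dst[c] * (255u - a) + 127u) / 255u);
        }
        // dst[3] stays as is: the background is assumed opaque.
    }
}
```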
I do have another question: if I put an ffdshow raw video filter after my video mixer, I can set an OSD on top of the image. The "input bitrate" is "4294260 mbps" (approx). Isn't that too high? If so, how can I downsize it? And also: it goes down all the time; shouldn't it stay around the same amount?
-
Thursday, May 24, 2012 3:20 PM
> No, you can keep video in YUV all the way up to presentation, it is a good idea to avoid RGB completely (performance-wise).
Up to presentation, certainly. But is there hardware that will take YUV as the actual display data? (Well, I suppose an actual television does that, in a way, but I'm asking about a graphics display adapter.)
(Hey, is anyone else having trouble getting to msdn.com today?)
Stevens Miller http://www.withoutsupervision.com
- Edited by Stevens Miller Thursday, May 24, 2012 3:21 PM
-
Thursday, May 24, 2012 3:36 PM
For over 10 years, hardware prefers taking YUV video under Windows (as compared to RGB).
-
Thursday, May 24, 2012 5:03 PM
> For over 10 years, hardware prefers taking YUV video under Windows (as compared to RGB).
VRAM now stores YUV data? That's remarkable (and says something about how long it has been since I programmed at the hardware level).
Stevens Miller http://www.withoutsupervision.com
-
Saturday, May 26, 2012 4:05 AM
StevenDeRoover wrote:
>
>I do have another question: if I put an ffdshow raw video filter after my
>video mixer, I can set an OSD on top of the image. The "input bitrate"
>is "4294260 mbps" (approx). Isn't that too high?

Of course. It means you picked up a negative value at some point. Remember that 0xFFFFFFFF is 4294967295, which is -1 when read as a signed 32-bit value.

>If so, how can I downsize it?

You should have been able to figure out that it was a false reading. You're talking about 500 gigabytes per second. I guarantee your computer isn't that hot.
--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.
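Tim's point about a negative value can be reproduced in a few lines. The sketch converts a signed 32-bit value to unsigned, which C++ defines as reduction modulo 2^32. The value -707036 is purely illustrative, chosen so the leading digits match the reading above; what ffdshow actually computes and how it scales the number are not known here. AsUnsigned and LooksLikeWrappedNegative are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Reinterprets a signed 32-bit value as unsigned. The conversion is defined
// as the value modulo 2^32, so small negative numbers land just below
// 4294967296.
std::uint32_t AsUnsigned(std::int32_t value) {
    return static_cast<std::uint32_t>(value);
}

// Sanity check for a reading: anything with the top bit set would have been
// negative if the field were really signed.
bool LooksLikeWrappedNegative(std::uint32_t reading) {
    return reading >= 0x80000000u;
}
```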
-
Saturday, May 26, 2012 4:06 AM
Stevens Miller wrote:
>
>>For over 10 years, hardware prefers taking YUV video under Windows (as compared to RGB).
>
>VRAM now stores YUV data?

Frame buffers are typically still RGB, but videos are always displayed in an overlay or texture surface, and those certainly do prefer YUV.
--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.
-
Saturday, May 26, 2012 4:10 AM
That is very cool, actually. I did some studying on this after Roman's post. Certainly does seem to offer a performance improvement over doing everything in RGB.
Reading this forum is like going back to graduate school sometimes (but only the parts I liked).
Stevens Miller http://www.withoutsupervision.com
-
Saturday, May 26, 2012 4:52 AM
Have a look at this [old] app, which shows how many RGB frames you can push through as compared to YV12 and YUY2.
http://alax.info/blog/tag/directshow
- Edited by Roman RyltsovMVP Saturday, May 26, 2012 4:52 AM
-
Saturday, May 26, 2012 1:11 PM
Wow! Very tidy piece of research, Roman, with convincing results. Do you think those differences are primarily the result of the different amounts of data involved? The ratios of your fps results seem roughly comparable to the ratios of the sizes of the pixel data involved.
I've done some reading on the luma-chroma formats, starting with the ITU BT.601 recommendation, and going on from there. The YUV-RGB transformation is purely linear. Does that mean linear operations in the YUV space equate to linear operations in the RGB space? I'm doing a lot with weighted filtration right now (blurring, mostly). If I filter in YUV space, do I get the same final image as if I had filtered in RGB space? Seems like I should (and I suppose I could just test that myself, but thought I'd ask while I have your attention 8-) ).
Thanks very much for helping me improve my understanding of this.
Stevens Miller http://www.withoutsupervision.com
-
Saturday, May 26, 2012 1:44 PM
Yes the metrics are along the line of amount of data transferred to video adapter. Additionally, RGB formats can be hit by additional conversions on the way because the "universal" representation is 24-bit RGB, and adapters prefer 32-bit RGB for its alignment, and it is even heavier.
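One concrete alignment issue behind the 24-bit versus 32-bit point: rows of an uncompressed RGB bitmap (DIB) are padded so each row starts on a 4-byte boundary. The sketch applies the standard DIB stride formula (DibStride is a hypothetical name); with 32 bpp every row and every pixel is naturally aligned, while 24 bpp rows often need padding and pixels straddle word boundaries.

```cpp
#include <cassert>
#include <cstdint>

// Byte stride of one row of an uncompressed RGB DIB: rows are padded so each
// starts on a 4-byte (DWORD) boundary.
std::int64_t DibStride(std::int64_t width, int bitsPerPixel) {
    assert(width > 0 && bitsPerPixel > 0);
    return ((width * bitsPerPixel + 31) / 32) * 4;
}
```

For example, a 641-pixel-wide RGB24 row holds 1923 bytes of pixels but occupies 1924, while the same row in RGB32 is exactly 2564 bytes with no padding.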
The conversion is linear, as for filtering however I am not sure - it certainly depends on the operations you apply.
-
Saturday, May 26, 2012 5:34 PM
> Yes the metrics are along the line of amount of data transferred to video adapter. Additionally, RGB formats can be hit by additional conversions on the way because the "universal" representation is 24-bit RGB, and adapters prefer 32-bit RGB for its alignment, and it is even heavier.
Good point. What I am doing involves a lot of transparency, so I actually do use that fourth byte in the ARGB32 format. Is there a YUV equivalent to ARGB? [You already answered that, didn't you? It's above, in your recommended reading list. Looks like AYUV takes me all the way back to 32-bits-per-pixel, though.]
> The conversion is linear, as for filtering however I am not sure - it certainly depends on the operations you apply.
Well, let's see... been about 20 years since I've thought about this stuff, but let's consider the math...
For a point in RGB space, $P_{rgb}$, we map it to $P_{yuv}$ in YUV space with a matrix $T$:

$$T P_{rgb} = P_{yuv}$$

which naturally means that we can reverse the mapping with $T$'s inverse:

$$T^{-1} P_{yuv} = P_{rgb}$$

Thus, if we apply a linear transform $M$ to $P_{rgb}$, the question is: what transform applied to $P_{yuv}$ yields the same result when the transformed point is mapped back to RGB space? That is, we need $X$ such that:

$$T^{-1} X P_{yuv} = M P_{rgb}$$

A little algebra proceeds thus (substituting $P_{yuv} = T P_{rgb}$, and noting that the equality must hold for every $P_{rgb}$):

$$T^{-1} X T P_{rgb} = M P_{rgb}$$
$$T^{-1} X T = M$$
$$X T = T M$$
$$X = T M T^{-1}$$

Which, of course, answers my question: no, a transform applied in RGB space does not, in general, yield the same color when applied in YUV space, but you can convert any given RGB-space transformation to an equivalent YUV-space transformation in one step that is good for the whole space.
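The result X = T M T^-1 can be checked numerically. This sketch uses the commonly quoted analog BT.601 RGB-to-YUV coefficients for T and an arbitrary per-channel gain for M (both chosen for illustration), and verifies that going to YUV, applying X and coming back matches applying M directly. Digital Y'CbCr adds offsets (16 and 128), which makes the mapping affine rather than linear; the same algebra still holds if T and M are written as 4x4 homogeneous matrices.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;
using Vec3 = std::array<double, 3>;

Mat3 Mul(const Mat3& a, const Mat3& b) {
    Mat3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

Vec3 Apply(const Mat3& m, const Vec3& v) {
    Vec3 r{};
    for (int i = 0; i < 3; ++i)
        for (int k = 0; k < 3; ++k)
            r[i] += m[i][k] * v[k];
    return r;
}

// Inverse via the adjugate; fine for a well-conditioned 3x3 matrix like T.
Mat3 Inverse(const Mat3& m) {
    const double det =
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1]) -
        m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0]) +
        m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
    assert(std::fabs(det) > 1e-12);
    Mat3 r{};
    r[0][0] =  (m[1][1] * m[2][2] - m[1][2] * m[2][1]) / det;
    r[0][1] = -(m[0][1] * m[2][2] - m[0][2] * m[2][1]) / det;
    r[0][2] =  (m[0][1] * m[1][2] - m[0][2] * m[1][1]) / det;
    r[1][0] = -(m[1][0] * m[2][2] - m[1][2] * m[2][0]) / det;
    r[1][1] =  (m[0][0] * m[2][2] - m[0][2] * m[2][0]) / det;
    r[1][2] = -(m[0][0] * m[1][2] - m[0][2] * m[1][0]) / det;
    r[2][0] =  (m[1][0] * m[2][1] - m[1][1] * m[2][0]) / det;
    r[2][1] = -(m[0][0] * m[2][1] - m[0][1] * m[2][0]) / det;
    r[2][2] =  (m[0][0] * m[1][1] - m[0][1] * m[1][0]) / det;
    return r;
}

// X = T M T^-1: the YUV-space equivalent of the RGB-space color transform M.
Mat3 ToYuvSpace(const Mat3& t, const Mat3& m) {
    return Mul(Mul(t, m), Inverse(t));
}
```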
Gosh, you can do it, but it adds a little math. As a practical matter, however, given what the dimensions in YUV space actually represent, it seems likely to me that a lot of image processing can be done in that space without having to be very concerned with equivalences in RGB space. For example, if one wished merely to see a grayscale rendition of an image, one would only have to set U=0 and V=0 (well, U = V = 128, I guess), leaving Y alone. Heck, that's a lot less work than computing Y from RGB space, which is what you would have to do to get the same result on an RGB image (or, at least, some comparable operation that yielded some measure of "brightness" from each RGB pixel).
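The grayscale shortcut above is easy to sketch for YUY2, where every odd byte is a chroma sample. Yuy2ToGray is a hypothetical name; it assumes the buffer holds whole macropixels and that the row stride is even, so it also overwrites any row padding, which is harmless. For planar I420/YV12 the same effect is a fill of the U and V planes with 128.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Turns a YUY2 frame (bytes Y0 U Y1 V repeated) grayscale in place by setting
// every chroma byte to the neutral value 128 and leaving luma untouched.
void Yuy2ToGray(std::uint8_t* frame, std::size_t bytes) {
    assert(frame != nullptr && bytes % 4 == 0);  // whole macropixels only
    for (std::size_t i = 1; i < bytes; i += 2)
        frame[i] = 128;  // odd offsets hold U and V
}
```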
This is certainly opening a lot of doors in my thinking. I know this is light-years off the original topic, but do you know a good intro source for learning more about image processing in YUV space?
Stevens Miller http://www.withoutsupervision.com
- Edited by Stevens Miller Saturday, May 26, 2012 5:55 PM


