Generic Queue Best Practice

  • Question

  • Hi,

    We have the following use case: we need to manage streams based on the stream id (a fixed set of 8000 streams).

    • One approach suggests having a dedicated queue per stream (i.e. 8000 queues), managed by a dictionary with the stream id as a unique key
    • A second approach suggests having a number of queues equal to the number of cores (i.e. the stream id is managed as part of the queued object)

    In both cases the module will have worker threads that pop objects from the queues (the number of threads equals the number of cores).
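    For concreteness, a minimal sketch of the two candidate layouts. The `StreamItem` type and the builder methods are hypothetical stand-ins, since the thread never shows the real item shape:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical item type; the real payload shape is not given in the thread.
public class StreamItem
{
    public int StreamId;
    public byte[] Data;
}

public static class QueueLayouts
{
    // Approach 1: a dedicated queue per stream, keyed by stream id.
    public static Dictionary<int, Queue<StreamItem>> BuildPerStream(int streamCount)
    {
        var queues = new Dictionary<int, Queue<StreamItem>>(streamCount);
        for (int id = 0; id < streamCount; id++)
            queues[id] = new Queue<StreamItem>();
        return queues;
    }

    // Approach 2: one queue per core; the stream id travels inside the item.
    public static Queue<StreamItem>[] BuildPerCore()
    {
        var queues = new Queue<StreamItem>[Environment.ProcessorCount];
        for (int i = 0; i < queues.Length; i++)
            queues[i] = new Queue<StreamItem>();
        return queues;
    }
}
```

    Note the structural difference: in the first layout the stream id lives in the dictionary key, in the second it has to live in the item itself.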

    What is the overhead (if any) of managing 8000 queues, in terms of both CPU and memory?

    Thanks

    Yaron Cohen

    Thursday, May 24, 2012 11:52 AM

Answers

  • Please tell us what problem you are trying to solve.  i.e.: What do you want your program to do?

    Are you processing streams of video?  Probably not, because 8000 streams of video is probably excessive for any system.  In that kind of situation you expect one sample per stream per unit of time.  In other situations, like network connections, you might get an arbitrary number of bytes per unit of time per stream, so there is no fixed rate for the stream.  Also, if the amount of processing per item varies enough to make the pipeline non-realtime, then you might need multiple buffers, but not one per stream.

    If the rates are not fixed, or the number of objects per unit of time is not fixed, then you might have to buffer up information.  To do this, you will need a buffer per stream.  For example, when receiving bytes on 8000 network connections you need to keep straight which byte belongs to which client.  That's 8000 buffers (call them queues if you want, or arrays, whatever; it doesn't matter what data structure you choose to organize your data).
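    A minimal sketch of that "one buffer per connection" idea. The connection ids and the receive callback are hypothetical stand-ins for a real socket layer:

```csharp
using System;
using System.Collections.Generic;

// Received bytes are appended to the buffer belonging to the connection they
// arrived on, so bytes from different clients never get mixed together.
public static class PerConnectionBuffers
{
    static readonly Dictionary<int, List<byte>> buffers =
        new Dictionary<int, List<byte>>();

    // Hypothetical callback invoked by the network layer for each chunk.
    public static void OnBytesReceived(int connectionId, byte[] chunk)
    {
        List<byte> buffer;
        if (!buffers.TryGetValue(connectionId, out buffer))
        {
            buffer = new List<byte>();
            buffers[connectionId] = buffer;
        }
        buffer.AddRange(chunk);
    }

    // How many bytes are currently buffered for one connection.
    public static int BufferedBytes(int connectionId)
    {
        List<byte> buffer;
        return buffers.TryGetValue(connectionId, out buffer) ? buffer.Count : 0;
    }
}
```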

    But if you were just processing frames of video then you could use a single queue.  Here you would expect to be required to process all the frames in real-time before passing them along downstream.  So basically you have to do all the work, and it's safe to process everything in the order it arrives so a single FIFO is fine regardless of the number of streams.

    But if there's a variable amount of work to be done (like processing customers at a fast-food counter) then you will want to use something like the TPL so that you can queue up work, (like customers lining up to place their orders) and then when the workers become available they work on getting the next customer's order filled, and it doesn't matter which order the work completes in.  Here the only queue we have is the main "line-up" of customers, and the TPL takes care of the rest.

    ThreadPool.QueueUserWorkItem works like this and you don't need framework 4.0 to use it.  It's available in your 3.5 framework.
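    A minimal .NET 3.5-compatible sketch of that pattern: queue each item to the shared pool with ThreadPool.QueueUserWorkItem and wait until all items are done. The per-item work is left as a placeholder; the counting scheme is just one way to know when the pool has finished:

```csharp
using System;
using System.Threading;

// Queue work to the runtime's shared thread pool instead of creating
// threads yourself. No TPL required; this runs on .NET 3.5.
public class PoolRunner
{
    int remaining;
    readonly ManualResetEvent done = new ManualResetEvent(false);

    void ProcessItem(object state)
    {
        int streamId = (int)state;
        // ... real per-stream work would go here ...
        if (Interlocked.Decrement(ref remaining) == 0)
            done.Set(); // last item signals completion
    }

    // Queues itemCount work items and blocks until the pool has run them all.
    public int RunAll(int itemCount)
    {
        remaining = itemCount;
        for (int id = 0; id < itemCount; id++)
            ThreadPool.QueueUserWorkItem(ProcessItem, id);
        done.WaitOne();
        return remaining; // 0 once every item has been processed
    }
}
```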

    The best answer to your question will come when you reveal what specific problem you're trying to solve.

    • Marked as answer by Yaron.Cohen Monday, May 28, 2012 1:34 PM
    Monday, May 28, 2012 1:18 PM
  • Probably I'd go with the second approach.

    The only advantage I can see with the first approach is that you don't need to store the stream id in the queued object. I doubt that this id is so large that it can have a significant impact in terms of memory usage. But I see at least 2 disadvantages with the first approach:

    • 8000 queues will certainly use more memory than 2-32 queues
    • As I said above, it seems difficult for a thread to get work from those queues: even with 32 cores you still have 250 queues per thread. How do you manage them? I suspect you'll end up with each thread keeping yet another queue of its 250 queues.
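    That per-thread bookkeeping might look something like the following sketch. The types are illustrative only; the point is that in the worst case a worker scans all of its ~250 queues and finds nothing:

```csharp
using System;
using System.Collections.Generic;

// One worker's view of the first approach: it owns a slice of the 8000
// queues and has to cycle through them looking for work.
public class QueueScanner
{
    readonly List<Queue<int>> assigned; // this thread's slice of the queues
    int next;                           // round-robin cursor

    public QueueScanner(List<Queue<int>> assignedQueues)
    {
        assigned = assignedQueues;
    }

    // Returns true and an item if any assigned queue had work; in the worst
    // case this scans every assigned queue before giving up.
    public bool TryDequeue(out int item)
    {
        for (int i = 0; i < assigned.Count; i++)
        {
            var q = assigned[(next + i) % assigned.Count];
            if (q.Count > 0)
            {
                next = (next + i + 1) % assigned.Count; // resume after this queue
                item = q.Dequeue();
                return true;
            }
        }
        item = 0;
        return false;
    }
}
```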

    You say that the first approach simplifies data management. Maybe that's the case for this particular problem of yours, but in general it seems a bit unnatural. I would expect that whatever object is placed in a queue carries enough information for a thread to complete its work without having to infer context like "I got this item from queue X, so this means I need to work with stream X".

    One interesting question remaining for the second approach is how exactly you select the queue to use for a stream. With the first approach this is trivial, but with the second approach you need some sort of way to balance the work between threads.
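    The simplest selection scheme, offered here as an assumption rather than anything from the thread, is to hash the stream id modulo the queue count. Every item of a given stream then always lands in the same queue, and the load balance is only as good as the id distribution:

```csharp
public static class QueueSelector
{
    // Maps a stream id to one of queueCount queues. A plain modulo works when
    // stream ids are roughly uniform; a better hash may be needed if the ids
    // are clustered. The mask keeps the result non-negative for any input.
    public static int SelectQueue(int streamId, int queueCount)
    {
        return (streamId & 0x7FFFFFFF) % queueCount;
    }
}
```

    One useful property of this choice: because a stream is pinned to one queue, items of the same stream are never processed concurrently by two workers, which may or may not matter for your algorithm.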

    • Marked as answer by Yaron.Cohen Monday, May 28, 2012 3:02 PM
    Monday, May 28, 2012 1:37 PM
    Moderator

All replies

    Well, hopefully it goes without saying that you shouldn't create 8000 threads.  But 8000 queues is fine.  You can use the Task Parallel Library to get things done efficiently.

    The overhead of 8000 threads is 8000 stacks.  Even if the stacks were tiny (32 KB each) that would still be 250 MB of address space just for stacks.  Certainly a waste.  The context switching between 8000 threads is also non-trivial: a context switch might take on the order of 100 nanoseconds, and with 8000 threads on 8 cores, cycling every thread through a core costs about a thousand context switches' worth of wasted time before your code makes a full round of progress.

    Just queue up work items and let the TPL's thread pool manage how many threads there actually are.  Don't even try to correlate it to the number of cores.  Smart people have already figured out what the right number of threads is.  Use their answer -- use the TPL.
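    A sketch of the TPL route (note: this needs .NET 4.0 or later, which is why the 3.5-bound asker would fall back to ThreadPool.QueueUserWorkItem). You queue one Task per item and let the pool decide how many threads actually run; the summing body is a hypothetical stand-in for real per-item work:

```csharp
using System;
using System.Threading.Tasks;

public static class TplExample
{
    // Queues itemCount tasks, waits for all of them, and returns the sum
    // of the item ids: 0 + 1 + ... + (itemCount - 1).
    public static long SumInParallel(int itemCount)
    {
        long total = 0;
        object gate = new object();
        var tasks = new Task[itemCount];
        for (int i = 0; i < itemCount; i++)
        {
            int id = i; // capture a copy, not the loop variable
            tasks[i] = Task.Factory.StartNew(() =>
            {
                lock (gate) { total += id; } // stand-in for per-item work
            });
        }
        Task.WaitAll(tasks);
        return total;
    }
}
```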

    Friday, May 25, 2012 5:17 PM
    Thank you very much for your quick response.

    We never thought about using 8000 threads.

    My question was about using or not using 8000 queues. From your answer I understand that you are fine with it. Can you please explain why working with 8000 queues is not considered a more wasteful and complicated solution compared to working with a few queues? Working with the TPL is a good solution, but currently we are limited to .NET Framework 3.5.

    Thanks
    Yaron

    Sunday, May 27, 2012 5:06 PM
  • Hi Wvck,

    Do you have any update?

    Have a nice day.


    Ghost,
    Call me ghost for short, Thanks
    To get the better answer, it should be a better question.

    Monday, May 28, 2012 3:34 AM
    It's not clear what these queues are for. If the algorithm requires a queue per stream then that's it, you have to use 8000 queues. Otherwise you may want to consider using a queue per thread instead of per stream. It's also not clear to me how you plan to "pop objects from queues". Will a thread simply scan all queues trying to find work to do? Even if you group queues by thread you can still have hundreds of queues per thread, and that's a lot.
    Monday, May 28, 2012 8:20 AM
    Moderator
This is exactly the point: we have not decided on the algorithm yet. Our two approaches:

    • One approach suggests having a dedicated queue per stream (i.e. 8000 queues), managed by a dictionary with the stream id as a unique key
    • A second approach suggests having a number of queues equal to the number of cores (i.e. the stream id is managed as part of the queued object)

    We are trying to understand which solution is the most efficient, and why. The first approach simplifies the data management since the objects are already classified into queues based on their stream id. As you already said, the downside of this approach is that it forces each thread to manage (8000 / # of threads) queues.

    Thanks
    Yaron Cohen

    Monday, May 28, 2012 12:04 PM
Thank you all for your input and time.

    I will continue this thread in internal discussion.

    Thanks
    Yaron Cohen

    Monday, May 28, 2012 3:06 PM