Is this a suitable job for TPL Dataflow?

  • Sunday, June 17, 2012 2:32 PM
     
     

    I run a pretty typical producer/consumer model on different tasks.

    Task1: Reads batches of byte[] from binary files and kicks off a new task for each collection of byte arrays (the operation is batched for memory-management purposes).

    Task 2-n: These are the worker tasks. Each one operates on the collection of byte arrays passed in from Task1: it deserializes the byte arrays (each byte array deserializes into one object), sorts them by certain criteria, and then stores the resulting collection of objects in a concurrent dictionary.

    Task (n+1): I chose a concurrent dictionary because the job of this task is to merge the collections stored in it in the same order in which they originated from Task1. I achieve that by passing a collectionID (an int that is incremented for each new collection within Task1) all the way down from Task1 to this task. This task checks whether the next expected collectionID is already stored in the concurrent dictionary; if so, it takes that collection out, adds it to a final queue, and then checks the concurrent dictionary for the next collectionID.
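
    The merge step described above can be sketched roughly as follows. This is a minimal, single-threaded illustration in Python rather than .NET code, and it is not the poster's implementation: the names OrderedMerger, pending, next_id and final_queue are hypothetical, a plain dict stands in for the concurrent dictionary, and it assumes that collectionIDs start at 0 and increase by 1.

```python
from collections import deque


class OrderedMerger:
    """Releases finished collections in collectionID order, however they arrive."""

    def __init__(self, first_id=0):
        # Assumption: IDs start at first_id and increase by exactly 1.
        self.pending = {}            # collection_id -> finished collection
        self.next_id = first_id      # next collectionID the final queue expects
        self.final_queue = deque()   # collections in their original order

    def add(self, collection_id, items):
        """Store a finished collection, then drain everything that is now in order."""
        self.pending[collection_id] = items
        while self.next_id in self.pending:
            self.final_queue.append(self.pending.pop(self.next_id))
            self.next_id += 1


# Workers finish out of order: collection 2 completes before 0 and 1.
merger = OrderedMerger()
for cid, items in [(2, ["c"]), (0, ["a"]), (1, ["b"]), (3, ["d"])]:
    merger.add(cid, items)

print(list(merger.final_queue))  # [['a'], ['b'], ['c'], ['d']]
```

    Collection 2 waits in the lookup until 0 and 1 have arrived, and is then released together with them. A real multi-threaded version would additionally need a thread-safe lookup or a lock around add.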

    Now, from what I have read and the videos I have watched, it seems to me that TPL Dataflow may be the perfect candidate for such a producer/consumer model. I just cannot seem to devise a design and get started, because I have never worked with TPL Dataflow. Is this library even up to the task in terms of throughput and latency? I currently process 2.5 million byte arrays, and thus objects, per second in the resulting collections. Can TPL Dataflow help to simplify this? I am especially interested in the answer to the following question: can TPL Dataflow preserve the order of the collection batches from Task1 when spawning off worker tasks and re-merging the batches once the worker tasks have done their work? Does it optimize things? Having profiled the whole structure, I feel that quite some time is wasted due to spinning and too many concurrent collections being involved.

    Any ideas and/or thoughts?

    Thanks a lot


    • Edited by Freddy173 Sunday, June 17, 2012 2:34 PM

All Replies

  • Sunday, June 17, 2012 5:58 PM
     
     Answered
    You asked exactly the same question a few days ago on Stack Overflow. Is there something you didn't like about the solution I gave you there? Why exactly are you asking the same question again, this time here?
  • Monday, June 18, 2012 12:46 AM
     
     
    I appreciate your proposed solution, and I am still working through different scenarios and partially implementing your suggestions. I mentioned that I would get back to you on SO, which I will. I just think it is never bad to get different opinions and varying suggestions. You were the only one who commented on my question and, again, I appreciate it and will show my appreciation in due time. I would just like to gather a little more feedback; nothing wrong with that, imho.