System.Collections.Concurrent.Partitioner class


  • Hello,

    I've tried to browse MSDN for relative documentation, but haven't found much. Therefore, my question is:

    What is the purpose and usage scenarios of System.Collections.Concurrent.Partitioner class?
    Please, describe it in more details, then it is described in MSDN.
    Vitaliy Liptchinsky http://dotnetframeworkplanet.blogspot.com/
    Friday, May 29, 2009 8:54 AM


  • Hi Vitaliy-

    Consider a parallel algorithm like those used in PLINQ or Parallel.ForEach.  These algorithms need to take in some data and figure out how to distribute that data across multiple cores so that data elements may be processed in parallel; in other words, the data needs to be partitioned.  In the case of input data that comes in the form of a T[], or an IList<T>, or an IEnumerable<T>, these libraries have default, general implementations of partitioning that work well in many cases.  But a general partitioning implementation won't be the best possible for every possible problem, and thus it's useful to be able to tell these algorithms exactly how they should partition a given data source.  Consider, for example, a balanced Tree<T> that implements IEnumerable<T>.  As an instance of this tree supports enumeration, that instance could be handed off to PLINQ or Parallel.ForEach, which will happily enumerate over the tree data source.  Instead, however, better efficiency might be achievable if the Tree<T> told PLINQ or Parallel.ForEach to hand off large sections of the tree to the different threads, rather than forcing all of the threads to contend on the single IEnumerable<T> which is pipelining the nodes from the tree to those threads.

    This is where Partitioner<T> comes in.  PLINQ and Parallel.ForEach both support consuming implementations of Partitioner<T>, just like they support consuming implementations of IEnumerable<T>.  You can implement a Partitioner<T> that partitions your input data source and hands off those partitions to the consuming algorithm.  Back to the example, you could implement a TreePartitioner<T> that nows how to partition a Tree<T> efficiently for parallel consumption.

    Now, a Partitioner<T> isn't limited to partitioning odd data sources.  It can also be used to change the manner in which, for example, an IEnumerable<T> is partitioned.  If the default partitioning scheme used isn't ideal for your data, you could substitute in a different partitioning scheme.  The Partitioner.Create methods provide a few of these alternatives; for example, by default Parallel.ForEach will load balance data between partitions when provided with an array, but if you'd prefer to go with strict static partitioning, you could use the overload of Partitioner.Create that accepts an array and a boolean to disable load balancing.

    The Beta 1 samples for Parallel Extensions, available at http://code.msdn.microsoft.com/ParExtSamples, includes a sample Partitioner<T> for IEnumerable<T> called ChunkPartitioner<T>.  It allows you to control the chunk sizes used by PLINQ and Parallel.ForEach as they pull data from an IEnumerable<T>.

    Finally, note that both PLINQ and Parallel.ForEach need to be able to partition standard data sources like IEnumerable<T>.  Other parallel libraries will have similar needs.  Rather than forcing those libraries to write their own partitioner, they can simple consume the output of Partitioner.Create(IEnumerable<T>), relying on that general implementation to serve their partitioning needs.

    I hope that helps clarify.
    Friday, May 29, 2009 10:07 PM