Parallel Computing Developer Center >
Parallel Computing Forums
>
Parallel Extensions to the .NET Framework
>
Barrier question
Barrier question
- I got the following output for a run of the example below. The way I used the Barrier must be wrong because I expect the cpu (# of threads created) to be the same, e.g. cpu = 5 for every line. What is the correct way to use it? Thank you.
cpu = 4, i = 1
cpu = 4, i = 2
cpu = 4, i = 3
cpu = 4, i = 0
cpu = 5, i = 0
cpu = 5, i = 4
cpu = 5, i = 2
cpu = 5, i = 1
cpu = 5, i = 3
cpu = 5, i = 3
cpu = 5, i = 0
cpu = 5, i = 4
cpu = 5, i = 1
cpu = 5, i = 2
cpu = 5, i = 0
cpu = 5, i = 0
============================ code example ====================
public static void Main(string[] args)
{
int cpu = 0;
double[,] mutex = new double[1, 1];
Barrier barrier = new Barrier(0);
System.Threading.Tasks.Parallel.For(0, 16, () =>
{
barrier.AddParticipant();
lock (mutex)
{
int id = cpu;
cpu++;
return id;
}
}, (i, state, local) =>
{
barrier.SignalAndWait();
Console.WriteLine("cpu = " + cpu + ", i = " + local);
for (int j = 0; j < 20000000; j++) { }
return local;
},
local =>
{
barrier.RemoveParticipant();
});
}
Answers
- First, why you want to control partitioning in that manner, i.e. why is it not enough to let the Parallel.For decide on how partitioning should occur? Unless you know that the workload is perfectly balanced and that there's nothing else happening on the machine that could perturb that balance, static partitioning can easily lead to load imbalance, harming the overall performance of the loop.
That question aside, there are several ways to accomplish this. See http://blogs.msdn.com/pfxteam/archive/2009/06/06/9703059.aspx for a few different implementations of a ForRange construct, e.g.
You can then write your code:public static ParallelLoopResult ForRange( int fromInclusive, int toExclusive, Action<int, int> body) { int numberOfRanges = Environment.ProcessorCount; int range = toExclusive - fromInclusive; int stride = range / numberOfRanges; if (range <= 0) numberOfRanges = 0; return Parallel.For(0, numberOfRanges, i => { int start = i * stride; int end = (i == numberOfRanges - 1) ? toExclusive : start + stride; body(start, end); }); }
(Instead of using a Parallel.For in the implementation of ForRange, you could of course also spin up one Task per range, and then wait on them all with Task.WaitAll. If you did want to use additional synchronization between partitions, e.g. a barrier, that would be the preferred approach.)ForRange(0, 100, (from,to) => { for(int i=from; i<to; i++) { ... } });
If you want to create more complicated partitioning logic, you can encapsulate it in the implementation of a type derived from Partitioner<T> or OrderablePartitioner<T>, and then use an instance of that partitioner in a call to Parallel.ForEach (or PLINQ). Several partitioners are built into .NET 4, and are exposed through the Partitioner.Create methods. In fact, starting with Beta 2, there will be an overload of Create that does this kind of range partitioning.- Proposed As Answer byStephen Toub - MSFTMSFT, ModeratorTuesday, September 29, 2009 2:49 PM
- Marked As Answer byStephen Toub - MSFTMSFT, ModeratorMonday, October 05, 2009 1:05 AM
All Replies
- The number of threads involved in a Parallel.For's execution may change over time.
What are you trying to accomplish? - I want to find out the total number of threads created to divide the tasks among them my myself based on the threadID and number of threads. This may be quite common in OpenMP and MPI.
e.g. for (i = 0; i < 100; i++) {..}
If I know there are 4 threads created for the for-loop, I want the 1st thread to work on i = 0, ..., 24. Of course, my example is overly simple.
Thanks. - First, why you want to control partitioning in that manner, i.e. why is it not enough to let the Parallel.For decide on how partitioning should occur? Unless you know that the workload is perfectly balanced and that there's nothing else happening on the machine that could perturb that balance, static partitioning can easily lead to load imbalance, harming the overall performance of the loop.
That question aside, there are several ways to accomplish this. See http://blogs.msdn.com/pfxteam/archive/2009/06/06/9703059.aspx for a few different implementations of a ForRange construct, e.g.
You can then write your code:public static ParallelLoopResult ForRange( int fromInclusive, int toExclusive, Action<int, int> body) { int numberOfRanges = Environment.ProcessorCount; int range = toExclusive - fromInclusive; int stride = range / numberOfRanges; if (range <= 0) numberOfRanges = 0; return Parallel.For(0, numberOfRanges, i => { int start = i * stride; int end = (i == numberOfRanges - 1) ? toExclusive : start + stride; body(start, end); }); }
(Instead of using a Parallel.For in the implementation of ForRange, you could of course also spin up one Task per range, and then wait on them all with Task.WaitAll. If you did want to use additional synchronization between partitions, e.g. a barrier, that would be the preferred approach.)ForRange(0, 100, (from,to) => { for(int i=from; i<to; i++) { ... } });
If you want to create more complicated partitioning logic, you can encapsulate it in the implementation of a type derived from Partitioner<T> or OrderablePartitioner<T>, and then use an instance of that partitioner in a call to Parallel.ForEach (or PLINQ). Several partitioners are built into .NET 4, and are exposed through the Partitioner.Create methods. In fact, starting with Beta 2, there will be an overload of Create that does this kind of range partitioning.- Proposed As Answer byStephen Toub - MSFTMSFT, ModeratorTuesday, September 29, 2009 2:49 PM
- Marked As Answer byStephen Toub - MSFTMSFT, ModeratorMonday, October 05, 2009 1:05 AM


