Ask a questionAsk a question
 

AnswerBarrier question

  • Monday, September 28, 2009 8:01 PMykwok Users MedalsUsers MedalsUsers MedalsUsers MedalsUsers Medals
     
    I got the following output for a run of the example below.  The way I used the Barrier must be wrong because I expect the cpu (# of threads created) to be the same, e.g. cpu = 5 for every line.  What is the correct way to use it?  Thank you.

    cpu = 4, i = 1
    cpu = 4, i = 2
    cpu = 4, i = 3
    cpu = 4, i = 0
    cpu = 5, i = 0
    cpu = 5, i = 4
    cpu = 5, i = 2
    cpu = 5, i = 1
    cpu = 5, i = 3
    cpu = 5, i = 3
    cpu = 5, i = 0
    cpu = 5, i = 4
    cpu = 5, i = 1
    cpu = 5, i = 2
    cpu = 5, i = 0
    cpu = 5, i = 0

    ============================ code example ====================

    public static void Main(string[] args)
            {
                int cpu = 0;
                double[,] mutex = new double[1, 1];
                Barrier barrier = new Barrier(0);
                System.Threading.Tasks.Parallel.For(0, 16, () =>
                {
                    barrier.AddParticipant();
                    lock (mutex)
                    {
                        int id = cpu;
                        cpu++;
                        return id;
                    }
                   
                }, (i, state, local) =>
                {
                    barrier.SignalAndWait();
                    Console.WriteLine("cpu = " + cpu + ", i = " + local);
                    for (int j = 0; j < 20000000; j++) { }
                    return local;
                },
                local =>
                {
                    barrier.RemoveParticipant();
                });
            }

Answers

  • Tuesday, September 29, 2009 2:49 PMStephen Toub - MSFTMSFT, ModeratorUsers MedalsUsers MedalsUsers MedalsUsers MedalsUsers Medals
     AnswerHas Code
    First, why you want to control partitioning in that manner, i.e. why is it not enough to let the Parallel.For decide on how partitioning should occur?  Unless you know that the workload is perfectly balanced and that there's nothing else happening on the machine that could perturb that balance, static partitioning can easily lead to load imbalance, harming the overall performance of the loop.

    That question aside, there are several ways to accomplish this.  See http://blogs.msdn.com/pfxteam/archive/2009/06/06/9703059.aspx for a few different implementations of a ForRange construct, e.g.

    public static ParallelLoopResult ForRange( 
        int fromInclusive, int toExclusive, Action<int, int> body) 
    { 
        int numberOfRanges = Environment.ProcessorCount; 
        int range = toExclusive - fromInclusive; 
        int stride = range / numberOfRanges; 
        if (range <= 0) numberOfRanges = 0; 
        return Parallel.For(0, numberOfRanges, i => { 
            int start = i * stride; 
            int end = (i == numberOfRanges - 1) ? toExclusive : start + stride; 
            body(start, end); 
        }); 
    }
    
    You can then write your code:

    ForRange(0, 100, (from,to) =>
    {
        for(int i=from; i<to; i++) { ... }
    });
    
    (Instead of using a Parallel.For in the implementation of ForRange, you could of course also spin up one Task per range, and then wait on them all with Task.WaitAll.  If you did want to use additional synchronization between partitions, e.g. a barrier, that would be the preferred approach.)

    If you want to create more complicated partitioning logic, you can encapsulate it in the implementation of a type derived from Partitioner<T> or OrderablePartitioner<T>, and then use an instance of that partitioner in a call to Parallel.ForEach (or PLINQ).  Several partitioners are built into .NET 4, and are exposed through the Partitioner.Create methods.  In fact, starting with Beta 2, there will be an overload of Create that does this kind of range partitioning.

All Replies

  • Tuesday, September 29, 2009 1:14 AMStephen Toub - MSFTMSFT, ModeratorUsers MedalsUsers MedalsUsers MedalsUsers MedalsUsers Medals
     
    The number of threads involved in a Parallel.For's execution may change over time.

    What are you trying to accomplish?
  • Tuesday, September 29, 2009 12:59 PMykwok Users MedalsUsers MedalsUsers MedalsUsers MedalsUsers Medals
     
    I want to find out the total number of threads created to divide the tasks among them my myself based on the threadID and number of threads.  This may be quite common in OpenMP and MPI.

    e.g.  for (i = 0; i < 100; i++) {..}

    If I know there are 4 threads created for the for-loop, I want the 1st thread to work on i = 0, ..., 24.  Of course, my example is overly simple.

    Thanks.
  • Tuesday, September 29, 2009 2:49 PMStephen Toub - MSFTMSFT, ModeratorUsers MedalsUsers MedalsUsers MedalsUsers MedalsUsers Medals
     AnswerHas Code
    First, why you want to control partitioning in that manner, i.e. why is it not enough to let the Parallel.For decide on how partitioning should occur?  Unless you know that the workload is perfectly balanced and that there's nothing else happening on the machine that could perturb that balance, static partitioning can easily lead to load imbalance, harming the overall performance of the loop.

    That question aside, there are several ways to accomplish this.  See http://blogs.msdn.com/pfxteam/archive/2009/06/06/9703059.aspx for a few different implementations of a ForRange construct, e.g.

    public static ParallelLoopResult ForRange( 
        int fromInclusive, int toExclusive, Action<int, int> body) 
    { 
        int numberOfRanges = Environment.ProcessorCount; 
        int range = toExclusive - fromInclusive; 
        int stride = range / numberOfRanges; 
        if (range <= 0) numberOfRanges = 0; 
        return Parallel.For(0, numberOfRanges, i => { 
            int start = i * stride; 
            int end = (i == numberOfRanges - 1) ? toExclusive : start + stride; 
            body(start, end); 
        }); 
    }
    
    You can then write your code:

    ForRange(0, 100, (from,to) =>
    {
        for(int i=from; i<to; i++) { ... }
    });
    
    (Instead of using a Parallel.For in the implementation of ForRange, you could of course also spin up one Task per range, and then wait on them all with Task.WaitAll.  If you did want to use additional synchronization between partitions, e.g. a barrier, that would be the preferred approach.)

    If you want to create more complicated partitioning logic, you can encapsulate it in the implementation of a type derived from Partitioner<T> or OrderablePartitioner<T>, and then use an instance of that partitioner in a call to Parallel.ForEach (or PLINQ).  Several partitioners are built into .NET 4, and are exposed through the Partitioner.Create methods.  In fact, starting with Beta 2, there will be an overload of Create that does this kind of range partitioning.