Not sure if I'm doing it right

    Question

  • Hi

    My apologies for a total newbie question, but here goes anyway.  We have a library of tasks (not the parallel type, but our own) which I want to run in parallel as they don't have any interactions.  Each one has an execute method so I thought that doing something like the following would get me much better performance as each task would be run in parallel. 

     

    // Common setup

    List<MatrixTask> matrixTasks = new List<MatrixTask>();

    matrixTasks.Add(new MatrixTask());

    matrixTasks.Add(new MatrixTask());

    matrixTasks.Add(new MatrixTask());

     

    // Parallel approach

    Parallel.ForEach<MatrixTask>(matrixTasks, task => { task.Execute(); });

    TaskCoordinator.WaitAll(matrixTasks);

     

    // Traditional approach

    foreach (MatrixTask matrixTask in matrixTasks)

      matrixTask.Execute();

     

    I'm running on a Dell dual processor, quad core machine and the parallel version runs more slowly than the serial one.  What am I doing wrong?  Can the parallel API be used in this kind of scenario or am I just better off using traditional multi-threading techniques?

     

    Thanks in advance for any help anyone can offer.

     

    Best regards

     

    Marek

    Friday, May 30, 2008 10:15 AM

Answers

  • IIRC, because of the current work-item partitioning scheme employed by the Parallel Extensions CTP, all three tasks will actually execute on the same thread.  (You can check this by looking at CPU usage and by printing the thread ID from inside the Execute method.)
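
    The thread ID check suggested above could look roughly like the sketch below. It is written against Parallel.ForEach as it later shipped in .NET 4, which may behave differently from the CTP discussed here. ThreadIdCheck, ExecuteAndReportThread and DistinctThreadsUsed are hypothetical names used only for illustration; in the real code the Console.WriteLine call would sit inside the task's own Execute method.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadIdCheck
{
    // Hypothetical stand-in for a task's Execute method: it reports and
    // returns the managed thread ID it ran on.
    public static int ExecuteAndReportThread(int taskIndex)
    {
        int threadId = Thread.CurrentThread.ManagedThreadId;
        Console.WriteLine("Task {0} ran on thread {1}", taskIndex, threadId);
        return threadId;
    }

    // Runs taskCount items through Parallel.ForEach and returns the
    // distinct thread IDs that were observed.
    public static List<int> DistinctThreadsUsed(int taskCount)
    {
        var seen = new ConcurrentBag<int>();
        Parallel.ForEach(Enumerable.Range(0, taskCount),
            i => seen.Add(ExecuteAndReportThread(i)));
        return seen.Distinct().ToList();
    }

    public static void Main()
    {
        List<int> threads = DistinctThreadsUsed(3);
        Console.WriteLine("Distinct threads used: {0}", threads.Count);
    }
}
```

    If only one distinct thread shows up for a long-running Execute, the tasks are effectively running in series.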

     

    A workaround would be parallelizing the outer loop inside the matrix multiplication code, or splitting it otherwise into multiple chunks.

     

    Saturday, May 31, 2008 10:16 AM

All replies

  • Hi,

     

    The most important thing is missing here: what does Execute() actually do? Can you paste some pseudocode for it? Whether the work is dominated by math/logic or by memory access matters most.

     

    Also keep in mind that some parts of the current version of the library are not fully optimized yet.

     

    Regards,

    Asaf

     

     

     

    Friday, May 30, 2008 9:56 PM
  • Hi Asaf

    The execute method just includes the non-optimised matrix multiplication from the parallel api C# example.  I'm confused though.  I appreciate that the contents of the Execute method are important, but the fact that the Execute is being called (or is it?) for each object independently on different processors should result in a performance improvement, should it not?

     

    Best regards

     

    Marek

    Saturday, May 31, 2008 7:15 AM
  • Hi

    I felt guilty not giving you the code, so here it is:

     

    Code Snippet

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading;
    using System.Threading.Tasks;

    namespace TestParallelAPI
    {
        class Program
        {
            static void Main(string[] args)
            {
                Console.WriteLine("Processor count: {0}", System.Environment.ProcessorCount);

                List<MatrixMultiplicationTask> tasks = new List<MatrixMultiplicationTask>();
                tasks.Add(new MatrixMultiplicationTask());
                tasks.Add(new MatrixMultiplicationTask());
                //tasks.Add(new MatrixMultiplicationTask());

                double inSeriesTime = RunInSeries(tasks);
                double inParallelTime = RunInParallel(tasks);

                Console.WriteLine("Speedup: {0}\nPress any key to continue", inSeriesTime / inParallelTime);
                Console.ReadKey();
            }

            // Runs every task one after another and returns the elapsed seconds.
            private static double RunInSeries(List<MatrixMultiplicationTask> tasks)
            {
                DateTime startTime = DateTime.Now;
                Console.WriteLine("Starting series run: " + startTime.ToLongTimeString());

                foreach (MatrixMultiplicationTask task in tasks)
                    task.Execute();

                DateTime finishTime = DateTime.Now;
                double timeTaken = (finishTime - startTime).TotalSeconds;
                Console.WriteLine("Finished series run: {0}. Time taken: {1}", finishTime.ToLongTimeString(), timeTaken);
                return timeTaken;
            }

            // Runs the tasks through Parallel.ForEach and returns the elapsed seconds.
            private static double RunInParallel(List<MatrixMultiplicationTask> matrixTasks)
            {
                DateTime startTime = DateTime.Now;
                Console.WriteLine("Starting parallel run: " + startTime.ToLongTimeString());

                Parallel.ForEach(matrixTasks, matrixTask => { matrixTask.Execute(); });

                DateTime finishTime = DateTime.Now;
                double timeTaken = (finishTime - startTime).TotalSeconds;
                Console.WriteLine("Finished parallel run: {0}. Time taken: {1}", finishTime.ToLongTimeString(), timeTaken);
                return timeTaken;
            }
        }
    }

     

     

    and...

     

    Code Snippet

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Xml.Serialization;
    using System.ComponentModel;

    namespace TestParallelAPI
    {
        public class MatrixMultiplicationTask
        {
            private int _n = 500;
            private int _m = 500;
            private double[,] _matrixA;
            private double[,] _matrixB;
            private double[,] _product;

            [XmlElement("n")]
            public int N
            {
                get { return _n; }
                set { _n = value; }
            }

            [XmlElement("m")]
            public int M
            {
                get { return _m; }
                set { _m = value; }
            }

            [XmlIgnore]
            public double[,] MatrixA
            {
                get { return _matrixA; }
                set { _matrixA = value; }
            }

            [XmlIgnore]
            public double[,] MatrixB
            {
                get { return _matrixB; }
                set { _matrixB = value; }
            }

            [XmlIgnore]
            [ReadOnly(true)]
            public double[,] Product
            {
                get { return _product; }
                set { _product = value; }
            }

            public void Execute()
            {
                Random random = new Random(DateTime.Now.Millisecond);

                _matrixA = new double[_n, _m];
                _matrixB = new double[_n, _m];
                _product = new double[_n, _m];

                // Fill both input matrices with random values.
                for (int i = 0; i < _n; i++)
                {
                    for (int j = 0; j < _m; j++)
                    {
                        _matrixA[i, j] = random.NextDouble();
                        _matrixB[i, j] = random.NextDouble();
                    }
                }

                // Non-optimised triple-loop matrix multiplication.
                for (int i = 0; i < _n; i++)
                {
                    for (int j = 0; j < _m; j++)
                    {
                        _product[i, j] = 0;
                        for (int k = 0; k < _m; k++)
                            _product[i, j] += _matrixA[i, k] * _matrixB[k, j];
                    }
                }
            }
        }
    }

     

     

    Saturday, May 31, 2008 8:37 AM
  • Hi,

     

    Nice of you to paste the code :-)

     

    I did try to make it work but had some problems with the compilation.

     

    Can you produce simplified code that demonstrates only the problem?

     

    Also try not to use Random to burn CPU time since it might work differently on different machines and could potentially use a shared resource.
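
    One way to take Random out of the picture is to fill the matrices from the indices alone, so every run does identical work on every machine. A minimal sketch; DeterministicWorkload and Fill are hypothetical names, and the fill formula is an arbitrary choice, not something from the posted code:

```csharp
using System;

public static class DeterministicWorkload
{
    // Fills an n x m matrix from the indices alone, so no Random instance
    // (and no shared or machine-dependent state) is involved.
    public static double[,] Fill(int n, int m)
    {
        var matrix = new double[n, m];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                matrix[i, j] = (i * m + j) % 10 + 1;
        return matrix;
    }

    public static void Main()
    {
        double[,] a = Fill(2, 3);
        Console.WriteLine("{0} {1} {2}", a[0, 0], a[0, 1], a[0, 2]); // 1 2 3
        Console.WriteLine("{0} {1} {2}", a[1, 0], a[1, 1], a[1, 2]); // 4 5 6
    }
}
```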

     

    Apart from that I know that Sasha is correct about some limitations of the CTP but I will leave the talking for the dev team since I'm not sure what is under NDA and what is not.

     

    Regards,

    Asaf

    Saturday, May 31, 2008 11:41 AM

  • Hi Marek,

    I changed the for loop in the Execute method to Parallel.For(0, _n, delegate(int i) { ... }) and it takes about half the time of the normal for loop.
    Basically, correctly identifying the loop to parallelize is the essence of TPL.
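
    As a rough sketch of that change, assuming the Parallel.For overload that shipped in .NET 4 and using a lambda instead of an anonymous delegate: each outer iteration writes only to its own row of the result, so rows can be computed independently. ParallelMatrixMultiply and Multiply are hypothetical names, not part of the posted code.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelMatrixMultiply
{
    // Multiplies a (n x m) by b (m x p). Only the outer loop over rows is
    // parallelized; each iteration writes to a distinct row of the product.
    public static double[,] Multiply(double[,] a, double[,] b)
    {
        int n = a.GetLength(0);
        int m = a.GetLength(1);
        int p = b.GetLength(1);
        if (b.GetLength(0) != m)
            throw new ArgumentException("Inner matrix dimensions must match.");

        var product = new double[n, p];
        Parallel.For(0, n, i =>
        {
            for (int j = 0; j < p; j++)
            {
                double sum = 0;
                for (int k = 0; k < m; k++)
                    sum += a[i, k] * b[k, j];
                product[i, j] = sum;
            }
        });
        return product;
    }

    public static void Main()
    {
        var a = new double[,] { { 1, 2 }, { 3, 4 } };
        var b = new double[,] { { 5, 6 }, { 7, 8 } };
        double[,] c = Multiply(a, b);
        Console.WriteLine("{0} {1}", c[0, 0], c[0, 1]); // 19 22
        Console.WriteLine("{0} {1}", c[1, 0], c[1, 1]); // 43 50
    }
}
```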

    Apart from this, I initially thought that you were going to create some independent threads in a server-like setting, i.e. that this library could be used to spawn a separate thread for each server daemon: pass a list of daemons to Parallel.Do and a different thread will be created for each. I have yet to check whether this approach is better than explicitly creating threads. If anyone has thoughts on this, I will be glad to hear them.
    This is simple to set up, but I don't think it is efficient, because if you later want to create a new thread you have to modify the list, i.e. it only works for a static list of daemons. It was just a thought that occurred to me when I heard that it will inherently multi-thread the members of the list passed to Parallel.Do.
    Monday, June 02, 2008 8:29 AM
  • Hi

    Thanks for the response.  I didn't really want to modify the "task" itself as I have a number of legacy tasks that I just want to run in parallel. I wrote my own little task scheduler using the thread pool and the whole thing scales really nicely: eight processors give a speedup of ~8 in the execution of the tasks.
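
    The scheduler itself isn't shown in this thread. Purely as an illustration of the general idea, a thread-pool based runner might look like the sketch below. ThreadPoolRunner and RunAll are hypothetical names, and CountdownEvent is an assumption too: it only arrived in .NET 4, so code from this period would more likely have combined a counter with a ManualResetEvent.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ThreadPoolRunner
{
    // Queues every action on the thread pool and blocks until all have
    // finished. Exceptions thrown by an action are not handled here.
    public static void RunAll(IList<Action> actions)
    {
        if (actions.Count == 0) return;

        using (var remaining = new CountdownEvent(actions.Count))
        {
            foreach (Action action in actions)
            {
                Action current = action; // per-iteration copy for the closure
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try { current(); }
                    finally { remaining.Signal(); }
                });
            }
            remaining.Wait();
        }
    }

    public static void Main()
    {
        int completed = 0;
        var actions = new List<Action>();
        for (int i = 0; i < 8; i++)
            actions.Add(() => Interlocked.Increment(ref completed));

        RunAll(actions);
        Console.WriteLine("Completed: {0}", completed); // Completed: 8
    }
}
```

    With legacy tasks, each Action would simply wrap a call to the task's Execute method, so the tasks themselves stay unmodified.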

    I guess I will park my interest in the TPL for now.

     

    Thanks again.

     

    Marek

    Monday, June 02, 2008 8:47 AM
  • Marek, a new CTP of Parallel Extensions is being released today.  While it doesn't have the final implementation for Parallel.For/ForEach, it does have slightly improved logic that should address your case.  Please try it out and let us know how it goes for you.

     

    Monday, June 02, 2008 1:19 PM