# Parallel.For with double loop

### Question

• I am new to parallel computing and made a simple application to compare.  Why does the following code have no speed advantage compared to the non-parallel version?

```
public void CalculatePatternParallel()
{
    int Nx = 1024;
    int Ny = 1024;
    double u0 = 0;
    double v0 = 0;

    double[,] pat_dB = new double[Ny, Nx];

    double du = 2.0 / (Nx - 1);
    double dv = 2.0 / (Ny - 1);

    // Declared outside the lambda, so all worker threads share these two variables.
    double v = -1.0;
    double u = -1.0;
    // Parallel outer loop over i (requires using System.Threading.Tasks).
    Parallel.For(0, Ny, i =>
    {
        v = -1.0 * i * dv;
        for (int j = 0; j < Nx; j++)
        {
            u = -1.0 * j * du;
            var Ax = Math.Sin(Math.PI * Nx * (u - u0)) /
                     Math.Sin(Math.PI * (u - u0)) / Nx;
            var Ay = Math.Sin(Math.PI * Ny * (v - v0)) /
                     Math.Sin(Math.PI * (v - v0)) / Ny;
            pat_dB[j, i] = 20 * Math.Log10(Math.Abs(Ax * Ay));
        }
    });
}
```

Thursday, August 02, 2012 3:44 AM

### All replies

• What kind of system are you running on?  Does it only have one logical core?

I just tried your code and compared it to a sequential version where I just replaced your Parallel.For call with a regular for loop.  On my quad-core laptop, I consistently see the parallel version run 3x faster than the sequential version.

Thursday, August 02, 2012 6:56 PM
• I am running on a desktop system with two i7 quad core processors (8 logical processors) and Windows 7 Ultimate 64-bit OS with .NET 4.0.  I am using a Stopwatch to time the two, using the code shown below, where CalculatePattern is the same method with a conventional for loop.  After running it multiple times, there are some times when I get up to a 2x improvement.  Still not what I was expecting.

```
public Form1()
{
    InitializeComponent();

    // Time the sequential version.
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();
    CalculatePattern();
    stopwatch.Stop();
    var elapsedTime1 = stopwatch.ElapsedMilliseconds;

    // Time the parallel version.
    stopwatch.Reset();
    stopwatch.Start();
    CalculatePatternParallel();
    stopwatch.Stop();
    var elapsedTime2 = stopwatch.ElapsedMilliseconds;

    textBox1.Text = elapsedTime1.ToString();
    textBox2.Text = elapsedTime2.ToString();
}
```

Friday, August 03, 2012 2:14 AM
• Have you tried just doing this in a console app, in a loop, e.g.

```
var sw = new Stopwatch();
while (true)
{
    sw.Restart();
    CalculatePattern();
    sw.Stop();
    var et1 = sw.ElapsedMilliseconds;

    sw.Restart();
    CalculatePatternParallel();
    sw.Stop();
    var et2 = sw.ElapsedMilliseconds;

    // Speedup factor: sequential time divided by parallel time.
    Console.WriteLine(et1 / (double)et2);
}
```

From your code, it looks like you're doing it as part of opening a form, which means a) there's likely other processing happening at the same time in order to start your app, open your windows, etc., and b) you're paying startup costs associated with running the code once, e.g. the time to JIT all of the code involved in running in parallel, spinning up the threads for the first time in the thread pool, etc.

Friday, August 03, 2012 4:19 PM
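The startup costs described above can be kept out of a measurement by running the code once untimed before timing it. The following is a minimal sketch of that idea, not code from the thread: `WarmBenchmark` and `MedianMilliseconds` are hypothetical names, and reporting the median of several runs is an assumption about what makes a reasonable summary.

```csharp
using System;
using System.Diagnostics;

static class WarmBenchmark
{
    // Hypothetical helper: calls the action once without timing it,
    // then times `runs` further calls and returns the middle sample in ms
    // (the upper median when `runs` is even).
    public static double MedianMilliseconds(Action action, int runs)
    {
        // Warm-up call: pays for JIT compilation and thread-pool ramp-up.
        action();

        var samples = new double[runs];
        var sw = new Stopwatch();
        for (int r = 0; r < runs; r++)
        {
            sw.Restart();
            action();
            sw.Stop();
            samples[r] = sw.Elapsed.TotalMilliseconds;
        }

        Array.Sort(samples);
        return samples[runs / 2];
    }
}
```

Assuming both methods are reachable from the caller, the speedup would then be `WarmBenchmark.MedianMilliseconds(CalculatePattern, 10) / WarmBenchmark.MedianMilliseconds(CalculatePatternParallel, 10)`.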
• I made a console application and ran the test.  On average I am only getting a factor of 2x improvement running on 8 cores.  Is there a setting I am missing which is preventing it from getting more of a speed advantage?  I do not expect 8x improvement but I thought that it would be greater than 4x since you said you were getting a factor of 3x using a single quad core.

Saturday, August 04, 2012 2:30 AM
• Is it because of false sharing?
Saturday, August 04, 2012 9:33 AM
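Two features of the posted method are relevant to this question. The variables `u` and `v` are declared outside the lambda, so the compiler places them in a single closure object that every worker thread reads and writes; that is a data race as well as a source of contention. In addition, a rectangular `double[,]` is stored row by row, so `pat_dB[j, i]` and `pat_dB[j, i + 1]` are adjacent in memory, which means iterations with neighboring `i` values write to neighboring doubles. The sketch below is one possible rearrangement, not the change the posters in this thread made: `PatternSketch.Calculate` is a hypothetical name, and the result is indexed `[i, j]`, which is the transpose of the original `[j, i]` layout. Whether this accounts for the timings reported here is an assumption; it has not been measured.

```csharp
using System;
using System.Threading.Tasks;

static class PatternSketch
{
    // Hypothetical variant of CalculatePatternParallel: u and v are locals
    // of the lambda, and iteration i writes only row i of the result.
    public static double[,] Calculate(int Nx, int Ny, double u0, double v0)
    {
        var pat_dB = new double[Ny, Nx];
        double du = 2.0 / (Nx - 1);
        double dv = 2.0 / (Ny - 1);

        Parallel.For(0, Ny, i =>
        {
            // Local to this iteration, so no other thread can overwrite it.
            double v = -1.0 * i * dv;
            for (int j = 0; j < Nx; j++)
            {
                double u = -1.0 * j * du;
                double Ax = Math.Sin(Math.PI * Nx * (u - u0)) /
                            Math.Sin(Math.PI * (u - u0)) / Nx;
                double Ay = Math.Sin(Math.PI * Ny * (v - v0)) /
                            Math.Sin(Math.PI * (v - v0)) / Ny;
                // Row i is contiguous in memory; j walks along it.
                pat_dB[i, j] = 20 * Math.Log10(Math.Abs(Ax * Ay));
            }
        });

        return pat_dB;
    }
}
```

With `Nx == Ny`, as in the question, the array has the same shape as before; only the meaning of the two indices is swapped.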
• As another person trying it on a quad-core machine too, I ran the following five times:

```
static void Main(string[] args)
{
    double et2 = 0;
    var sw = new Stopwatch();

    // Sequential baseline.
    sw.Restart();
    CalculatePattern();
    sw.Stop();
    var et1 = sw.ElapsedMilliseconds;

    // First parallel run.
    sw.Restart();
    CalculatePatternParallel();
    sw.Stop();
    et2 = sw.ElapsedMilliseconds;

    Debug.WriteLine((1 - et2 / (double)et1) * 100 + "%");

    // Second parallel run.
    sw.Restart();
    CalculatePatternParallel();
    sw.Stop();
    et2 = sw.ElapsedMilliseconds;

    Debug.WriteLine((1 - et2 / (double)et1) * 100 + "%");

    // Third parallel run.
    sw.Restart();
    CalculatePatternParallel();
    sw.Stop();
    et2 = sw.ElapsedMilliseconds;

    Debug.WriteLine((1 - et2 / (double)et1) * 100 + "%");
}
```

the improvement of the parallel code over the sequential code I get is:

```52.9411764705882%
57.1428571428571%
59.2436974789916%

66.3978494623656%
58.6021505376344%
66.3978494623656%

46.7532467532468%
57.1428571428571%
55.4112554112554%

72.1264367816092%
71.551724137931%
69.8275862068966%

57.8544061302682%
61.3026819923372%
60.1532567049808%

```

Saturday, August 04, 2012 5:14 PM
• Scruffy John,

I'm confused by the math being used to calculate the improvement.

Let's say, for the sake of example, that et2 is 100 milliseconds and et1 is 400 milliseconds.  Then et2/et1 would be .25, 1-.25 would be .75, and .75 * 100 would be 75, which would result in 75% being output.  In reality, though, 100 milliseconds is 4x the performance of 400 milliseconds.

Working in reverse from your numbers, which appear to average out to about 61%, that would mean that et2/et1 was .39, or a ~2.6x speedup on four cores.

Friday, August 24, 2012 6:03 AM
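The two ways of reporting the result discussed above are related by a simple conversion. As a small sketch (the class and method names are hypothetical, chosen only for illustration), with `et1` the sequential time and `et2` the parallel time:

```csharp
using System;

static class SpeedupMath
{
    // Percentage of the sequential time that was saved: (1 - et2/et1) * 100.
    public static double PercentReduction(double et1, double et2)
    {
        return (1 - et2 / et1) * 100;
    }

    // Speedup factor: how many times faster the parallel run was.
    public static double Speedup(double et1, double et2)
    {
        return et1 / et2;
    }

    // Converts a percent reduction back into a speedup factor.
    public static double SpeedupFromReduction(double percent)
    {
        return 1 / (1 - percent / 100);
    }
}
```

For the example in the post, `PercentReduction(400, 100)` is 75 and `Speedup(400, 100)` is 4, while `SpeedupFromReduction(61)` is roughly 2.6.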
• When I run the code as defined I do not get the speed improvement that you are showing.  Are you using .NET 4.0 and building with VS2010?

I did look into what Mansoor Omrani had replied with.  I did more research on false sharing, and when I allocate the array differently I do get a 3.2x - 4.4x speed improvement running on a system with two i7 quad cores (8 cores total).

Saturday, August 25, 2012 2:50 AM
• My apologies on the math. I reran everything. The following are the results, in milliseconds only. You see the performance increase AFTER the parallel logic has been called the first time.

```
non-Par = 239
Parallel = 198
Parallel = 150
Parallel = 54

non-Par = 230
Parallel = 224
Parallel = 55
Parallel = 58

non-Par = 264
Parallel = 289
Parallel = 92
Parallel = 60

non-Par = 234
Parallel = 178
Parallel = 50
Parallel = 54
```

jfras2009: I am using VS2010 Professional

If you are curious, I wrote about the initialization behavior for threads and parallel code.

Saturday, August 25, 2012 4:53 PM