Unexpected Results while saving results to an Array using Parallel.For
- This is a modified version of some code I posted in another thread for CPU utilization. I discovered while I was debugging that problem that I had a more fundamental problem of my loop does not appear to save the correct data in the correct place in the Array for either calc_results or calc_geom. An array of [][][][] was suggested in the other thread instead of [,,,] but when I try to initialize it I get an error. I am not familiar with the [][][][] arrays and really don't know the differance between the two arrays. Thanks for your help.Results[, , ,] calc_results = new Results[heave_size, pitch_size, roll_size, steer_size];Start_Geom[,,,] calc_geom = new Start_Geom[heave_size, pitch_size, roll_size, steer_size];//Results[][][][] test_results = new Results[heave_size] [pitch_size] [roll_size] [steer_size];dnAnalytics.LinearAlgebra.Matrix x;double[] heaves = new double[heave_size];double[] pitches = new double[pitch_size];double[] rolls = new double[roll_size];double[] steering = new double[steer_size];Parallel.For(0, heave_size, delegate(int h)//for (int h = 0; h < heave_size; h++){for (int p = 0; p < pitch_size; p++){for (int r = 0; r < roll_size; r++){for (int s = 0; s < steer_size; s++){double heave_val = heave_min + h * heave_inc;double pitch_val = pitch_min + p * pitch_inc;double roll_val = roll_min + r * roll_inc;double steer_val = steer_min + s * steer_inc;x = new DenseMatrix(new double[,] { { heave_val }, { roll_val }, { pitch_val }, { steer_val }, { 0 }, { 0 }, { 0 }, { 0 }, { 0 }, { 0 }, { 0 }, { 0 } });Start_Geom calc_geom_temp = new Start_Geom();Results calc_results_temp = new Results();kinematics.locate_vehicle(x, last_geom, start_geom,vehicle, test_track, component_params, static_results[i],results_cur, results_last, out calc_geom_temp, out calc_results_temp);calc_geom[h, p, r, s] = new Start_Geom(calc_geom_temp);calc_results[h, p, r, s] = new Results(calc_results_temp);//test_results[h][p][r][s] = new Results(calc_results_temp);steering[s] = steer_val;rolls[r] = roll_val;pitches[p] = pitch_val;heaves[h] = heave_val;}progressBar.Dispatcher.Invoke(System.Windows.Threading.DispatcherPriority.Normal,new System.Action(delegate(){progressBar.Value = progressBar.Value+steer_size;}));}}});
Risposte
- Hey Perry,
Is there any way to send me a compilable repro? That would help a lot.
Thanks,
Josh- Contrassegnato come rispostaStephen Toub - MSFTMSFT, Moderatoremartedì 30 giugno 2009 22.42
Tutte le risposte
- Just to add a little more detail if I uncomment "for (int h = 0; h < heave_size; h++)" and comment "Parallel.For(0, heave_size, delegate(int h)" the procedure works exactly as expected.
- Hi,
Your probable cause of your problem is progressBar.Dispatcher.Invoke(System.Windows.Threading.DispatcherPriority.Normal,new System.Action(delegate(){progressBar.Value = progressBar.Value+steer_size;}));
What happends if you comment that out? Does it works ok? I know you won't get you progress bar to update the way you want, but it helps isolate the problem.
The second cause might be locality. When you create such a big array, then you will get a contigous area of memory for your array, which will be locked on every use by the processor using it at that time so you won't get any speed up in the end.
Best regards,
Tibi
MCT, MCDBA, MCSD.NET, MCPD 2.0(*.*), MCPD 3.5(*.*) - Commenting out the progress bar has no effect, other than the obvious one of the progress bar not functioning. I have even attempted to change the calc_results[h, p, r, s] = new Results(calc_results_temp) to lock(calc_results[h, p, r, s] = new Results(calc_results_temp)); and that doens't seem to have any effect. The results in calc_results and calc_geom are still random.
- Hi Again,
If you want to use a different kind of array, here is a suggestion on how you can modify your code to avoid cache contention.
Results[][,,] calc_results = new Results[heave_size][, , ]; Start_Geom[][,,] calc_geom = new Start_Geom[heave_size][, , ]; // Results[][][][] test_results = new Results[heave_size] [pitch_size] [roll_size] [steer_size]; dnAnalytics.LinearAlgebra.Matrix x; double[] heaves = new double[heave_size]; double[] pitches = new double[pitch_size]; double[] rolls = new double[roll_size]; double[] steering = new double[steer_size]; //for (int h = 0; h < heave_size; h++) Parallel.For(0, heave_size, (h) => { calc_results[h] = new Results[pitch_size, roll_size, steer_size]; calc_geom[h] = new Start_Geom[pitch_size, roll_size, steer_size]; for (int p = 0; p < pitch_size; p++) { for (int r = 0; r < roll_size; r++) { for (int s = 0; s < steer_size; s++) { double heave_val = heave_min + h * heave_inc; double pitch_val = pitch_min + p * pitch_inc; double roll_val = roll_min + r * roll_inc; double steer_val = steer_min + s * steer_inc; x = new DenseMatrix(new double[,] { { heave_val }, { roll_val }, { pitch_val }, { steer_val }, { 0 }, { 0 }, { 0 }, { 0 }, { 0 }, { 0 }, { 0 }, { 0 } }); Start_Geom calc_geom_temp = new Start_Geom(); Results calc_results_temp = new Results(); kinematics.locate_vehicle(x, last_geom, start_geom, vehicle, test_track, component_params, static_results[i], results_cur, results_last, out calc_geom_temp, out calc_results_temp); calc_geom[h][p, r, s] = new Start_Geom(calc_geom_temp); calc_results[h][p, r, s] = new Results(calc_results_temp); //test_results[h][p,r,s] = new Results(calc_results_temp); steering[s] = steer_val; rolls[r] = roll_val; pitches[p] = pitch_val; heaves[h] = heave_val; } progressBar.Dispatcher.Invoke( System.Windows.Threading.DispatcherPriority.Normal, new System.Action( delegate() { progressBar.Value = progressBar.Value + steer_size; } )); } } } ); Best regards,<br/>Tibi<br/>
MCT, MCDBA, MCSD.NET, MCPD 2.0(*.*), MCPD 3.5(*.*) - BTW,
I suppose that
new Start_Geom(calc_geom_temp); and new Results(calc_results_temp);
are copy constuctors, which you don't need in C#.
Instead you can just write calc_geom[h][p, r, s] = calc_geom_temp; calc_results[h][p, r, s] = calc_results_temp;
and remove the copy constructors!
Best regards,
Tibi
MCT, MCDBA, MCSD.NET, MCPD 2.0(*.*), MCPD 3.5(*.*) - If I remove the copy constructors it makes the entire array equal to the last thing I copied into the array.I tried the previous suggestion and though it did speed it up a tiny bit it is still giving me random values for the calc_results and calc_geom.
- I think that you have some interdependencies between results. You would have to ananlyze kinematics.locate_vehicle(x, last_geom, start_geom,
vehicle, test_track, component_params, static_results[i],
results_cur, results_last, out calc_geom_temp, out calc_results_temp);
Best regards,
Tibi
MCT, MCDBA, MCSD.NET, MCPD 2.0(*.*), MCPD 3.5(*.*) - Do you have any suggestions on what I should look for?
- Could static methods be the culprit?
- If they don't share data no!
/Tibi
MCT, MCDBA, MCSD.NET, MCPD 2.0(*.*), MCPD 3.5(*.*) - Does anyone have any suggestions on how I can track this error down? I.e. what tools I can use to figure out what is happening here? I have narrowed it down to something wrong in the parallel loop cause when I run it w/ only one outer loop everything calculates fine.
- Hi pltaylor,My guess is that the problem lies here:
steering[s] = steer_val; rolls[r] = roll_val; pitches[p] = pitch_val; heaves[h] = heave_val;
You've parallelized the outer-most for loop with the index 'h'. Multiple iterations each with their own given index 'h' may be
executing in an overlapped manner and because you've defined these four arrays outside of the loop, the same state is shared between
threads. With just the right (wrong?) overlapping, two threads could be processing the same index 'r' and race to set roll[r], for example.
I can't see in your code, however, if you're actually using the values stored in these arrays, or just storing them. What do the constructors
for Start_Geom and Results do?
Josh
- The
steering[s] = steer_val; rolls[r] = roll_val; pitches[p] = pitch_val; heaves[h] = heave_val;
are in the inner most part of the loop and are actually filling correctly. (They fill multiple times
during the looping but they have so far always ended up correct at the end. It is the
x = new DenseMatrix(new double[,] { { heave_val }, { roll_val }, { pitch_val }, { steer_val }, { 0 }, { 0 }, { 0 }, { 0 }, { 0 }, { 0 }, { 0 }, { 0 } });
...
kinematics.locate_vehicle(x, last_geom, start_geom, vehicle, test_track, component_params, static_results[i], results_cur, results_last, out calc_geom_temp, out calc_results_temp); calc_geom[h][p, r, s] = new Start_Geom(calc_geom_temp); calc_results[h][p, r, s] = new Results(calc_results_temp);that is not executing as expected. It appeared as though my x was going sideways w/ multiple x's
existing at the same time. So I tried
dnAnalytics.LinearAlgebra.Matrix x = new DenseMatrix(new double[,] { { heave_val }, { roll_val }, { pitch_val }, { steer_val }, { 0 }, { 0 }, { 0 }, { 0 }, { 0 }, { 0 }, { 0 }, { 0 } });to localize it to the thread. This gives me different and still not valid results. The only thingI have been able to do to give valid results is limit the outermost loop to a single iteration whichjust makes it all happen in one thread.The constructors are just copy constructors. The fact that those four arrays are filling correctly is probably just luck :) The code is non-deterministic, and if you're actually consuming those values later, you're bound to get just the right interleaving at some point in time that may cause unexpected behavior.
From what you've described I'm willing to bet the farm that you have some state that is shared -- now it's just a matter of finding said state and handling it well.
For example, in...
kinematics.locate_vehicle(x, last_geom, start_geom,
vehicle, test_track, component_params, static_results[i],
results_cur, results_last, out calc_geom_temp, out calc_results_temp);
... some of these parameters must be defined outside the loop as I don't see their definitions inside the loop. What is last_geom, start_geom? What is static_results and is it possible that the same value for 'i' can be given to multiple threads?
Are there any static variables referenced in locate_vehicle?
You'll need to track down what state these loop bodies share. Also, what happens when you change your Parallel.For loop line to the following?
var parOpts = new ParallelOptions();
parOpts.MaxDegreeOfParallelism = 1;
Parallel.For(0, heave_size, parOpts, delegate(int h)
Is the behavior correct?
Josh- In reverse order,var parOpts = new ParallelOptions();
parOpts.MaxDegreeOfParallelism = 1;
Parallel.For(0, heave_size, parOpts, delegate(int h)behaves correctly, albeit sloooooowwly.locate vehicle is as follows.public static void locate_vehicle(Matrix x, Start_Geom last_geom, Start_Geom start_geom,Veh_Def veh_def, Track_Def track_def, Component_Params component_params,Results static_results,Results results_cur, Results results_last, out Start_Geom new_geom, out Results results){new_geom = new Start_Geom();results = new Results();int m = x.Columns;for (int i = 0; i < m; i++){double front_twist, rear_twist;//results.index = i;new_geom = locate_chassis(x.GetColumn(i), start_geom);new_geom = locate_steering_system(x.GetColumn(i), new_geom, start_geom, veh_def);new_geom = locate_front_suspension(x.GetColumn(i), new_geom, start_geom, start_geom, component_params);locate_front_arb(new_geom, veh_def, component_params, out new_geom, out front_twist);new_geom = locate_rear_suspension(x.GetColumn(i), new_geom, start_geom, veh_def, component_params);locate_rear_arb(new_geom, veh_def, component_params, out new_geom, out rear_twist);calc_results(x.GetColumn(i), new_geom, start_geom, results_cur, results_last, static_results, veh_def, track_def,component_params, "setup", out new_geom, out results, out component_params);}}All the methods this calls are public static....May not be the cleanest way, but it is how I have done it for now.All of the variables in locate_vehicle stay constant for the entire loop except for x. In this particular loop a lot of the variables are actually the same thing just named differently as this is a special case of a much more generic method.static_results[i] is a List<Results> static_results = new List<Results>(); defined in my Main Class so it carries over to all methods. - If P.For with the MaxDOP set to 1 works, it's not a bug with TPL, it must be some shared state somewhere. The slow performance is expected, though I'd like to understand just how slow sloooooooowly is. :)To be honest, your code has a lot of layers and depth that makes this hard to debug here. Again, I encourage you to follow all your method calls and the parameters you pass in to determine where multiple threads might be touching the same variable(s).Hope that helps,Josh
- Proposto come rispostaJosh PhillipsMSFTmartedì 9 giugno 2009 19.17
- I guess my only other question then is do you have any suggestions for trying to find those varaibles that are being shared other than looking through it?Slow is going from approx 20 secs with a reg for loop to approx 29 secs w/ single threaded p.for()
- Hmm. A 45% increase seems a little much but not unfathomable. Is this on the Visual 2010 Beta or the old CTP?
As for suggestions...
There are certainly tools that exist to detect data-races (doing a quick bing-search should give you some ideas) and this is an extremely hot area of research but, unfortunately, there is no definitive tool I can recommend at this point in time.
Some general suggestions:
1. Watch out for closures. What variables are being capture in your anonymous methods or lambdas?
2. List the variables that are defined outside of your parallel for loop (or threads) and track how they are used in the loop iterations. Is there a chance that any of them are touched concurrently?
3. Look out for things like loops nested inside of a parallel for loop. Is there any chance that the nested loops on any two threads might be processing the same index at any given point in time? - I'll run it a couple more times tomorrow to double check the performance hit. It is VS 2010 Team edition.I assume that since I define int p, int r, and int s that they are thread safe and should be fine?Something tells me I'll just have to take some serious time eliminating possibilities one at a time until all is good.
- Yep. p, r, and s, should be fine, except in the case I described before where the same index may be processed concurrently with another thread, i.e. Thread A is processing h = 10, p = 0, r = 0, s = 1 and Thread B is processing h = 11, p = 1, r = 3, s = 1. In this case A and B will be racing to set steering[1].
Unfortunately, the story around data races is sad. That's definitely good feedback for us to have though.
Thanks!
Josh - That would actually explain some of the behavior I was witnessing this afternoon. How can I set up these loops so that it doesn't race? Or is multiple nested for loops pretty much a no-no.This may seem a little goofey but could I put multiple p.for loops to delegate the variables individually? Would that prevent the race? I may be going down the wrong path.
- Unfortunately, nesting p.for loops won't help here because those nested loops can still process the same index simultaneously.Why is it that you store these values and overwrite them multiple times? Do you use them anywhere? Are you just storing the last calculated value? Where are the XX_min and XX_inc variables define? Are they invariants or do they actually change? It looks to me that unless those are static variables and updated inside one of the methods you call in your loop, they are invariants. If they are in fact invariants then the XX_val variables only need to be calculated in 4 non-nested loops rather than a nested loop. That would reduce a lot of unnecessary computation from your nested for loop. Essentially, you'd reduce the complexity of filling the steering, rolls, pitches, and heaves arrays from O(n^4) to O(n) -- a big savings.That said, if those values really are invariants than the data-race there is benign (except for performance) as each cell is just being updated with the same value over and over.If they are not invariants and they truly do change, then you must just be overwriting each cell with the most recent value calculated. If those values were floats, the races would still be benign as you'd have avoided torn reads. Because they are doubles, if they are not invariants, you will need to create an object to lock on for each cell. :(I don't see those arrays actually be read from in your loop so my guess is that they can be calculated outside of your loops and therefore, present no concurrency issue.Josh
In reverse order,
BTW, I just re-read this. I don't know where 'i' is defined but I would look closely at static_results. Since it's a static variable shared by all threads, it worries me. Also, if all the methods in here are static or take in out params (and it seems they do both) that's bothersome.var parOpts = new ParallelOptions();
parOpts.MaxDegreeOfParallelism = 1;
Parallel.For(0, heave_size, parOpts, delegate(int h)behaves correctly, albeit sloooooowwly.locate vehicle is as follows.public static void locate_vehicle(Matrix x, Start_Geom last_geom, Start_Geom start_geom,Veh_Def veh_def, Track_Def track_def, Component_Params component_params,Results static_results,Results results_cur, Results results_last, out Start_Geom new_geom, out Results results){new_geom = new Start_Geom();results = new Results();int m = x.Columns;for (int i = 0; i < m; i++){double front_twist, rear_twist;//results.index = i;new_geom = locate_chassis(x.GetColumn(i), start_geom);new_geom = locate_steering_system(x.GetColumn(i), new_geom, start_geom, veh_def);new_geom = locate_front_suspension(x.GetColumn(i), new_geom, start_geom, start_geom, component_params);locate_front_arb(new_geom, veh_def, component_params, out new_geom, out front_twist);new_geom = locate_rear_suspension(x.GetColumn(i), new_geom, start_geom, veh_def, component_params);locate_rear_arb(new_geom, veh_def, component_params, out new_geom, out rear_twist);calc_results(x.GetColumn(i), new_geom, start_geom, results_cur, results_last, static_results, veh_def, track_def,component_params, "setup", out new_geom, out results, out component_params);}}All the methods this calls are public static....May not be the cleanest way, but it is how I have done it for now.All of the variables in locate_vehicle stay constant for the entire loop except for x. In this particular loop a lot of the variables are actually the same thing just named differently as this is a special case of a much more generic method.static_results[i] is a List<Results> static_results = new List<Results>(); defined in my Main Class so it carries over to all methods.- Contrassegnato come rispostaStephen Toub - MSFTMSFT, Moderatoremartedì 16 giugno 2009 0.16
- Contrassegno come risposta annullatoStephen Toub - MSFTMSFT, Moderatoremartedì 16 giugno 2009 20.25
- Can you elaborate on"Also, if all the methods in here are static or take in out params (and it seems they do both) that's bothersome. "What about static methods or out params are worrisome? It would take some interesting workarounds not to use the out params, but the static methods would be an easy change, that I have experimented unsuccssefully with but not going through and getting rid of all the staitc methods. In my latest round of debugging I think I have narrowed it down to thecalc_results(x.GetColumn(i), new_geom, start_geom, results_cur, results_last, static_results, veh_def, track_def,component_params, "setup", out new_geom, out results, out component_params);method. Is there anything particularly troublesome as to how this function is called? The actual method is a lot of code that contains more methods itself, so I'm not going to bother posting it.
Sorry for not being clear. I meant methods that mutate static variables.
For example,
class MyClass
{
private static A m_state;public void Foo( out B myParam )
{
...
m_state = ...;
...
myParam = ...;
}
}void Main()
{
...
B data = ...;
...
ParallelFor( 0, N, (i) =>
{
...
var x = new MyClass();
...
x.Foo( out data );
});
}is bad because it's modifying shared state through both a static variable and a variable that is defined outside of the P.For.
So, looking at calc_results, are there any static variables being written to inside? Are new_geom, results, or component_params defined outside of the P.For?
Josh
- I have been able to cut down on alot of the racing but it appears that I am still getting some about 50% of the time. It definetly appears that the out variables are my biggest problems and playing with them has moved me forward. For instance changingpublic static void calc_results(dnAnalytics.LinearAlgebra.Vector x, Start_Geom new_geom_in ,Start_Geom start_geom,Results results , Results results_last, Results static_results, Veh_Def veh_def, Track_Def track_def, Component_Params_in component_params,string mode, out Start_Geom new_geom, out Results results, out Component_Params component_params){new_geom = new Start_Geom(new_geom_in);results=new Results(results_cur);component_params=new Component_Params(component_params_in);...... lots of calculations here}topublic static void calc_results(dnAnalytics.LinearAlgebra.Vector x, Start_Geom new_geom ,Start_Geom start_geom,Results results , Results results_last, Results static_results, Veh_Def veh_def, Track_Def track_def, Component_Params component_params,string mode, out Start_Geom new_geom_out, out Results results_out, out Component_Params component_params_out){....lots of calculations.....new_geom_out = new Start_Geom(new_geom);results_out = new Results(results);component_params_out = new Component_Params(component_params);}helped significantly. Obviously I haven't actually solved the probelm, just lessoned the time that there can be a race. How can I eliminate this race? Is there a different way to create methods that have multiple variables coming out of the left side that I am just not familiar with?Once again I appreciate the help-perry
Hi Perry,
I'm glad you're making progress! It's perfectly acceptable to use out params, you just have to be really careful with how you use them. You really don't want to pass in paramaters that are defined outside of the the thread boundary (in this case outside of the P.For) unless you're 100% positive that those variables are only being read from and never being written to concurrently. In fact, make use of immutable types as much as possible -- there's lots o' good reading here: http://weblogs.asp.net/bleroy/archive/2008/01/16/immutability-in-c.aspx.
Josh- Josh-I've continued to make progress, but I am confused as to my latest progress. What is the difference between these two pieces of code inside a p.for loop?...Results calc_results_temp = new Results();calc_results_temp = kinematics.locate_vehicle(x, start_geom, vehicle, component_params);calc_results[h, p, r, s] = new Results(calc_results_temp);....and...calc_results[h, p, r, s]= kinematics.locate_vehicle(x, start_geom, vehicle, component_params);...obviously the second is a lot better from a performance, cleanliness, and overal coding reason but I would have thought the result would have beeen identical. It turns out that if I run the first bit of code I get some random results. If I use the second bit of code all is good and I get predidictable results.
- Hey Perry,
Is there any way to send me a compilable repro? That would help a lot.
Thanks,
Josh- Contrassegnato come rispostaStephen Toub - MSFTMSFT, Moderatoremartedì 30 giugno 2009 22.42

