Having trouble using PLINQ

  • Question

  • Hi everybody,

     

    I need to execute one of my queries using PLINQ, but I'm running into a problem.

    My scenario is as follows:

    List<object3> list2 = new List<object3>();
    // list1 is of type HashSet<PairObject<object1, object2>>
    // Key is of type object1
    // Value is of type object2
    list1.AsParallel().GroupBy(l1 => l1.Key).ForAll(
       delegate(IGrouping<object1, PairObject<object1, object2>> dtGroup)
       {
         list2.AddRange(
           dtGroup.AsParallel().Select(d => d.Value).Select(
             delegate(object2 o)
             {
                /* very complex code goes here */
                return new object3();
             }
           ).ToArray() /* just for FullyBuffered to take effect */
         );
       }
    );

    When I execute the code above, I get intermittent exceptions, sometimes ArgumentOutOfRangeException and sometimes IndexOutOfRangeException, always thrown from the AddRange call on list2.

    Sometimes it runs without any exception, but list2 then contains unwanted null values.

    I tried removing the second .AsParallel(), but it didn't help. Can you experts please help me out? How can I get rid of those null values? Is the solution just to ignore them?


    learn to learn
    Thursday, June 10, 2010 6:23 AM

Answers

  • Perhaps I'm dim, but I searched the documentation for the link between ToArray() and WithMergeOptions(ParallelMergeOptions.FullyBuffered) and didn't find it (it does say that for foreach). It only says that ToList() and ToArray() force execution of the query. As I said, I'm no expert on PLINQ, but the documentation says you should not assume much.

    My hypothesis is simple: if the problem is intermittent, then parallelism is probably the cause. The exceptions you are getting seem to be caused by the variable length of the list. Perhaps the resizing is the problem.

    I suggest you try simple things: if you know the maximum size of the list, build it with that capacity; use a thread-safe collection as described in the documentation; try to use only one parallel query; specify merge options in both cases; etc.

    Let me know what happens

    • Marked as answer by FarzanCool Tuesday, June 15, 2010 6:58 AM
    Monday, June 14, 2010 1:54 PM
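
One way to read the "use only one parallel query" suggestion is sketched below. It is a minimal illustration, not the original poster's code: the class name `SingleQuerySketch`, the `Transform` method (a stand-in for the "very complex code") and the use of `KeyValuePair<int, int>` in place of `PairObject<object1, object2>` are all assumptions made for the example. The idea is that PLINQ merges the results itself, so no worker thread ever appends to a shared `List<T>`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SingleQuerySketch
{
    // Hypothetical stand-in for the "very complex code" in the question.
    public static string Transform(int value)
    {
        return "item-" + value;
    }

    public static List<string> Run(IEnumerable<KeyValuePair<int, int>> pairs)
    {
        // A single parallel query: group, project every value, and let
        // ToList() merge the per-thread results into one list.
        return pairs.AsParallel()
                    .GroupBy(p => p.Key)
                    .SelectMany(g => g.Select(p => Transform(p.Value)))
                    .ToList();
    }
}
```

Calling `SingleQuerySketch.Run(pairs)` returns one result per input pair. The order of the results is not guaranteed, because the query does not use `AsOrdered()`.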

All replies

  • I'm not an expert on parallel programming, but the exceptions and the null values are due to the combination of ToArray() and parallelism.

    I don't know what the /* very complex code goes here */ part does, but the size of the array is not fixed; it is determined by the result of whatever you do there.

    Sometimes the ToArray() call will start before the code inside finishes, so there will be inconsistencies, hence your problems.

    Hope this helps.

    PS: People don't tend to reply to this forum much, wonder why?

    Thursday, June 10, 2010 12:19 PM
  • Hi, thanks for your reply.

    But according to the information about merge options provided by Microsoft (http://msdn.microsoft.com/en-us/library/dd997424.aspx), calling the .ToArray() method is equivalent to

    .WithMergeOptions(ParallelMergeOptions.FullyBuffered)
    which means that the query results are fully buffered and returned only after all of them have been produced. So .AddRange will always receive the query results all at once.

    learn to learn
    Monday, June 14, 2010 11:17 AM
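
The point about buffering can be separated from the point about the shared list. Buffering, whether through `ToArray()` or an explicit `WithMergeOptions(ParallelMergeOptions.FullyBuffered)`, only controls how the results of one query are handed back. It does not synchronize the `AddRange` calls that `ForAll` issues from several threads against the same `List<T>`, which is not thread-safe. The sketch below keeps `ForAll` and guards the append with a lock. The names `BufferedMergeSketch`, `Transform` and `gate`, and the use of `KeyValuePair<int, int>` instead of `PairObject<object1, object2>`, are assumptions for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BufferedMergeSketch
{
    // Hypothetical stand-in for the "very complex code" in the question.
    public static string Transform(int value)
    {
        return "item-" + value;
    }

    public static List<string> Run(IEnumerable<KeyValuePair<int, int>> pairs)
    {
        var results = new List<string>();
        var gate = new object();

        pairs.AsParallel()
             .GroupBy(p => p.Key)
             .ForAll(group =>
             {
                 // The inner query is fully buffered: the array is complete
                 // before this statement finishes.
                 string[] chunk = group.AsParallel()
                                       .WithMergeOptions(ParallelMergeOptions.FullyBuffered)
                                       .Select(p => Transform(p.Value))
                                       .ToArray();

                 // ForAll runs this delegate on several threads at once, so
                 // the append to the shared list has to be serialized.
                 lock (gate)
                 {
                     results.AddRange(chunk);
                 }
             });

        return results;
    }
}
```

With the lock in place, each group's array is appended as a whole, although the order in which groups arrive is still not deterministic.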
  • I have another line of code that is similar to the second query inside ForAll(...), and it executes without any error or null object. It is something like:

    list.AddRange(list2.AsParallel().Join(list3.AsParallel(), t1 => t1.f1, t2 => t2.f1, (t1, t2) => t2).ToArray());

    This shows that the problem isn't related to .ToArray(). And you're right that the document says nothing about a relation between ToArray and FullyBuffered. I cited the wrong document; what I said can be found in the following one: http://msdn.microsoft.com/en-us/library/dd997399.aspx. It says: "If you are storing the results of a query by calling ToArray or ToList, then the results from all parallel threads must be merged into the single data structure." Doesn't that mean the same behavior takes place as if I had executed the query with the FullyBuffered option? Or am I misunderstanding?

    But your suggestion to use just one parallel query worked: I replaced ForAll with a ForEach, which iterates on a single thread.


    learn to learn
    Tuesday, June 15, 2010 6:58 AM
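
For readers who want to see the shape of that final fix, here is a rough sketch assuming the ForEach mentioned above is an ordinary `foreach` loop over the grouped query. The class name `ForEachSketch`, the `Transform` method and the `KeyValuePair<int, int>` input are hypothetical stand-ins, not the poster's actual types. The groups are still produced by PLINQ and each group is still processed in parallel, but the loop body, and therefore every `AddRange` call, runs on the calling thread only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ForEachSketch
{
    // Hypothetical stand-in for the "very complex code" in the question.
    public static string Transform(int value)
    {
        return "item-" + value;
    }

    public static List<string> Run(IEnumerable<KeyValuePair<int, int>> pairs)
    {
        var results = new List<string>();

        // foreach consumes the parallel query on the calling thread, so the
        // loop body never runs concurrently with itself.
        foreach (var group in pairs.AsParallel().GroupBy(p => p.Key))
        {
            // The expensive per-item work is still parallel inside a group.
            results.AddRange(
                group.AsParallel()
                     .Select(p => Transform(p.Value))
                     .ToArray());
        }

        return results;
    }
}
```

This matches the behavior reported in the thread: with a single thread appending to the list, the exceptions and the null entries should no longer appear.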