Parallel foreach hanging on completions

  • Question

  • Good evening

    I have a .NET Core application that uses DirectorySearcher in a loop to pull all the information from AD.  The original app did this on a single thread and took hours.  I broke the search up into wildcard filters by first letter (a-z plus some special characters) and implemented a Parallel.ForEach over the array of characters with MaxDegreeOfParallelism set to -1.  This has sped up the fetch and the subsequent writing of results to SQL Server tremendously.  Unfortunately, the code hangs on the last iteration about 50% of the time.  My Console.WriteLine shows that the last part of the loop completes, but execution doesn't move on.  I added a counter and a test for counter == array length, then attempted to call Break(), but it didn't improve things.  Any suggestions would be appreciated.

    // requires: using System.Threading.Tasks;

    int[] charList = new int[] { 33, 34, 35, 36... };

    int counter = 0;

    var options = new ParallelOptions()
    {
        MaxDegreeOfParallelism = -1
    };

    Parallel.ForEach(charList, options, (item, loopState) =>
    {
        // create filter, call DirectorySearcher and insert results into staging table: first users, then groups

        counter++;

        if (counter == charList.Length)
        {
            loopState.Break();
        }
    });

    Thanks.

    Monday, May 18, 2020 10:54 PM

Answers

  • Hi Cheesebread,

    Thank you for posting here.

    Parallel.ForEach waits for all of its iterations to complete and only then returns synchronously.

    So the completion of the iteration for the last value in charList does not necessarily mean that the entire Parallel.ForEach has finished; it may still need to wait for other iterations to complete.

    Your program may appear to "hang" for this reason.
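
    A minimal sketch of that behaviour, under stated assumptions: the four values and the Thread.Sleep calls are hypothetical stand-ins for the real AD search and database insert, and StragglerDemo is an illustrative name only. The first item is made the slowest, so the "done" line for the last array element is typically printed well before the loop itself returns.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class StragglerDemo
{
    // Returns the number of iterations that had finished when Parallel.ForEach returned.
    public static int Run()
    {
        int[] charList = { 33, 34, 35, 36 };   // illustrative values only
        int completed = 0;

        Parallel.ForEach(charList, item =>
        {
            Console.WriteLine($"start {item}");

            // Hypothetical workload: the first item takes much longer than the rest.
            Thread.Sleep(item == charList[0] ? 500 : 50);

            Interlocked.Increment(ref completed);
            Console.WriteLine($"done  {item}");
        });

        // This line is reached only after every iteration has returned,
        // including the slow one, regardless of the order of the "done" lines.
        return completed;
    }
}
```

    Logging a start and a done line per item in the real loop is one way to see which item, if any, never reports completion.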

    Best Regards,

    Timon


    Tuesday, May 19, 2020 6:28 AM

All replies

  • You have a race condition inside your loop. If two separate threads simultaneously execute counter++ and/or the subsequent comparison on counter, the result may be incorrect, and the counter may never reach charList.Length.

    Make sure that you use a lock around these instructions, or use the Interlocked methods if performance is critical (which it isn't in this case, given that the performance impact of the lock will be insignificant compared with the AD and DB access times).

    lock (lockObject)
    {
      counter++;
      if(counter == charList.Length)
      {
          loopState.Break();
      }
    }

    lockObject needs to be declared outside the loop: object lockObject = new object();
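
    For comparison, a sketch of the Interlocked alternative mentioned above. The class and method names are illustrative, and the loop body is a placeholder for the real work. Interlocked.Increment returns the incremented value atomically, so exactly one iteration observes the value charList.Length.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class InterlockedCounterDemo
{
    // Returns how many iterations observed counter == charList.Length.
    // With an atomic increment this is exactly 1 for a non-empty array.
    public static int Run(int[] charList)
    {
        int counter = 0;
        int timesLengthReached = 0;

        Parallel.ForEach(charList, item =>
        {
            // (placeholder) the AD search and database insert would go here

            // Atomic read-modify-write: no two iterations get the same value.
            int current = Interlocked.Increment(ref counter);

            if (current == charList.Length)
            {
                Interlocked.Increment(ref timesLengthReached);
            }
        });

        return timesLengthReached;
    }
}
```

    Note that by the time the counter equals charList.Length every item has already been handed to the loop body, so calling Break() at that point has nothing left to skip.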

    Tuesday, May 19, 2020 6:06 AM
  • Maybe your core code is not suitable for parallel processing either. Please show some details.


    • Edited by Viorel_MVP Tuesday, May 19, 2020 8:39 AM
    Tuesday, May 19, 2020 8:37 AM
  • Thank you for the feedback.  I am doing a dump of AD (DirectorySearcher with properties, looping through the SearchResultCollection, and adding each result to a DB table using EF Core), so I can see that my totals for users, groups and such have all completed.  Since it spans multiple class levels and is probably 250+ lines of code, it doesn't make for a good post.  It is, however, heavy I/O, and the order doesn't matter in the least.  It takes close to 2 hours, and at that point it seems to just stop processing.  The process memory is steady and the CPU is close to 0.  If I reduce the size of the parallel loop it seems not to hang as often, but I have been given the requirement of an AD harvest using .NET Core.  Basically I am wondering whether there is any known issue where a Parallel.ForEach loop hangs, or whether there is any brute-force way I can break out of it.  Thanks again to you and all who have taken the time.
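
    One reading of "reducing the size of the parallel loop" is capping the number of concurrent iterations instead of leaving MaxDegreeOfParallelism at -1 (unlimited). The sketch below only shows how such a cap is set and that it is honoured; it does not diagnose the hang. The cap value, the 40-element array and the Thread.Sleep are hypothetical.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedParallelismDemo
{
    // Returns the highest number of iterations that were running at the same time.
    public static int Run(int maxDegree)
    {
        int[] charList = Enumerable.Range(33, 40).ToArray();   // illustrative values only
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };

        object gate = new object();
        int running = 0;
        int peak = 0;

        Parallel.ForEach(charList, options, item =>
        {
            lock (gate)
            {
                running++;
                if (running > peak) peak = running;
            }

            Thread.Sleep(20);   // hypothetical stand-in for the AD and DB calls

            lock (gate)
            {
                running--;
            }
        });

        return peak;
    }
}
```

    Calling BoundedParallelismDemo.Run(4) should never report a peak above 4.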

    Wednesday, May 20, 2020 1:46 AM
  • Hi, thank you for the thoughts.  It is an interesting idea and I will try it, although what I am seeing is that the last instance seems to die inside the object that I instantiate and call within the Parallel.ForEach.  It isn't that I am waiting for another thread to complete, because I am pulling users and groups from AD and I can see that all 182K users have been pulled and written.  That said, I will try the lock, and if the race condition is what throws off my forced break, I will know. Thank you.
    Wednesday, May 20, 2020 1:51 AM
  • Hello,

     

     I am not sure if this will help, but perhaps a new approach may give a good result.

    If AD is structured like an OS directory/file tree, then try using the Queue class.

    That way, the parallel code only needs to check

      if objectContainer.Count > 0 objectContainer.Dequeue else break

    or something along those lines.  This is roughly how I code my OS directory/file searches, but with the added benefit of recursion, all done in a single BackgroundWorker thread.

    The lock is very important when multiple threads try to access the same object container.
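
    A rough sketch of the queue idea, with one substitution named plainly: it uses ConcurrentQueue<T> instead of Queue plus a lock, because TryDequeue combines the "is anything left?" check and the removal into one thread-safe call. The worker count, the queue contents and the class name are hypothetical, and the queue is assumed to be fully populated before the workers start.

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class QueueDrainDemo
{
    // Returns the number of items the workers processed.
    public static int Run(int workerCount)
    {
        // Illustrative work items: character codes 33..126.
        var queue = new ConcurrentQueue<int>(Enumerable.Range(33, 94));
        int processed = 0;

        Task[] workers = Enumerable.Range(0, workerCount)
            .Select(_ => Task.Run(() =>
            {
                // Each worker keeps taking items until the queue is empty, then exits.
                while (queue.TryDequeue(out int item))
                {
                    // (placeholder) the search for 'item' would go here
                    Interlocked.Increment(ref processed);
                }
            }))
            .ToArray();

        Task.WaitAll(workers);
        return processed;
    }
}
```

    If items were added to the queue while the workers are running, an empty queue would no longer mean "finished", and a different completion signal would be needed.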

     

     Hope this helps :)

    Wednesday, May 20, 2020 3:21 AM