Multithreading a recursive function to read a drive's folder metadata

  • Question

  • Hi,

    We have a shared drive whose folder names and permissions we are recording in a database. To read all the folders, we use a recursive function that traverses the folder tree. While traversing, the information for each folder is added to a list object, and when the list reaches around 10k items we push it to the database. This worked fine in the simple, single-threaded version.

    To speed up the process, we applied multithreading to the recursive function so that we can read a large drive (one root folder per thread, with the number of threads restricted to 5), but the list object is not updated properly. It seems that multiple threads are not updating the same list object correctly.

    Code:

    for (int i = 0; i < NumberOfThreads; i++)
    {
        foreach (string dr in path)
        {
            List<Task> taskArray = new List<Task>();
            taskArray.Add(Task.Factory.StartNew((object obj) =>
            {
                var thread = new Thread(() => RecursiveMethod(param1, param2));
                thread.Start();
            }, dr));
            Task.WaitAll(taskArray.ToArray());
        }
    }

    In the code above, path holds all the root folder paths of that drive.


    • Edited by Md Zakir Monday, August 17, 2020 6:01 AM content added
    Monday, August 17, 2020 5:56 AM
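
Separately from the list problem, the loop as posted creates a new taskArray for every folder, and each task only starts a Thread and then returns, so Task.WaitAll does not actually wait for RecursiveMethod to finish. A minimal sketch of one way to run one task per root folder with at most five running at a time is shown below. The names RootScheduler, RunAll and scanRoot are hypothetical, and scanRoot stands in for whatever RecursiveMethod does.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class RootScheduler
{
    // Runs scanRoot once per root folder, with at most maxParallel calls
    // running together, and returns only after every root was processed.
    public static void RunAll(IEnumerable<string> roots, int maxParallel, Action<string> scanRoot)
    {
        var tasks = new List<Task>();               // one list for all roots
        using (var slots = new SemaphoreSlim(maxParallel))
        {
            foreach (string root in roots)
            {
                slots.Wait();                       // block until a slot is free
                string current = root;
                tasks.Add(Task.Run(() =>
                {
                    try { scanRoot(current); }      // the work runs inside the task itself
                    finally { slots.Release(); }    // free the slot even if scanRoot throws
                }));
            }
            Task.WaitAll(tasks.ToArray());          // wait once, after the loop
        }
    }
}
```

A call could look like RootScheduler.RunAll(path, 5, dr => RecursiveMethod(dr, param2)), assuming the first parameter of RecursiveMethod is the folder to scan. Any list shared between the calls still needs to be protected.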

Answers

  • It seems that RecursiveMethod and other parts of the code use a List to keep the folder data, but List is not a thread-safe collection. As the documentation for List suggests, consider an alternative kind of collection, perhaps a ConcurrentBag. However, you probably need to protect a larger area of the code. In that case consider lock; something like this:

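    List<FolderData> list = new List<FolderData>();
    object listLock = new object();   // lock on this rather than on list, because list may be replaced below

    . . .

    lock (listLock)
    {
        list.Add(folderData);
        if (list.Count >= 10000)      // list is large enough (about 10k, as in the question)
        {
            // Insert to database here, then ‘list.Clear’,
            // or start a database-related thread, pass the list, then ‘list = new List’
        }
    }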

    • Marked as answer by Md Zakir Monday, August 17, 2020 11:00 AM
    Monday, August 17, 2020 8:42 AM
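
The lock-and-flush idea above can be sketched as a small helper class. This is only an illustration under assumptions: BatchBuffer, its members and the shape of FolderData are hypothetical (the thread does not show the real types), and the flush delegate stands in for the database insert.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record for one folder; the real type is not shown in the thread.
public sealed class FolderData
{
    public string Path { get; set; }
    public string Permissions { get; set; }
}

// Collects items from many threads and hands off full batches.
public sealed class BatchBuffer
{
    private readonly object _gate = new object();      // dedicated lock object
    private readonly int _batchSize;
    private readonly Action<List<FolderData>> _flush;  // e.g. the database insert
    private List<FolderData> _items = new List<FolderData>();

    public BatchBuffer(int batchSize, Action<List<FolderData>> flush)
    {
        _batchSize = batchSize;
        _flush = flush;
    }

    public void Add(FolderData item)
    {
        List<FolderData> full = null;
        lock (_gate)
        {
            _items.Add(item);
            if (_items.Count >= _batchSize)
            {
                // Swap in a fresh list so other threads can keep adding.
                full = _items;
                _items = new List<FolderData>();
            }
        }
        // Flush outside the lock so a slow insert does not block the scanners.
        // Note: flush may run on several threads at once, so it must cope with that.
        if (full != null) _flush(full);
    }

    // Call once after all workers have finished to push the last partial batch.
    public void Complete()
    {
        List<FolderData> rest;
        lock (_gate)
        {
            rest = _items;
            _items = new List<FolderData>();
        }
        if (rest.Count > 0) _flush(rest);
    }
}
```

Each thread would call Add for every folder it visits, and the coordinating code would call Complete after all threads are done; forgetting that final call is one way to end up with missing rows.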
  • Just a comment:

    Parallel access to a single device is often not possible, so the requests will be serialized. A threaded approach may therefore not give the expected overall runtime gain.

    Depending on your actual code, using loops instead of recursion (directories form a tree, so you can traverse it down to the leaves) may already give you some performance boost without the pitfalls of threaded programming.

    • Marked as answer by Md Zakir Monday, August 17, 2020 11:00 AM
    Monday, August 17, 2020 8:51 AM
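
As an illustration of the loop-based traversal suggested above, here is a minimal single-threaded sketch that walks the folder tree with an explicit stack instead of recursion. FolderWalker and EnumerateFolders are hypothetical names, and skipping unreadable folders is an assumption about the desired behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FolderWalker
{
    // Yields root and every folder below it, without recursion.
    public static IEnumerable<string> EnumerateFolders(string root)
    {
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string current = pending.Pop();
            yield return current;

            string[] children;
            try
            {
                children = Directory.GetDirectories(current);
            }
            catch (UnauthorizedAccessException)
            {
                continue; // skip folders the account cannot read
            }
            catch (IOException)
            {
                continue; // folder vanished or is otherwise unreadable
            }

            foreach (string child in children)
                pending.Push(child);
        }
    }
}
```

The caller can then use a plain foreach over FolderWalker.EnumerateFolders(rootPath), read the metadata of each folder, and add it to the batch list without any locking.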

All replies

  • Hi 

    I have used the Parallel.ForEach method instead of foreach, but the values are not recorded properly for directories containing a lot of data, say almost 17k.

    Tuesday, August 18, 2020 8:26 AM
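
Switching to Parallel.ForEach does not by itself make a shared List safe, which may be what is happening here. One possible pattern, sketched below under assumptions, gives every worker its own private list and merges the lists under a lock only once per worker. ParallelScan and Scan are hypothetical names, and plain folder paths stand in for the real folder data.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class ParallelScan
{
    // Scans the roots in parallel and returns the paths of all folders below them
    // (the roots themselves are not included).
    public static List<string> Scan(IEnumerable<string> roots, int maxThreads)
    {
        var all = new List<string>();
        var gate = new object();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };

        Parallel.ForEach<string, List<string>>(
            roots,
            options,
            () => new List<string>(),               // localInit: one private list per worker
            (root, loopState, local) =>             // body: no shared state is touched here
            {
                // Throws UnauthorizedAccessException if a subfolder cannot be read.
                local.AddRange(Directory.EnumerateDirectories(
                    root, "*", SearchOption.AllDirectories));
                return local;
            },
            local =>                                // localFinally: merge once per worker
            {
                lock (gate) { all.AddRange(local); }
            });

        return all;
    }
}
```

With this shape the only contended operation is the final merge, so a root containing many thousands of folders is collected in a list that no other thread touches.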