Reading multiple CSV files with the Task library

  • Question

  • I want to read multiple CSV files at the same time, which is why I would like to use Task. Please guide me on how to use tasks to read multiple CSV files and, once all of the files have been read, fire another function that lets me know the job is done.

    I have to store the CSV data in a DataTable, and that is the part that is bugging me. If I use tasks to read multiple CSV files and many threads try to write CSV data into the DataTable, a deadlock may arise. Please guide me on the best way to handle it.

    If I use tasks to read multiple CSV files at once, will there be any benefit in processing time? In other words, will the job complete faster if I read the files with tasks, or should I read each file one by one in a synchronous loop?

    Which would be best for reading multiple CSV files and storing the data in a DataTable: synchronous or asynchronous?

    I am looking for guidance and sample code to achieve my goal. Thanks.

    Sunday, April 24, 2016 8:09 PM

Answers

  • If the files are all on one drive, then reading multiple files at once will likely slow things down rather than speed them up.

    This is because disk I/O is expensive, and it is often the disk, not the CPU, that is your bottleneck.

    I would read the files and process them one by one.

    Take a look at the code here:

    http://stackoverflow.com/questions/23444860/process-list-of-files-asynchronously-using-async-and-await-in-c-sharp-console-ap

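    using System;
    using System.IO;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;

    public class MyClass
    {
        // Updated from several continuations, hence Interlocked below.
        private int filesRead = 0;

        public void Go()
        {
            // Blocks until every file has been read. Fine in a console app,
            // but blocking like this on a UI thread can deadlock.
            GoAsync().Wait();
        }

        private async Task GoAsync()
        {
            // GetFiles (not GetFileSystemEntries) so that subdirectories
            // are never passed to StreamReader.
            string[] filePaths = Directory.GetFiles(@"Path\To\Files");

            Console.WriteLine("Starting to read from files! Count: {0}", filePaths.Length);

            // One task per file; WhenAll completes when all of them have.
            var tasks = filePaths.OrderBy(s => s).Select(
                filePath => DoStuffAsync(filePath));
            await Task.WhenAll(tasks.ToArray());

            Console.WriteLine("Finish! Read {0} file(s).", filesRead);
        }

        private async Task DoStuffAsync(string filePath)
        {
            string fileName = Path.GetFileName(filePath);
            using (var reader = new StreamReader(filePath))
            {
                // Only the first line is read here, as a demonstration.
                string firstLineOfFile =
                    await reader.ReadLineAsync().ConfigureAwait(false);
                Console.WriteLine("[{0}] {1}: {2}", Thread.CurrentThread.ManagedThreadId, fileName, firstLineOfFile);
                Interlocked.Increment(ref filesRead);
            }
        }
    }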

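    The one-by-one approach recommended above is not shown in the linked code, so here is a minimal sketch of it that fills a single DataTable. The names SequentialCsvLoader and LoadFolder are made up for illustration, and the sketch assumes every file has the same header row, no quoted fields containing commas, and no row with more fields than the header.

```csharp
using System;
using System.Data;
using System.IO;

public static class SequentialCsvLoader
{
    // Reads every *.csv file in a folder, one file at a time, into one DataTable.
    // Assumption: all files share the same header and contain no quoted commas.
    public static DataTable LoadFolder(string folder)
    {
        var table = new DataTable();
        string[] files = Directory.GetFiles(folder, "*.csv");
        Array.Sort(files, StringComparer.Ordinal);

        foreach (string file in files)
        {
            using (var reader = new StreamReader(file))
            {
                string header = reader.ReadLine();
                if (header == null) continue; // skip empty files

                // Create the columns from the first header encountered.
                if (table.Columns.Count == 0)
                {
                    foreach (string name in header.Split(','))
                        table.Columns.Add(name.Trim(), typeof(string));
                }

                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    if (line.Length == 0) continue; // skip blank lines
                    table.Rows.Add(line.Split(','));
                }
            }
        }
        return table;
    }
}
```

    Because only one thread ever touches the DataTable here, no locking is needed at all.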

    Hope that helps.

    • Marked as answer by Mou_kolkata Tuesday, April 26, 2016 6:05 PM
    Tuesday, April 26, 2016 9:56 AM
  • Hi Mou_kolkata,

    >>suppose when i use task to read multiple csv file and if many thread try to write csv data into data table then dead lock may arise.......guide me how to handle it best way.

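    Yes, in your scenario there is a real risk when many threads write to the same DataTable. DataTable is not safe for concurrent writes, so unsynchronized access is more likely to corrupt the table than to deadlock, and careless locking can then introduce a deadlock. Even so, I still suggest you use tasks to read multiple CSV files, since it can be more efficient. We just have to synchronize the writes.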

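    // _lock is one object shared by all threads, for example:
    // private static readonly object _lock = new object();
    lock (_lock)
    {
        DoWork(); // only one thread at a time runs this
    }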

    For example, the code above can be used to ensure that only one thread at a time executes DoWork().
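
    Applied to the CSV case, one possible arrangement is sketched below: each task parses its file into a private list, and only the final step of adding rows to the shared DataTable runs inside the lock. The names LockedCsvLoader, LoadFileAsync and LoadAllAsync are hypothetical, and the sketch assumes the column names are known up front, every file starts with a header row, and fields contain no quoted commas.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class LockedCsvLoader
{
    // The single lock that guards every write to the shared DataTable.
    private static readonly object TableLock = new object();

    // Parses one file without touching shared state, then adds its rows under the lock.
    private static async Task LoadFileAsync(string file, DataTable table)
    {
        var rows = new List<string[]>();
        using (var reader = new StreamReader(file))
        {
            await reader.ReadLineAsync().ConfigureAwait(false); // skip the header row
            string line;
            while ((line = await reader.ReadLineAsync().ConfigureAwait(false)) != null)
            {
                if (line.Length > 0) rows.Add(line.Split(','));
            }
        }

        // No await inside the lock, and no second lock is taken while holding it.
        lock (TableLock)
        {
            foreach (string[] row in rows) table.Rows.Add(row);
        }
    }

    // Completes when every file has been loaded; the caller can then run its "job done" step.
    public static async Task<DataTable> LoadAllAsync(IEnumerable<string> files, string[] columnNames)
    {
        var table = new DataTable();
        foreach (string name in columnNames) table.Columns.Add(name, typeof(string));

        await Task.WhenAll(files.Select(f => LoadFileAsync(f, table))).ConfigureAwait(false);
        return table;
    }
}
```

    With a single lock that is never held across an await or while waiting for another lock, there is nothing for the threads to deadlock on. The order of rows in the table depends on which file finishes first.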

    MSDN has several good articles on this topic, one of which is:

    https://blogs.msdn.microsoft.com/mohamedg/2010/01/29/how-to-use-locks-and-prevent-deadlocks/

    Best regards,

    Kristin




    • Edited by Kristin Xie Tuesday, April 26, 2016 3:37 AM
    • Marked as answer by Mou_kolkata Tuesday, April 26, 2016 6:05 PM
    Tuesday, April 26, 2016 3:35 AM

All replies

  • Why do people write this?

    await Task.WhenAll(tasks.ToArray());

    Why is tasks.ToArray() used with WhenAll()?

    What is the meaning of this line: await reader.ReadLineAsync().ConfigureAwait(false);

    Why are await and ConfigureAwait both used?

    What does ConfigureAwait do?

    Please share the knowledge. Thanks.

    • Edited by Mou_kolkata Tuesday, April 26, 2016 6:08 PM
    Tuesday, April 26, 2016 6:07 PM
  • ReadLineAsync is an async-friendly method.

    await is part of the async/await functionality.

    If you're unfamiliar with it, then I recommend reading up on it.

    In a method marked as async you can await tasks. What happens is that when the code hits that await and the task has not finished yet, the task goes off and does its stuff, often on another thread.

    Control returns from the method you're in to its caller.

    Time passes....

    The result of the async call eventually returns and the code continues on the line after the await.

    It's rather like an inline callback.

    BUT this is a pretty complicated subject and you should read up on it. It is entirely possible not to end up with another thread involved. Async is still useful even in that case, in that you can put in a sort of sleep which doesn't block the thread:

    await Task.Delay(3000);
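
    To connect this with the two lines asked about, here is a small sketch. AwaitDemo, DoubleLaterAsync and DoubleAllAsync are hypothetical names, not anything from the linked code. As far as I know, ConfigureAwait(false) tells the await that the rest of the method does not need to resume on the captured context (for example the UI thread), and ToArray() forces the lazy Select to run so that each task is created exactly once before WhenAll waits on them. WhenAll also accepts an IEnumerable, so the ToArray() is not strictly required.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class AwaitDemo
{
    // Waits without blocking a thread, then returns its input doubled.
    public static async Task<int> DoubleLaterAsync(int value)
    {
        // Control returns to the caller here; the method resumes after the delay.
        // ConfigureAwait(false): the continuation need not run on the captured context.
        await Task.Delay(100).ConfigureAwait(false);
        return value * 2;
    }

    public static async Task<int[]> DoubleAllAsync(int[] values)
    {
        // Select is lazy; ToArray() creates and starts every task now, exactly once.
        Task<int>[] tasks = values.Select(v => DoubleLaterAsync(v)).ToArray();

        // WhenAll completes when every task has completed.
        // The results come back in the same order as the tasks.
        return await Task.WhenAll(tasks).ConfigureAwait(false);
    }
}
```

    All the delays run at the same time, so DoubleAllAsync takes roughly as long as one delay rather than the sum of them.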

    Start your reading here:

    https://msdn.microsoft.com/en-us/magazine/jj991977.aspx?f=255&MSPPError=-2147217396


    Hope that helps.

    Tuesday, April 26, 2016 6:52 PM