What is the fastest way to enumerate a large number of files in a folder?

  • Question


  • I am looking for a way to enumerate a large number of files (e.g. 500,000 files) in the smallest possible time.

    I have already tried GetFiles(), but it takes a lot of time.

    What is the fastest way to enumerate a large number of files in .NET Framework?

    I am also fine with native code if I can use it through P/Invoke.

    Any suggestions?

    FYI: I am using C#.


    Himanshu chhaya - Windows CE, Mobile and Phone 7


    Thursday, December 24, 2015 10:41 AM

Answers

All replies

  • What version of .NET are you using?

    .NET 4 introduced Directory.EnumerateFiles(), which returns an IEnumerable<string> rather than a string[].

    This is more useful because you can loop over the IEnumerable and process files as you go (you can even do this asynchronously and display them as they arrive, to keep the UI responsive).

    Of course, the total time it takes to loop over all the files will be about the same, but at least you only need a single loop (GetFiles() has to loop internally to build the whole string array before it returns anything).

    Plus, there is much less memory overhead.
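
    As a rough illustration of the streaming approach described above, here is a minimal sketch that assumes .NET Framework 4 or later. The class name StreamingDemo and the helper CountFiles are made up for this example; only Directory.EnumerateFiles comes from the framework.

```csharp
using System;
using System.IO;

static class StreamingDemo
{
    // Counts the files directly inside 'folder' without building a string[].
    // Each path is handed to the loop body as soon as the framework finds it.
    public static int CountFiles(string folder)
    {
        int count = 0;
        foreach (string path in Directory.EnumerateFiles(folder))
        {
            // Per-file work would go here; this sketch only counts.
            count++;
        }
        return count;
    }

    static void Main(string[] args)
    {
        // Use the folder given on the command line, or the current directory.
        string folder = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();
        Console.WriteLine("{0} files in {1}", CountFiles(folder), folder);
    }
}
```

    Because nothing is buffered, memory use stays flat no matter how many files the folder holds.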



    • Edited by RJP1973 Thursday, December 24, 2015 11:05 AM clarified
    • Proposed as answer by Cor Ligthert Thursday, December 24, 2015 11:27 AM
    Thursday, December 24, 2015 11:02 AM
  • I am using .NET Framework 2.0, so unfortunately I cannot use that. We need to support .NET Framework 2.0 because some of our users are not ready to upgrade and we have to accommodate all users.

    Any other ways?


    Himanshu chhaya - Windows CE, Mobile and Phone 7


    Thursday, December 24, 2015 11:38 AM
  • Hi Himanshu,

    You can enumerate all files and folders under a specific root folder, either on a local drive or across a network. The code builds with .NET Framework 2.0, but note that the entire process still stalls at each call to GetDirectories or GetFiles while it processes that folder.

    I've broken the task down into two IEnumerable implementations as follows:

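    using System;
    using System.Collections.Generic;
    using System.IO;

    class Program
    {
        static void Main(string[] args)
        {
            // Print every file under D:\ as soon as its folder has been read.
            foreach (string file in EnumerateFiles("D:\\"))
            {
                Console.WriteLine(file);
            }
        }

        // Breadth-first walk that yields the root and every readable subfolder.
        static IEnumerable<string> EnumeratePaths(string root)
        {
            if (root == null)
                throw new ArgumentNullException("root");
            if (!Directory.Exists(root))
                throw new ArgumentException("Invalid root path", "root");

            // Drop a trailing separator, but leave drive roots such as "D:\" alone.
            if (root.Length > 3)
                root = Path.GetDirectoryName(root + "\\");

            Queue<string> queue = new Queue<string>();
            queue.Enqueue(root);

            while (queue.Count > 0)
            {
                string curr = queue.Dequeue();
                bool failed = false;
                try
                {
                    foreach (string path in Directory.GetDirectories(curr))
                        queue.Enqueue(path);
                }
                catch
                {
                    // Skip folders that cannot be read (for example, access denied).
                    failed = true;
                }
                // yield return is not allowed inside a try block that has a catch clause.
                if (!failed)
                    yield return curr;
            }
        }

        // Yields every file in the root and in each readable subfolder.
        static IEnumerable<string> EnumerateFiles(string root)
        {
            foreach (string dir in EnumeratePaths(root))
            {
                string[] files;
                try
                {
                    files = Directory.GetFiles(dir);
                }
                catch
                {
                    // The folder became unreadable after it was listed; skip it.
                    continue;
                }
                foreach (string filename in files)
                    yield return filename;
            }
        }
    }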
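
    If the stall at each GetFiles call is the main concern, one option that still works on .NET Framework 2.0 is to call the Win32 FindFirstFile / FindNextFile functions through P/Invoke, which return one entry at a time. This is not part of the code above; it is only a sketch, and the names NativeEnumerator and EnumerateFileNames are made up for illustration. It does not handle paths longer than MAX_PATH.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;
using FILETIME = System.Runtime.InteropServices.ComTypes.FILETIME;

static class NativeEnumerator
{
    // Managed mirror of the Win32 WIN32_FIND_DATAW structure.
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    struct WIN32_FIND_DATA
    {
        public FileAttributes dwFileAttributes;
        public FILETIME ftCreationTime;
        public FILETIME ftLastAccessTime;
        public FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool FindClose(IntPtr hFindFile);

    static readonly IntPtr InvalidHandle = new IntPtr(-1);

    // Yields the full path of each file directly inside 'directory', one at a time.
    public static IEnumerable<string> EnumerateFileNames(string directory)
    {
        WIN32_FIND_DATA data;
        IntPtr handle = FindFirstFile(Path.Combine(directory, "*"), out data);
        if (handle == InvalidHandle)
            yield break; // Folder missing, empty, or not accessible.

        try
        {
            do
            {
                // Skip subfolders, which also filters out "." and "..".
                if ((data.dwFileAttributes & FileAttributes.Directory) == 0)
                    yield return Path.Combine(directory, data.cFileName);
            }
            while (FindNextFile(handle, out data));
        }
        finally
        {
            // Runs when the loop finishes or the caller disposes the enumerator early.
            FindClose(handle);
        }
    }
}
```

    A caller would use it like the managed version, for example foreach (string file in NativeEnumerator.EnumerateFileNames("D:\\data")), where the folder name is only a placeholder.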

    Best regards,

    Kristin



    Monday, December 28, 2015 8:40 AM
  • An alternative is to read the file system’s data directly from the “Master File Table (MFT)”. There is a series of articles that use the DeviceIoControl system function. If you need it in C#, then see these write-ups:

    • Marked as answer by Himanshu Chhaya Wednesday, December 30, 2015 10:00 AM
    Monday, December 28, 2015 5:13 PM
  • Thanks for the reply.

    I have tested the MFT solution and the results are satisfactory. It takes approximately 4 seconds to go through 500,000 records. (All files are in the same directory.)

    The only problem is that the MFT approach works only with NTFS file systems.

    How can I get similar performance with a FAT file system?


    Himanshu chhaya - Windows CE, Mobile and Phone 7

    Tuesday, December 29, 2015 11:19 AM
  • In the case of FAT, you probably have to read the corresponding disk sectors and parse them according to the specification (https://msdn.microsoft.com/en-us/windows/hardware/gg463080.aspx) and related articles.

    Examples are usually in C++. To read the sectors, it seems that you can use CreateFile with a drive name like “\\\\.\\A:” (that is, \\.\A: once the string escapes are resolved), then read the data with ReadFile.
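
    To make the CreateFile idea a little more concrete, here is a rough C# sketch that opens a volume through P/Invoke and reads its first sector with a FileStream (which calls ReadFile internally). The names RawVolume, ReadFirstSector and ParseBytesPerSector are made up for illustration, the 512-byte read size is an assumption, and opening a volume this way normally requires administrator rights. It only extracts the bytes-per-sector field at offset 11 of the FAT boot sector; walking the directory entries is not shown.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

static class RawVolume
{
    const uint GENERIC_READ = 0x80000000;
    const uint FILE_SHARE_READ = 0x1;
    const uint FILE_SHARE_WRITE = 0x2;
    const uint OPEN_EXISTING = 3;

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern SafeFileHandle CreateFile(
        string lpFileName, uint dwDesiredAccess, uint dwShareMode,
        IntPtr lpSecurityAttributes, uint dwCreationDisposition,
        uint dwFlagsAndAttributes, IntPtr hTemplateFile);

    // Reads the first 512 bytes of a volume such as @"\\.\E:" (assumed sector size).
    public static byte[] ReadFirstSector(string volume)
    {
        SafeFileHandle handle = CreateFile(volume, GENERIC_READ,
            FILE_SHARE_READ | FILE_SHARE_WRITE, IntPtr.Zero,
            OPEN_EXISTING, 0, IntPtr.Zero);
        if (handle.IsInvalid)
            throw new IOException("Cannot open " + volume, Marshal.GetHRForLastWin32Error());

        using (FileStream stream = new FileStream(handle, FileAccess.Read))
        {
            byte[] sector = new byte[512];
            int total = 0;
            while (total < sector.Length)
            {
                int read = stream.Read(sector, total, sector.Length - total);
                if (read == 0)
                    throw new EndOfStreamException("Volume is shorter than one sector");
                total += read;
            }
            return sector;
        }
    }

    // The FAT boot sector stores bytes-per-sector as a little-endian
    // 16-bit value at byte offset 11.
    public static int ParseBytesPerSector(byte[] bootSector)
    {
        return bootSector[11] | (bootSector[12] << 8);
    }

    static void Main(string[] args)
    {
        // The drive letter is a placeholder; pass the real volume as an argument.
        string volume = args.Length > 0 ? args[0] : @"\\.\E:";
        byte[] sector = ReadFirstSector(volume);
        Console.WriteLine("Bytes per sector: {0}", ParseBytesPerSector(sector));
    }
}
```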

    See also: http://www.bing.com/search?q=FAT-32+in+C%23.


    • Edited by Viorel_MVP Tuesday, December 29, 2015 1:09 PM
    Tuesday, December 29, 2015 1:07 PM
  • Dear Viorel_,

    Thanks for the reply.

    I will look into this and come back.


    Himanshu chhaya - Windows CE, Mobile and Phone 7

    Wednesday, December 30, 2015 10:00 AM