Enumerate the filesystem fast without using recursion and without filling up a List or array

  • Question

  • Hi there!

    Please bear with me: I am a PowerShell enthusiast with little C# knowledge.

    I would like to enumerate the filesystem very robustly and as fast as possible.

    My goal is to stay compatible with .NET Core, so I do not want to use the Windows API.

    The .NET methods System.IO.Directory.EnumerateDirectories / EnumerateFiles with SearchOption.AllDirectories abort on the first error (e.g. access denied),

    AND

    the GetDirectories() method builds up the complete result in memory.

    The recommended way is to use a recursive method with SearchOption.TopDirectoryOnly.

    In today's large datacenters, reading the filesystem recursively can cause a call stack error.

    So I searched the Internet for a VERY long time and came back with the Queue (or Stack) method for enumerating without recursion.

    (Shown below.)

    Nice! But...

    The Queue method has the downside that the queue itself keeps growing.

    Here I fear out-of-memory situations, because even the queue can grow too large...?

    - I think access denied and other errors MUST be reported, so I think an IEnumerable or an object stream cannot solve this.

    - Reporting enumeration progress is necessary as well.

    DO YOU KNOW A WAY TO ENUMERATE THE FILESYSTEM WITHOUT RECURSION AND WITHOUT FILLING A LIST OR QUEUE?

    Here is the code I have stitched together (it runs in a console project):

            public static IEnumerable<string> DirectoryDownTheRabbitHole(string path, bool recurse)
            {
                int max = 0;
    
                IEnumerable<string> EnumeratedDir = Enumerable.Empty<string>();
                Queue<Exception> exceptions = new Queue<Exception>();
                Queue<string> pending = new Queue<string>();
    
                pending.Enqueue(path);
    
                while (pending.Count > 0)
                {
                    try
                    {
                        EnumeratedDir = Directory.EnumerateDirectories(pending.Dequeue(), @"*", SearchOption.TopDirectoryOnly);
                    }
                    catch (Exception e)
                    {
                        exceptions.Enqueue(e);
                        continue; // skip this directory
                    }
    
                    // Drain the errors collected so far; for now they are simply discarded.
                    while (exceptions.Count > 0)
                    {
                        // TODO: switch on the throwing if PowerShell is the consumer
                        //throw exceptions.Dequeue();
                        exceptions.Dequeue();
                    }
    
                    if (EnumeratedDir != null)
                    {
                        foreach (string returnedDir in EnumeratedDir)
                        {
                            yield return returnedDir;
    
                            if (recurse)
                            {
                                pending.Enqueue(returnedDir);
                            }
                        }
                    }
    
                    if (pending.Count > max)
                    {
                        max = pending.Count;
                    }
                }
    
                Console.WriteLine(@"Queue max = " + max);
            }


    Thursday, August 29, 2019 7:20 PM

Answers

  • "In today's large datacenters, reading the filesystem recursively can cause a call stack error."

    No, this cannot be. The stack will only use one level for each level of subdirectories in your file system. You can accommodate thousands of levels without causing a stack overflow. Not even the largest datacenters will use thousands of levels of depth in a filesystem.
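
    To illustrate the point about stack depth, here is a minimal recursive sketch that assumes nothing beyond System.IO. The names RecursiveWalker, Walk, onDirectory and onError are made up for this example. The depth argument grows by one per level of nesting, not per directory, and access errors are handed to a callback instead of aborting the walk.

```csharp
using System;
using System.IO;

public static class RecursiveWalker
{
    // Hypothetical helper: visits every directory below 'path'.
    // Returns the deepest nesting level reached, which is also the largest
    // number of Walk frames that were on the call stack at the same time.
    public static int Walk(string path, Action<string> onDirectory,
                           Action<string, Exception> onError, int depth = 1)
    {
        string[] children;
        try
        {
            // Top level only: the array holds the children of one directory, not the whole tree.
            children = Directory.GetDirectories(path);
        }
        catch (Exception e) when (e is UnauthorizedAccessException || e is IOException)
        {
            onError(path, e); // report the failure and carry on with the siblings
            return depth;
        }

        int deepest = depth;
        foreach (string child in children)
        {
            onDirectory(child);
            deepest = Math.Max(deepest, Walk(child, onDirectory, onError, depth + 1));
        }
        return deepest;
    }
}
```

    Called as RecursiveWalker.Walk(root, d => Console.WriteLine(d), (p, e) => Console.Error.WriteLine(p + ": " + e.Message)), the return value tells you how deep the recursion actually went for your tree.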

    The only thing that comes to mind is that you might have encountered an infinite loop because one of the subdirectories had a subfolder which is a symbolic link pointing to a parent directory. This will cause a stack overflow if you are using recursion. The remedy is to either not follow the links or, if you follow them, examine the path to verify that it doesn't match something that you have already enumerated.

    If that is not the case (your filesystem does not have any links) and you still get stack overflow errors, this indicates some kind of mistake in your code.
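
    The "do not follow the links" remedy could look roughly like the sketch below. It assumes that links show up with FileAttributes.ReparsePoint, as symbolic links and junctions do on Windows. LinkAwareWalker, IsLink and Walk are hypothetical names, and error handling is left out to keep the focus on the link check.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LinkAwareWalker
{
    // True when the entry is a reparse point (assumption: this covers the links in question).
    public static bool IsLink(string path)
    {
        return (File.GetAttributes(path) & FileAttributes.ReparsePoint) != 0;
    }

    // Yields all subdirectories depth-first. A link is still reported,
    // but never descended into, so a link to a parent cannot cause a cycle.
    public static IEnumerable<string> Walk(string path)
    {
        foreach (string child in Directory.EnumerateDirectories(path))
        {
            yield return child;

            if (!IsLink(child))
            {
                foreach (string nested in Walk(child))
                {
                    yield return nested;
                }
            }
        }
    }
}
```

    The other remedy mentioned above, remembering what has already been enumerated, would need a set of visited paths, which brings back the memory growth the question is trying to avoid.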

    • Proposed as answer by cheong00Editor Friday, August 30, 2019 1:25 AM
    • Marked as answer by Peter Kriegel Friday, August 30, 2019 7:46 AM
    Thursday, August 29, 2019 8:42 PM
    Moderator

All replies