Correct way to loop through directories and get counts of files/folders within the root directory?

    Question

  • My attempt is the following:

    I have a small application that accesses a directory (that part already works). Within that directory, it needs to loop through all the subfolders and files and return counts per folder. This is for admins to gather weekly file-creation totals to report on.

    Example of the folder structure currently in place:

    Issues\03-02-2017\System1\....
    Issues\03-02-2017\System2\...

    What I'd like to get when the admins process the folder is a grand total for everything processed, like so:

    20170403160402_Username-Slowness.png Username-Slowness 03/02/2017 4:06:11 PM
    20170403160402_Username-Error.png Username-Error 03/02/2017 2:22:02 PM
    System1
    Errors: 1

    System2
    Slowness: 1

    Because this process would be executed by the admin at a time of their liking, the output file seen above would need to be stored in each folder that was processed. That way, when they process again, if the file exists I can read its contents and process anything that was not processed the last run, and so on. The issue I'm having is that as it processes each folder, the file that is created accumulates totals from the previous folder and adds them to the next. By the last folder, the file contains the grand totals for everything, but then you would need to manually dig through each folder to find that last file. I want the file in each folder to ONLY contain that folder's totals, AND if possible, have 1 other file that contains the grand totals ONLY.
    So the file example above would be what I need/want in each folder, and then the final file would contain only this information:

    First folder Processed    --  Last Folder Processed
    03/02/2017   -    xx/xx/xxxx

    Errors: 1
    Slowness: 1

    Does this make sense?

    Or if there is a better, more efficient way, please let me know. I'm stuck on the foreach loop that already creates the above; I'm just not getting the expected outputs. I don't have the code with me now to post what I've got in place, but I can post tonight when I get home.


    • Edited by Cubangt Tuesday, April 4, 2017 2:22 PM
    Tuesday, April 4, 2017 1:31 PM

All replies

  • Could you post your code?

    ~~Bonnie DeWitt [C# MVP]

    http://geek-goddess-bonnie.blogspot.com

    Tuesday, April 4, 2017 2:02 PM
    Moderator
    Break this problem up into smaller chunks, because you're talking about quite a bit of unrelated functionality that isn't all going to fit into a single foreach. Firstly, I'd say the whole concept of remembering the last run is a waste of time. Unless you're talking about thousands of folders and files, you're going to spend more time detecting new stuff than simply recalculating the values. However, given that you seem to be enumerating logs, you could probably skip reprocessing previous days (unless additional logs can arrive there later).

    As for counting the files in a folder, you can do that using Directory.EnumerateFiles. Use the overload that accepts a SearchOption to recurse all the subfolders in each root (date) folder. This gives you the totals for that day. Caveat: if there are any files you don't have permission to read, you'll get an UnauthorizedAccessException, so make sure you handle that exception if it is a possibility.

    //Track the grand totals
    var totalSlowness = 0;
    var totalErrors = 0;
    
    //Sample code
    foreach (var rootDirectory in Directory.EnumerateDirectories(@"C:\Temp"))
    {
       //Get files in directory
       var childFiles = Directory.EnumerateFiles(rootDirectory, "*.*", SearchOption.AllDirectories);
    
       //If you just want the total then...
       var totalFiles = childFiles.Count();
    
       //Otherwise if you need to look at the filename to "categorize" the files then enumerate them
       var countSlowness = 0;
       var countErrors = 0;
    
       foreach (var file in childFiles)
       {
           //Total files (errors)
           ++countErrors;
    
           //Only slowness
           if (file.IndexOf("-Slowness") >= 0)
              ++countSlowness;
       }
    
       //Update totals
       totalSlowness += countSlowness;
       totalErrors += countErrors;
       //Write out the totals to a file in the current directory
    }
    
    //Done with all folders so you can write out totals now

    I honestly cannot tell where you're coming up with your Error and Slowness counts. I thought you might be basing them off the filename, but I cannot see "Error" in the filenames. If you want to distinguish the counts based upon the filename, then you'll need to loop through the returned files and do some pattern matching (IndexOf, Regex, etc.) to identify each "type" of file.
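    As a sketch of that pattern matching, assuming the `timestamp_username-issuetype` filename format shown in the question (e.g. `20170403160402_Username-Slowness.png`), and assuming usernames contain no dash — the helper names here are illustrative, not from the thread:

    ```csharp
    using System;
    using System.Globalization;
    using System.IO;
    using System.Text.RegularExpressions;

    static class FileNameParser
    {
        // Matches names like "20170403160402_Username-Slowness.png":
        // a 14-digit timestamp, an underscore, the user, a dash, then the issue type.
        static readonly Regex Pattern =
            new Regex(@"^(?<stamp>\d{14})_(?<user>[^-]+)-(?<type>.+)$");

        // Returns the issue type ("Slowness", "Error", ...) or null
        // when the name does not match the expected format.
        public static string GetIssueType(string fileName)
        {
            var match = Pattern.Match(Path.GetFileNameWithoutExtension(fileName));
            return match.Success ? match.Groups["type"].Value : null;
        }

        // Returns the submission timestamp parsed from the leading 14 digits.
        public static DateTime? GetTimestamp(string fileName)
        {
            var match = Pattern.Match(Path.GetFileNameWithoutExtension(fileName));
            if (!match.Success)
                return null;
            return DateTime.ParseExact(match.Groups["stamp"].Value,
                "yyyyMMddHHmmss", CultureInfo.InvariantCulture);
        }
    }
    ```

    For example, `FileNameParser.GetIssueType("20170403160402_Username-Slowness.png")` returns `"Slowness"`, which you can then feed into whichever counter or dictionary you settle on.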

    Michael Taylor
    http://www.michaeltaylorp3.net

    Tuesday, April 4, 2017 2:14 PM
    Moderator
  • I'll post what I already have tonight. The counts by type are coming from the file names, as the user has the ability to submit based on the issue: Slowness, Error, Kickout.

    The file name is built from the datetime stamp down to the seconds, username and issue type:

    20170403160402_Username-Slowness.png

    There are chances that, as an admin, I click to process right now at 9:30 AM and 10 minutes later someone submits an issue. Tomorrow when I process again, I would want to pick up that new submission so that I have updated daily totals.

    I am open to any and all suggestions. The end goal is to gather daily totals reported by system and issue type. We shouldn't be talking thousands of files, but definitely hundreds per week.

    Tuesday, April 4, 2017 2:27 PM
  • I believe the code I posted will work correctly. For other counts (Error, Kickout, etc.) you'll need to track the additional counts. If you start having a lot of counts, then you might consider creating a simple class to manage the list of counts.

    public class LogCounts
    {
       public int Errors { get; set; }
       public int Slowness { get; set; }
    
       public int Other { get; set; }
    
       public void Add ( LogCounts counts )
       {
          Errors += counts.Errors;
          Slowness += counts.Slowness;
          Other += counts.Other;
       }
    }

    //Helper method
    LogCounts GetLogCounts ( string directory )
    {
       var childFiles = Directory.EnumerateFiles(directory, "*.*", SearchOption.AllDirectories);
    
       var counts = new LogCounts();
    
       foreach (var file in childFiles)
       {
          //Look for the file types that you care about
          if (file.IndexOf("-Slowness") >= 0)
             ++counts.Slowness;
          else if (file.IndexOf("-Error") >= 0)
             ++counts.Errors;
          ...
          else
             ++counts.Other;
       }
    
       //Write out the totals for this directory
       ...
    
       return counts;
    }
    
    //Main method
    var totals = new LogCounts();
    
    foreach (var directory in Directory.EnumerateDirectories(@"C:\Temp"))
    {
       var counts = GetLogCounts(directory);
    
       //Add to total
       totals.Add(counts);
    }
    
    //Write out the totals for the entire structure
    ...

    Alternatively, if the filenames are consistent then you can switch to a dictionary that tracks the file category and counts. This makes it easier if they add new "categories" later as you don't need to update the code.

    static void Main ( string[] args )
    {
        //Main method
        var totals = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
    
        foreach (var directory in Directory.EnumerateDirectories(@"C:\Temp"))
        {
            var counts = GetLogCounts(directory);
    
            //Merge the counts
            MergeCounts(totals, counts);
        }
    
        //Write out the totals for the entire structure            
    }
    
    //Helper methods
    static void MergeCounts ( IDictionary<string, int> target, IDictionary<string, int> source )
    {
        //Could probably do this via LINQ as well
        foreach (var pair in source)
        {
            if (target.ContainsKey(pair.Key))
                target[pair.Key] += source[pair.Key];
            else
                target[pair.Key] = source[pair.Key];
        }
    }
    
    static string GetFileCategory ( string filename )
    {
        //Put your logic here that identifies the category given the filename
        return "Error";
    }
    
    static Dictionary<string, int> GetLogCounts ( string directory )
    {
        var childFiles = Directory.EnumerateFiles(directory, "*.*", SearchOption.AllDirectories);
    
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
    
        foreach (var file in childFiles)
        {
            //Look for the file types that you care about
            var category = GetFileCategory(file);
            if (counts.ContainsKey(category))
                ++counts[category];
            else
                counts[category] = 1;
        }
    
        //Write out the totals for this directory
        return counts;
    }

    I would not bother trying to figure out which files were added after you've already processed them. Unless performance is really slow (which I doubt for a utility like this), just recalculate the information and regenerate the file each time it is run. You could, of course, optimize it to not look at directories older than a certain date (say a week), so that it doesn't get slower the more directories you accumulate, if you're not also archiving these directories somewhere.
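    A sketch of that cutoff check, assuming the date folders are named MM-dd-yyyy as in the question's `Issues\03-02-2017\...` layout (the helper name is illustrative):

    ```csharp
    using System;
    using System.Globalization;
    using System.IO;

    static class FolderFilter
    {
        // Returns true when the folder name parses as an MM-dd-yyyy date
        // on or after the cutoff; names that don't parse are excluded.
        public static bool IsOnOrAfter(string folderName, DateTime cutoff)
        {
            return DateTime.TryParseExact(folderName, "MM-dd-yyyy",
                       CultureInfo.InvariantCulture, DateTimeStyles.None,
                       out var folderDate)
                   && folderDate >= cutoff;
        }
    }

    // Usage sketch: only count folders from the last week.
    // var cutoff = DateTime.Today.AddDays(-7);
    // foreach (var dir in Directory.EnumerateDirectories(@"C:\Temp"))
    //     if (FolderFilter.IsOnOrAfter(Path.GetFileName(dir), cutoff))
    //     { /* count files in this folder */ }
    ```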

    Tuesday, April 4, 2017 2:52 PM
    Moderator
  • So I should be able to process and reprocess based on the current week, right?

    If I could end up with totals for week periods, that would be a plus; even better, a breakdown by system:

    4/2/2017 - 4/8/2017 = 80 Issues

    System1
    Slowness = 24
    Kickout = 8
    Error = 32
    ================
    System2
    Slowness = 5
    Kickout = 1
    Error = 10

    It sucks not having my code available at work. I'll post and follow up tonight when I get working on the project again.



    • Edited by Cubangt Tuesday, April 4, 2017 4:34 PM
    Tuesday, April 4, 2017 4:31 PM
  • "so i should be able to process and reprocess based on current week right?"

    How you break up the root date folders is completely up to you. To avoid reprocessing, you should consider moving date directories older than, say, a week to an Archive (or similar) folder. That way "archiving" becomes separate from generating the counts: the archive process moves everything over a week old somewhere else, and your counting process simply counts the current directories. It makes things easy to change later as well.
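    A minimal sketch of such an archive pass, again assuming MM-dd-yyyy folder names; `archiveRoot` and the method name are hypothetical:

    ```csharp
    using System;
    using System.Globalization;
    using System.IO;

    static class Archiver
    {
        // Moves every date folder older than the cutoff under archiveRoot.
        // Folder names that don't parse as MM-dd-yyyy are left alone.
        public static void ArchiveOldFolders(string root, string archiveRoot, DateTime cutoff)
        {
            Directory.CreateDirectory(archiveRoot);
            foreach (var dir in Directory.EnumerateDirectories(root))
            {
                var name = Path.GetFileName(dir);
                if (DateTime.TryParseExact(name, "MM-dd-yyyy",
                        CultureInfo.InvariantCulture, DateTimeStyles.None,
                        out var folderDate)
                    && folderDate < cutoff)
                {
                    Directory.Move(dir, Path.Combine(archiveRoot, name));
                }
            }
        }
    }
    ```

    Run this first, then point the counting loop at the root folder; the counting code never needs to know about the archive.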

    Tuesday, April 4, 2017 5:07 PM
    Moderator
  • Cool, thank you so much for the suggestions and code samples.

    I will post back later.

    Tuesday, April 4, 2017 5:42 PM
  • Something I was thinking about over lunch (and not to throw the above out the door): potentially the next release / later version of this could store the raw data in a local DB, so that any counting is done upon request from the application. In other words, for every folder and file found, log an entry with the breakdown:

    date   system  issue

    Then I could query the table for everything in the week needed and just return the counts by system and issue.

    Thoughts??

    Tuesday, April 4, 2017 6:19 PM
  • I would certainly prefer the DB route if at all possible. You can use SSIS or similar to import the files and then run aggregate queries for whatever kind of data you care about. It would also be faster than file scanning. Of course, you're still going to have to ETL that data, but that could be a nightly job if necessary.
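    As a sketch of that aggregate query, assuming a hypothetical table with one row per submission (the table and column names here are illustrative, not from the thread):

    ```sql
    -- Hypothetical schema: one row per submitted screenshot.
    -- CREATE TABLE Issues (SubmittedOn DATETIME, SystemName NVARCHAR(50), IssueType NVARCHAR(50));

    -- Weekly totals by system and issue type.
    SELECT SystemName, IssueType, COUNT(*) AS Total
    FROM Issues
    WHERE SubmittedOn >= @WeekStart AND SubmittedOn < @WeekEnd
    GROUP BY SystemName, IssueType
    ORDER BY SystemName, IssueType;
    ```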
    Tuesday, April 4, 2017 7:00 PM
    Moderator
  • I'll make a backup of the existing project and look at integrating the DB into it and go that route. In the meantime, I'll adjust the logic per the suggestions above for the current build, as this needs to be at least some form of gathering the data without manually looking at and counting each folder and file.

    Right now I am beta testing this application with about 4 people. There is the potential of deploying this out to a little over 200 others within the company, so before opening the flood gates to that many submissions, I want something that will at least provide decent information, and then improve from there based on feedback.

    Tomorrow I'm doing a demo for a few key employees. If they like the functionality and abilities the application has, then the admin portion will definitely be priority #1 to get useful data, easily.

    Tuesday, April 4, 2017 7:22 PM
  • This is the logic I currently have BEFORE any changes suggested above.

                StringBuilder sbIssues = new StringBuilder();
                string pfile = string.Empty;
                string currentDir = string.Empty;
    
                //DATE FORMAT 20170119172113 = 14(LENGTH)
                if (Directory.Exists(dirPath))
                {
                    DirectoryInfo di = new DirectoryInfo(dirPath);
                    int numPro = 0;
                    int ecnt = 0;
                    int kcnt = 0;
                    int scnt = 0;
                    
                    foreach (var dir in di.EnumerateDirectories())
                    {
                        //pfile = dirPath + dir + @"\processed.txt";
                        currentDir = dirPath + dir;
                        DirectoryInfo diSub = new DirectoryInfo(currentDir);
                        DirectoryInfo[] sdi = diSub.GetDirectories();
    
                        foreach (var d in sdi)
                        {
                            pfile = dirPath + dir + @"\" + d + @"\processed.txt";
    
                            if (File.Exists(pfile))
                            {
                                continue;
                            }
                            else
                            {
                                DirectoryInfo subFiles = new DirectoryInfo(currentDir + @"\" + d);
                                foreach (var fil in subFiles.EnumerateFiles())
                                {
                                    int i = fil.Name.IndexOf("_");
                                    string fn = fil.Name.Substring(i - 14, 14);
                                    string formatstring = "yyyyMMddHHmmss";
                                    DateTime dd = DateTime.ParseExact(fn, formatstring, null);
                                    int iind = fil.Name.IndexOf(".");
                                    string iType = fil.Name.Substring(i + 1, iind - (i + 1));
                                    sbIssues.Append(fil.Name + " " + iType + " " + dd.ToString());
                                    sbIssues.AppendLine();
                                    if (iType.Contains("Error"))
                                    {
                                        ecnt++;
                                    }
                                    if (iType.Contains("Slowness"))
                                    {
                                        scnt++;
                                    }
                                    if (iType.Contains("Kickout"))
                                    {
                                        kcnt++;
                                    }
                                }
                                sbIssues.Append(d);
                                sbIssues.AppendLine();
                                sbIssues.Append("Errors: " + ecnt);
                                sbIssues.AppendLine();
                                sbIssues.Append("Slowness: " + scnt);
                                sbIssues.AppendLine();
                                sbIssues.Append("Kickouts: " + kcnt);
                                sbIssues.AppendLine();
                                using (StreamWriter sw = File.CreateText(pfile))
                                {
                                    sw.Write(sbIssues.ToString());
                                }
                                numPro++;
                            }
                        }
                    }
                    if (numPro == 0)
                    {
                        MessageBox.Show("Unprocessed folders do not exist!", "Test File Values", MessageBoxButtons.OK);
                    }
                    else
                    {
                        ReadProcessedFile(currentDir, pfile);
                        MessageBox.Show(sbIssues.ToString(), "Issues Processed", MessageBoxButtons.OK);
                    }
                }
                else
                {
                    MessageBox.Show("Sorry, directory: " + str + fldDate + " does not exist!", "Get ScreenShot Error", MessageBoxButtons.OK);
                }

    I'm just getting to this tonight, so it may be tomorrow before I can actually try the suggestions above. I've actually been trying to set up the database but am having issues with that as well: I created the .sdf file, created the DB within it and 1 table to store the data, manually inserted 1 record, and then wrote a connection to try to programmatically insert a record, but it keeps saying that the table doesn't exist. So it clearly connects to the DB, but can't see the table.


    • Edited by Cubangt Wednesday, April 5, 2017 3:23 AM
    Wednesday, April 5, 2017 3:20 AM