Sorting a List<string> based on unusual circumstances

  • Question

  • I have a program that reads in a bunch of PDF files from different folders and merges them into one multi-page PDF file per folder, with the output file named after the folder. However, I need these PDF files to be sorted in a certain way, and right now they are merged in whatever order they appear in the folder. I don't need them alphabetically; instead I need them in this order:

    1. Cover Page
    2. S-Pages
    3. E-Pages
    4. L-Pages
    5. C-Pages
    6. D-Pages
    7. (Any others)

    Here is a sample of what my pdf strings in the List<string> look like. The pages are at the end, such as "T-0012_remodel 2002 c3_C03.pdf" would be a C page.

    

    Thursday, August 16, 2018 2:01 PM

Answers

  • You can pass a Comparer object into the List.Sort method.

    This Comparer can be a class that implements IComparer and provides custom sort logic.

    For example:

    using System;
    using System.Collections.Generic;
    using System.IO;

    class PdfFileComparer : IComparer<string>
    {
        public int Compare(string x, string y)
        {
            string file1 = Path.GetFileNameWithoutExtension(x);
            string file2 = Path.GetFileNameWithoutExtension(y);

            // Get last part of filenames (after the last '_')
            string lastBit1 = file1.Substring(file1.LastIndexOf('_') + 1);
            string lastBit2 = file2.Substring(file2.LastIndexOf('_') + 1);

            // Ensure COVER file always first
            bool isCover1 = string.Equals(lastBit1, "COVER", StringComparison.InvariantCultureIgnoreCase);
            bool isCover2 = string.Equals(lastBit2, "COVER", StringComparison.InvariantCultureIgnoreCase);
            if (isCover1 && isCover2)
            {
                // Two COVER files (or one file compared with itself) are equal;
                // returning -1 here would make the comparer inconsistent
                return 0;
            }
            if (isCover1)
            {
                return -1;
            }
            if (isCover2)
            {
                return 1;
            }

            // Else just sort by the last part alphabetically (ignoring case)
            return string.Compare(lastBit1, lastBit2, StringComparison.CurrentCultureIgnoreCase);
        }
    }


    Then, with your List of files, just call:

    // some dummy data
    List<string> files = new List<string>()
    {
        @"c:\folderpath\T-0013_hvac 2002_2003_H01.pdf",
        @"c:\folderpath\T-0013_remodel 2002 c3_c01.pdf",
        @"c:\folderpath\T-0013_remodel 2002 c3_COVER.pdf",
        @"c:\folderpath\T-0013_retrofit 1997_H1.pdf",
    };
    
    // Sort using my comparer class
    files.Sort(new PdfFileComparer());


    My example just puts COVER first and then sorts alphabetically by the last part, but hopefully you can see the logic there. Just change the Compare method to be as fancy as you like. It just has to return a negative number (e.g. -1) if the first string x comes before y, a positive number (e.g. 1) if y comes before x, or 0 if they are equal.

    • Edited by RJP1973 Thursday, August 16, 2018 2:46 PM
    • Marked as answer by JxkeZ Thursday, August 16, 2018 8:00 PM
    Thursday, August 16, 2018 2:43 PM

All replies

  • Make a temporary list where you Extract the 7 last characters in each string and add them at the beginning of that string.

    Then you can simply print that list out alphabetically...

    Thursday, August 16, 2018 2:42 PM
  • Make a temporary list where you Extract the 7 last characters in each string and add them at the beginning of that string.

    Then you can simply print that list out alphabetically...

    The order I need isn't in alphabetical order though. It's 

    1. Cover Page
    2. S-Pages
    3. E-Pages
    4. L-Pages
    5. C-Pages
    6. D-Pages
    7. (Any others)

    Thursday, August 16, 2018 2:44 PM
  • Ok, alphabetically descending
    Thursday, August 16, 2018 2:49 PM
  • Ok, alphabetically descending
    C and D are in order at the bottom, S is second, a specific COVER.pdf is always first, E is third; it's completely out of whack. Alphabetical won't work.
    Thursday, August 16, 2018 2:50 PM
  • Some people need to learn the alphabet!

    Please see my code example above.

    I did sort pseudo-alphabetically, but I think you can see from the way I ensured the COVER file comes first that you can change the logic in the Compare function to be as arbitrary as you like.

    (Hint: For ease of use/maintainability you could create a list or dictionary of letters and priorities inside the comparer class. Then for each file extract out the last parts of the file name as I did in my code, but then use this dictionary to look up the priorities for the letters in each file part and return -1, 0 or 1 based on these priorities).
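
    A minimal sketch of that hint, assuming the page token is whatever follows the last '_' in the file name and that its first letter identifies the page type. The names PriorityPdfComparer, GetToken and GetPriority are made up for illustration; the priorities are taken from the order given in the question, and anything unrecognised (including files with no '_' at all) is assumed to go last.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical comparer: orders files by a priority looked up from the page token.
class PriorityPdfComparer : IComparer<string>
{
    // Lower number sorts first; order taken from the question (Cover, S, E, L, C, D).
    static readonly Dictionary<string, int> s_priorities =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
        {
            { "COVER", 0 }, { "S", 1 }, { "E", 2 }, { "L", 3 }, { "C", 4 }, { "D", 5 },
        };

    // Anything not in the dictionary goes last ("any others").
    const int OtherPriority = int.MaxValue;

    // Assumption: the token is the text after the last '_' in the file name.
    static string GetToken(string path)
    {
        string name = Path.GetFileNameWithoutExtension(path);
        int index = name.LastIndexOf('_');
        return index >= 0 ? name.Substring(index + 1) : string.Empty;
    }

    static int GetPriority(string token)
    {
        if (token.Length == 0)
            return OtherPriority;

        int priority;
        // Try the whole token first (COVER), then just its first letter (S01 -> S).
        if (s_priorities.TryGetValue(token, out priority) ||
            s_priorities.TryGetValue(token.Substring(0, 1), out priority))
            return priority;

        return OtherPriority;
    }

    public int Compare(string x, string y)
    {
        string tokenX = GetToken(x);
        string tokenY = GetToken(y);

        // Compare by priority first...
        int result = GetPriority(tokenX).CompareTo(GetPriority(tokenY));
        if (result != 0)
            return result;

        // ...then by the token itself, so S01 comes before S02...
        result = string.Compare(tokenX, tokenY, StringComparison.OrdinalIgnoreCase);
        if (result != 0)
            return result;

        // ...and finally by the full path, so equal tokens still sort predictably.
        return string.Compare(x, y, StringComparison.OrdinalIgnoreCase);
    }
}
```

    It is used the same way as the comparer above: files.Sort(new PriorityPdfComparer()). With the four dummy files from that example the result is COVER, c01, H01, H1, because H is not in the dictionary and falls into "any others". Adding or reordering page types only means editing the dictionary.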
    • Edited by RJP1973 Thursday, August 16, 2018 2:56 PM
    Thursday, August 16, 2018 2:53 PM
  • Absolutely, nice coding with the substring there.

    Can also be done using Linq with "Where" and "Select" I guess.
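
    For what it's worth, the LINQ operators that do the actual ordering are OrderBy and ThenBy rather than Where and Select. A rough sketch, assuming the same "token after the last underscore" naming rule as the comparer above; PageOrder, Token, Rank and SortPdfFiles are illustrative names, not anything from this thread.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class PageOrder
{
    // Required order from the question; anything else sorts after these.
    static readonly string[] s_order = { "COVER", "S", "E", "L", "C", "D" };

    // Assumption: the page token is the text after the last '_' in the file name.
    // Returns an empty string when there is no '_' at all.
    static string Token(string path)
    {
        string name = Path.GetFileNameWithoutExtension(path);
        int index = name.LastIndexOf('_');
        return index >= 0 ? name.Substring(index + 1).ToUpperInvariant() : string.Empty;
    }

    // Position of the token's category in s_order, or s_order.Length for "others".
    static int Rank(string token)
    {
        if (token.Length == 0)
            return s_order.Length;

        string category = token == "COVER" ? token : token.Substring(0, 1);
        int index = Array.IndexOf(s_order, category);
        return index >= 0 ? index : s_order.Length;
    }

    public static List<string> SortPdfFiles(IEnumerable<string> files)
    {
        return files
            .OrderBy(f => Rank(Token(f)))                   // page type first
            .ThenBy(f => Token(f), StringComparer.Ordinal)  // then S01, S02, ... within a type
            .ToList();
    }
}
```

    Calling PageOrder.SortPdfFiles(files) returns a new sorted list and leaves the original list untouched, which is the main difference from List.Sort with a comparer.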

    • Edited by ThisNewbie Thursday, August 16, 2018 3:25 PM
    Thursday, August 16, 2018 3:06 PM
  • You have a string parsing issue here, not really an ordering issue. Given just the folder you provided it appears that the "sorting" token is the last token in the filename (e.g. S04, L02, S1, H3B). You are going to have to be able to come up with some rules around file naming otherwise you aren't going to be able to sort. If you cannot define these rules and ensure they are followed for all folders then you aren't going to be able to sort the documents reliably.

    Assuming the last token is the one you want then strip it off the end of the filename. It is unclear in your question but I assume that S03 should precede S04 and S1 follows both of those. There are quite a few different ways to do this. For performance reasons I'd probably lean toward grouping by the token first and then ordering from that. Here's one approach.

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;

    class Program
    {
        static void Main ( string[] args )
        {
            var files = new[]
            {
                "abc_H01.pdf",
                "abc_H02.pdf",
                "def c3_c01.pdf",
                "def c3_c02.pdf",
                "def c3_cover.pdf",
                "def c3_s01.pdf",
                "def c3_s02.pdf",
                "ghi 1997_h1.pdf",
                "ghi 1997_s1.pdf",
                "Notoken.pdf"
            };
    
            foreach (var file in files.OrderByFileNameToken())
                Console.WriteLine(file);
        }
    }
    
    static class FileExtensions
    { 
        public static IEnumerable<string> OrderByFileNameToken ( this IEnumerable<string> source )
        {
            //Order by the calculated group first and then by the filename
            return from v in source
                    orderby GetFileNameGroup(v), v
                    select v;
        }
    
        //Helper method 
        static int GetFileNameGroup ( string value )
        {
            //Group the names by the last token, if any
            //Could use an RE here but I think simple string parsing is cleaner for this
            value = Path.GetFileNameWithoutExtension(value);
            var index = value.LastIndexOf('_');
            if (index >= 0 && index < value.Length - 1)
            {
                var token = value.Substring(index + 1);
    
                //Figure out the category: "Cover" or the first letter of the token
                var category = (String.Compare(token, "Cover", true) == 0) ? "Cover" : token[0].ToString();
    
                //Determine the group 
                var group = Array.FindIndex(s_grouping, m => String.Compare(m, category, true) == 0);
                if (group >= 0)
                    return group;
            }
    
            return 1000;
        }
    
        static readonly string[] s_grouping = new[] { "Cover", "S", "E", "L", "C", "D", };
    }


    Michael Taylor http://www.michaeltaylorp3.net

    Thursday, August 16, 2018 3:29 PM
    Moderator
  • Definitely on the right path. How would you go about creating a dictionary with letter priorities? I think that may be the only way to go, but I'm clueless as to how to do that. I wish these files could just be alphabetical, but because these are all specific drawings, they have to be in that order.

    Sorry for all the noob-iness, I haven't even been programming C# for 3 full months.

    Thursday, August 16, 2018 4:36 PM