locked
Sorting numbers in a csv file. RRS feed

  • Question

  • Hi,

    I am trying to sort a small csv file that looks like this:

    Head1,Head2

    A1,23

    A2,56

    A78,3

    B3,78

    D1,1

    ........Using the code below (found online), the sorting places number 3 before number 23.

    I understand this is because we are dealing with strings.

    So, how can I modify the code below to correct this.

    Thank you for your assistance.

    regards

    Francesco C

     static void Main(string[] args)
                {
                // Create the IEnumerable data source  
                string[] lines = System.IO.File.ReadAllLines("F:\\sheet1.csv");
    
                // Create the query. Put field 2 first, then  
                // reverse and combine fields 0 and 1 from the old field  
                IEnumerable<string> query =from line in lines
                    let x = line.Split(',')
                    orderby x[1]
                   select x[0]+","+(x[1]);
    
                // Execute the query and write out the new file. Note that WriteAllLines  
                // takes a string[], so ToArray is called on the query.  
                System.IO.File.WriteAllLines("F:\\sheet2.csv",query.ToArray());
    
                Console.WriteLine("sheet2.csv written to disk. Press any key to exit");
                Console.ReadKey();
                }

    Saturday, January 11, 2020 3:47 PM

Answers

  • instead of

    orderby x[1]

    use

    orderby int.Parse(x[1])

    This converts the string into integer and sorts by the value of the integer.

    It will only work if all the lines contains values that are integers, you will get an error if a line contains something else in position [1]. In particular, be careful with the header line ("Head2") -- you will want to skip this line and leave it at the top of the file, not sort it.

    from line in lines.Skip(1) ...



    Saturday, January 11, 2020 4:34 PM

All replies

  • instead of

    orderby x[1]

    use

    orderby int.Parse(x[1])

    This converts the string into integer and sorts by the value of the integer.

    It will only work if all the lines contains values that are integers, you will get an error if a line contains something else in position [1]. In particular, be careful with the header line ("Head2") -- you will want to skip this line and leave it at the top of the file, not sort it.

    from line in lines.Skip(1) ...



    Saturday, January 11, 2020 4:34 PM

  • I am trying to sort a small csv file that looks like this:

    Head1,Head2

    A1,23

    A2,56

    A78,3

    B3,78

    D1,1

    ........Using the code below (found online), the sorting places number 3 before number 23.

    I understand this is because we are dealing with strings.

    So, how can I modify the code below to correct this.


    When asking about sorting you should specify *exactly* how you want the file 
    sorted.

    What column(s) or fields contain the key on which the file is to be sorted?
    I assume you are trying to sort on the second "field".

    Is there a major and a minor key, or just a major key? i.e. - Does the order
    of lines matter in the case of equal keys?

    Is this a sort which includes alpha characters or just numeric?

    Is it to be in ascending order or descending?

    Do you want it sorted in string order or true numeric order?

    >the sorting places number 3 before number 23.

    Why is that an issue? That is the normal sequence of ordered numerics.
    The value 3 is less than the value 23 so of course it will usually
    occur in that order when sorting numbers.

    By contrast, a string sort will usually compare character by character from left 
    to right until a mismatch is found at which point the order has been determined.
    The ordinal value of those characters determines which string is greater or
    lesser than the other. So when comparing "3" and "21" the ascending order
    will be "21" followed by "3", as "2" is less than "3".

    - Wayne


    • Edited by WayneAKing Saturday, January 11, 2020 5:35 PM
    Saturday, January 11, 2020 5:15 PM
  • Hello,

    Anytime reading a file always except the unexpected e.g. blank lines, no delimiter, data not in the expected format etc. For this reason stay away from the easy solution which don't check for the unexpected conditions.

    Today's file may be perfect while tomorrows file may be imperfect. I like short lines of code but not at the expense of code blowing up unexpectedly. 

    Container for remembering lines in correct types along with a read-only property use to write data out to the new file along with the ability to sort on the second column.

    Note in the code below you can include or exclude the first row assuming the first line contains header information.

    public class LineItem
    {
        public string Column1 { get; set; }
        public int Column2 { get; set; }
        public string Line => $"{Column1},{Column2}";
    }

    Class to read/write 

    public class FileOperations
    {
        /// <summary>
        /// Write contents to another file order by second column
        /// </summary>
        /// <param name="fileName">File to read</param>
        /// <param name="newFileName">File to write to</param>
        /// <param name="includeHeader">true to include header, false to exclude header</param>
        public static void ReadWrite(string fileName, string newFileName, bool includeHeader = true)
        {
            var lineItemList = new List<LineItem>();
            int counter = 1;
            string header = "";
    
            using (var stream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, FileOptions.Asynchronous | FileOptions.SequentialScan))
            {
                using (var reader = new StreamReader(stream))
                {
                    string line;
    
                    while ((line = reader.ReadLine()) != null)
                    {
                        if (string.IsNullOrWhiteSpace(line) || counter == 1)
                        {
                            counter += 1;
                            if (counter == 2)
                            {
                                header = line;
                            }
                            continue;
                        }
                      
    
                        var lineParts = line.Split(',');
                        if (lineParts.Length != 2)
                        {
                            continue;
                        }
    
                        int.TryParse(lineParts[1], out var value);
    
                        lineItemList.Add(new LineItem()
                        {
                            Column1 = lineParts[0],
                            Column2 = value
                        });
                    }
                }
            }
    
            File.WriteAllLines(newFileName, lineItemList.OrderBy(item => item.Column2).Select(item => item.Line).ToArray());
    
            if (includeHeader)
            {
                var currentLines = File.ReadAllLines(newFileName).ToList();
                currentLines.Insert(0, header);
                File.WriteAllLines(newFileName, currentLines.ToArray());
            }
        }
    }
    

    Usage

    var readFile = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Sheet1.csv");
    var newFile = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Sheet2.csv");
    if (File.Exists(readFile))
    {
        FileOperations.ReadWrite(readFile, newFile);
    }           

     


    Please remember to mark the replies as answers if they help and unmarked them if they provide no help, this will help others who are looking for solutions to the same or similar problem. Contact via my Twitter (Karen Payne) or Facebook (Karen Payne) via my MSDN profile but will not answer coding question on either.

    NuGet BaseConnectionLibrary for database connections.

    StackOverFlow
    profile for Karen Payne on Stack Exchange

    Saturday, January 11, 2020 5:34 PM
  • Thank you Alaberto, it works fine.

    Regards

    Francesco C

    Saturday, January 11, 2020 6:38 PM