String.CompareTo sort order of apostrophe

    Question

  • Hi all,

    I'm having problems with how the apostrophe sorts, as shown by the results of the CompareTo method.

    The code below produces the following values (Visual Studio 2010 on Windows 7):

    one = 1
    two = 0
    three = 1
    four = 1
    five = 1
    six = -1
    seven = -1

    private void test()
    {
        string word = "we're";

        // CompareTo performs a culture-sensitive comparison using the current culture.
        int one   = word.CompareTo("we");     // greater than
        int two   = word.CompareTo("we're");  // equal
        int three = word.CompareTo("weak");   // greater than
        int four  = word.CompareTo("wepen");  // greater than
        int five  = word.CompareTo("weqen");  // greater than
        int six   = word.CompareTo("weren");  // less than
        int seven = word.CompareTo("wesen");  // less than
    }

    I've confirmed that the apostrophe is the Unicode character \u2019. Why does it compare as less than 'r' and 's'?

    Thanks, billwa992

    Tuesday, August 30, 2011 1:43 AM

Answers

  • Billwa,

    How the comparisons work depends on the CultureInfo. In your case, the CurrentCulture apparently defines string comparison so that the apostrophe is ignored. That's why comparing "we're" to "weren" is really like comparing "were" to "weren", and "were" is less than "weren".


    Tom Overton
    • Marked as answer by billwa992 Tuesday, August 30, 2011 4:44 PM
    Tuesday, August 30, 2011 2:23 AM

All replies

  • If it's just ignoring apostrophes, how come "we're".CompareTo("we're") is 0 and "we're".CompareTo("were") is 1? (By the way, it's really fast to do this kind of testing in PowerShell.) As for why it behaves this way, maybe Michael Kaplan (http://blogs.msdn.com/b/michkap/) has a post on it that I haven't been able to find.
    Tuesday, August 30, 2011 2:50 AM
  • Yeah jader, I spoke too soon. I was using the String.Compare method and didn't notice that I had passed it CompareOptions.IgnoreSymbols, which is not the default.
    Tom Overton
    Tuesday, August 30, 2011 3:04 AM
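    To make the difference Tom describes concrete, here is a minimal sketch that runs the same pair of strings through String.Compare once with CompareOptions.None and once with CompareOptions.IgnoreSymbols. The class name, the helper name and the hard-coded "en-US" culture are assumptions for illustration only. Results of culture-sensitive comparisons can differ between .NET versions and platforms, so the sketch prints the signs instead of promising them.

```csharp
using System;
using System.Globalization;

public static class IgnoreSymbolsDemo
{
    // Hypothetical helper: sign (-1, 0, 1) of a culture-sensitive comparison
    // performed with the given CompareOptions.
    public static int CompareSign(string a, string b, CompareOptions options)
    {
        // Assumed culture, matching the one reported later in the thread.
        CultureInfo culture = CultureInfo.GetCultureInfo("en-US");
        return Math.Sign(string.Compare(a, b, culture, options));
    }

    public static void Main()
    {
        // Default options: the apostrophe still takes part in the comparison.
        Console.WriteLine(CompareSign("we're", "were", CompareOptions.None));

        // IgnoreSymbols: punctuation such as the apostrophe is skipped.
        Console.WriteLine(CompareSign("we're", "were", CompareOptions.IgnoreSymbols));
    }
}
```

    Passing the culture and the options explicitly, rather than relying on CompareTo, makes it visible which rules a given result came from.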
  • My CurrentCulture is "en-US", so I don't believe that is the problem. Is there a specific setting of the CultureInfo that affects how Unicode strings are compared?

    billwa992

    Tuesday, August 30, 2011 3:53 AM
  • Billwa,

    The comparison is done on ASCII. The ASCII value of the apostrophe is 39. The ASCII representation of "we're" is

    w      e      '      r      e
    119    101    39     114    101

    Hence, the apostrophe is less than r and s.


    Please mark this post as answer if it solved your problem. Happy Programming!
    Tuesday, August 30, 2011 4:09 AM
  • Why ASCII? These are Unicode strings, and thus use the Unicode character set, with the apostrophe valued at \u2019.

    And that does not explain why "we're" is greater than "wepen" and "weqen" and less than "weren" and "wesen".


    Tuesday, August 30, 2011 4:23 AM
  • If it's just ignoring apostrophes, how come "we're".CompareTo("we're") is 0 and "we're".CompareTo("were") is 1? (By the way, it's really fast to do this kind of testing in PowerShell.) As for why it behaves this way, maybe Michael Kaplan (http://blogs.msdn.com/b/michkap/) has a post on it that I haven't been able to find.

    Not all elements of a string carry the same weight when determining sort order. If apostrophes have less weight than letters, they only matter when sorting strings that would otherwise be considered equal.
    Tuesday, August 30, 2011 10:05 AM
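    The weighting idea in the reply above can be sketched as a toy two-level comparison. This is not the algorithm .NET uses internally; the class name and the tie-breaking rule are assumptions chosen for illustration, and the sketch only handles lowercase ASCII letters and the straight apostrophe. It compares the letters first and lets the apostrophe decide only when the letters tie. For the seven strings in the original test plus "were", it returns 1, 0, 1, 1, 1, -1, -1 and 1, the same pattern as the values reported in this thread.

```csharp
using System;

public static class WeightedCompareSketch
{
    // Toy model only, NOT the real .NET collation algorithm.
    // Level 1: compare with apostrophes removed (letters carry the main weight).
    // Level 2: only if level 1 ties, let the apostrophes decide.
    public static int Compare(string a, string b)
    {
        string lettersA = a.Replace("'", "");
        string lettersB = b.Replace("'", "");

        int primary = Math.Sign(string.CompareOrdinal(lettersA, lettersB));
        if (primary != 0)
        {
            return primary;
        }

        // Letters tie: the string carrying more apostrophes sorts later
        // (assumed tie-break rule for this sketch).
        int byLength = Math.Sign(a.Length.CompareTo(b.Length));
        return byLength != 0 ? byLength : Math.Sign(string.CompareOrdinal(a, b));
    }

    public static void Main()
    {
        string word = "we're";
        string[] others = { "we", "we're", "weak", "wepen", "weqen", "weren", "wesen", "were" };
        foreach (string other in others)
        {
            Console.WriteLine("{0} vs {1}: {2}", word, other, Compare(word, other));
        }
    }
}
```

    Under this model "we're" behaves like "were" against every word with different letters, which is why it lands between "weqen" and "weren", yet it is still distinguishable from "were" itself.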
  • Thanks to all who have replied.  I figured out the answer.

    First, I had a bug in my test code: I was using ''' instead of '\u2019' in my strings when running my tests. Changing to '\u2019' then allowed me to find the answer.

    String.CompareOrdinal in C# provides the same sort order as wcscmp does in C++ (but only when all apostrophes are entered as '\u2019').

    The key is to understand CompareOptions. CompareOptions.None was indeed ignoring the apostrophe, except when all other letters compared equal. I'm not sure I agree that this logic makes sense, but then, what the hay.


    I'm giving Tom the credit for this answer, since his reply put me on the right track.
    Tuesday, August 30, 2011 4:43 PM
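    The two apostrophes mentioned in the final post differ a great deal under ordinal comparison, which looks only at numeric character values. The following is a minimal sketch, with an assumed class and helper name, that prints both code points and shows on which side of the ASCII letters each one falls with String.CompareOrdinal.

```csharp
using System;

public static class OrdinalApostropheDemo
{
    // Sign (-1, 0, 1) of an ordinal comparison, which compares raw UTF-16 code units.
    public static int OrdinalSign(string a, string b)
    {
        return Math.Sign(string.CompareOrdinal(a, b));
    }

    public static void Main()
    {
        Console.WriteLine((int)'\'');      // 39   (straight apostrophe, U+0027)
        Console.WriteLine((int)'\u2019');  // 8217 (right single quotation mark, U+2019)

        // U+2019 is larger than every ASCII letter, so this form sorts after "weren".
        Console.WriteLine(OrdinalSign("we\u2019re", "weren")); // 1

        // U+0027 is smaller than every ASCII letter, so this form sorts before "weak".
        Console.WriteLine(OrdinalSign("we're", "weak"));       // -1
    }
}
```

    Neither ordinal result matches the culture-sensitive values from the original question, where "we're" was greater than "weak" but less than "weren".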