Array.Sort fails to sort DBCS (double-byte character set) Japanese characters.

  • Question

  • Can anyone please guide me on how to sort Japanese characters?

    Right now I am using Array.Sort, but it does not sort according to the Shift_JIS table.

    Here is the code snippet:

    using System;
    using System.Collections.Generic;
    using System.Text;
    using System.Collections;
    using System.Globalization;

    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main(string[] args)
            {
                string[] aa = new string[]
                {
                    "☆",
                    "★",
                    "○",
                    "●",
                    "◎",
                    "◇",
                    "◆",
                    "□",
                    "■"
                };

                Console.WriteLine("----------------------------------------");
                Console.WriteLine("Before Sorting:");
                foreach (string s in aa)
                {
                    Console.WriteLine(s);
                }
                Console.WriteLine("*****************************************");
                Console.WriteLine("After Sorting:");          
                Array.Sort(aa);

                foreach (string s in aa)
                {
                    Console.WriteLine(s);
                }
                Console.WriteLine("----------------------------------------");
            }
        }
    }

    Output: (not preserved)

    Expected order according to the Shift_JIS table: (not preserved)

    Thanks in advance.

    • Moved by edhickey Monday, February 21, 2011 8:45 PM (From:.NET 3.0/3.5 Windows Workflow Foundation)
    Monday, February 21, 2011 11:14 AM

All replies

  • Hello Praveen,

    Welcome to the MSDN forum, and thank you for posting here. This is Rocky, and I will be working with you on this post.

    According to your description, I believe you want to sort an array in descending order, and you got the wrong result because Array.Sort() sorts in ascending order. We will work together to solve this issue, so please feel free to let me know if I have misunderstood anything.

    In my experience, there are three ways to solve this issue:

    1.      Use an overload of Array.Sort() that accepts a custom comparer.

    Here are some helpful links for your reference:

    http://msdn.microsoft.com/en-us/library/system.array.sort.aspx

    http://msdn.microsoft.com/en-us/library/system.array.sort(v=VS.71).aspx

    2.      Write a new method to sort the array, for example a bubble sort.

    You can find more help at this link:

    http://www.sorting-algorithms.com/bubble-sort

    3.      The easiest way: call Array.Reverse() after Array.Sort(). For the sake of good object-oriented design, though, I would not recommend it.

    Please try to write a new method.

    I hope this helps resolve your problem. If anything is unclear, please feel free to let us know.

    Best Regards,

    Rocky

    Tuesday, February 22, 2011 3:08 AM
  • Hi Rocky,

    Thanks for your feedback.

    I want to sort according to the Shift_JIS table; the result I am expecting is in ascending order by that table.

    Here is the Shift_JIS table I am referring to (sorry, it is in Japanese):

    http://charset.7jp.net/sjis.html

    I have also made other observations about Array.Sort(): it seems to behave differently for DBCS characters.

    For example, according to the table linked above, the DBCS symbol "`" should sort before the DBCS symbol "(", but Array.Sort() returns "(" first and then "`".

    Please let me know your comments on both issues.

    Tuesday, February 22, 2011 10:20 AM
  • Hi Praveen,

    Please see this page on the same site you linked:
    http://charset.7jp.net/jis0208.html
    I took a screenshot of this page to show how these special characters are defined:

    In this table the characters can be ordered by their JIS, SJIS, or EUC code. I understand that your goal is to sort the characters in your program in this sequence.
    First, here is my reproduction of your program:

      static void Main(string[] args)
      {
       char[] aa = new char[]
       { 
        '☆',
        '★',
        '○',
        '●',
        '◎',
        '◇',
        '◆',
        '□',
        '■'
       };
    
       Console.WriteLine("----------------------------------------");
       Console.WriteLine("Before Sorting:");
       foreach (char s in aa)
       {
        Console.WriteLine(s);
       }
       Console.WriteLine("*****************************************");
       Console.WriteLine("After Sorting:");
       Array.Sort(aa);
    
       foreach (char s in aa)
       {
        Console.WriteLine(s + " 0x" + String.Format("{0:x}", (int)s));
       }
       Console.WriteLine("----------------------------------------");
      }
    


    Notice that I have modified your program to:
    Use a char array instead of a string array.
    Show the Unicode (UTF-16) value of every character in the array after sorting.

    From the output you can see that, by default, Array.Sort orders a char array by Unicode (UTF-16) value.
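
    As a hedged aside on the two element types: for a char[] the default comparison is numeric on the UTF-16 value, whereas for a string[] the default comparer goes through String.CompareTo, which is culture-sensitive, so symbols can come out in an order that matches neither Unicode nor Shift_JIS. A minimal sketch of forcing code-unit order for strings with StringComparer.Ordinal follows; the class and method names are hypothetical and not from this thread.

```csharp
using System;

static class OrdinalSortSketch
{
    // Returns a sorted copy of the input, ordered by UTF-16 code unit value
    // and independent of the current culture. The input array is not modified.
    public static string[] SortByCodeUnit(string[] input)
    {
        string[] copy = (string[])input.Clone();
        Array.Sort(copy, StringComparer.Ordinal);
        return copy;
    }
}
```

    This still yields Unicode order, not Shift_JIS order; it only removes the culture-dependent part of the behavior.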

    But please see the output in the console window below:

    The order of the last two lines is reversed compared with the output you gave; I think you may have made a clerical error.


    So the way to meet your requirement is to implement the IComparer interface and pass an instance as the second argument to the overload Array.Sort(Array, IComparer).

    Here is one sample for you to use:
     class CustomComparer : System.Collections.IComparer
     {
      public int Compare(object x, object y)
      {
       char Unicodechar1 = (char)x;
       char Unicodechar2 = (char)y;
    
       if (Unicodechar1 > Unicodechar2) return 1;
       if (Unicodechar1 < Unicodechar2) return -1;
       return 0;
      }
     }
    

    Then call the matching overload of Array.Sort like this:
    Array.Sort(aa, new CustomComparer());
    


    With this comparer the output is the same as with the default sort, so you need to rewrite the Compare method to sort by, for example, SJIS code. In other words, you should compare the SJIS code of two passed in parameters in this Compare method.


    Leo Liu [MSFT]
    MSDN Community Support | Feedback to us
    Get or Request Code Sample from Microsoft
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.

    Thursday, February 24, 2011 5:28 AM
  • Hi,

    "you should compare the SJIS code of two passed in parameters in this Compare method."

    Following your statement, could you please show me how to get the SJIS code from these two parameters?

     

    Thanks.

     

    Thursday, February 24, 2011 6:42 AM
  • This comes down to getting the SJIS code of a char variable.
    I did a quick search: Shift_JIS is a separate encoding, so please look for ways to convert from UTF-16 to Shift_JIS.
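
    A minimal sketch of one way to do that conversion, using Encoding.GetEncoding("shift_jis") and GetBytes. The class and method names are hypothetical. The RegisterProvider call is an assumption about the runtime: it is needed on .NET Core and .NET 5+ before code-page encodings can be looked up, and it should be removed on older .NET Framework versions such as the one used in this thread, where CodePagesEncodingProvider may not exist.

```csharp
using System;
using System.Text;

static class ShiftJisCodeSketch
{
    static ShiftJisCodeSketch()
    {
        // Assumption: running on .NET Core / .NET 5+, where code-page
        // encodings must be registered before GetEncoding can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Returns the Shift_JIS bytes for one UTF-16 char (one or two bytes).
    public static byte[] GetShiftJisBytes(char c)
    {
        Encoding sjis = Encoding.GetEncoding("shift_jis");
        return sjis.GetBytes(new[] { c });
    }

    // Formats the Shift_JIS bytes as an uppercase hex string such as "8199".
    public static string ToHex(char c)
    {
        return BitConverter.ToString(GetShiftJisBytes(c)).Replace("-", "");
    }
}
```

    For example, ShiftJisCodeSketch.ToHex('☆') gives "8199", the code listed for that symbol in the Shift_JIS table.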


    Leo Liu [MSFT]

    Thursday, February 24, 2011 6:58 AM
  • Please help me get the SJIS code of a char variable.

    I tried to google it but could not find any source code in C#.

     

    Thanks

    Thursday, February 24, 2011 7:57 AM
  • Sorry for the delay. I have actually never used the Japanese encoding standard Shift_JIS before. I have completed the sample below; please copy this code snippet directly into your project.
    See the comments in the code snippet for details. Thanks.
    Looking forward to hearing from you.
      using System;
      using System.Text;

      static class Program
      {
        static void Main(string[] args)
        {
          // Note: the original sequence has been deliberately shuffled here.
          string[] aa = new string[]
          {
            "○",
            "★",
            "☆",
            "●",
            "◎",
            "◇",
            "□",
            "◆",
            "■"
          };
    
    
          Console.WriteLine("----------------------------------------");
          Console.WriteLine("Before Sorting:");
          foreach (string s in aa)
          {
            Console.WriteLine(s);
          }
          Console.WriteLine("*****************************************");
          Console.WriteLine("After Sorting:");
          Array.Sort(aa, new CustomComparer());
    
          // The result table title.
          Console.WriteLine("Str  SJIS  UTF16");
          Console.WriteLine("---  ----  -----");
    
          Encoding SJISEncoding = Encoding.GetEncoding("shift_jis");
    
          foreach (string s in aa)
          {
            Console.WriteLine(s + "  " + ToHexString(SJISEncoding.GetBytes(s)) + "  0x" + String.Format("{0:x}", (int)s[0]));
          }
          Console.WriteLine("----------------------------------------");
        }
    
        public static string ToHexString(byte[] bytes)
        {
          string hexString = string.Empty;
          if (bytes != null)
          {
            StringBuilder strB = new StringBuilder();
    
            for (int i = 0; i < bytes.Length; i++)
            {
              strB.Append(bytes[i].ToString("X2"));
            }
            hexString = strB.ToString();
          }
          return hexString;
        }
      }
    
      class CustomComparer : System.Collections.IComparer
      {
        // Compares the Shift_JIS codes of the two strings passed in and returns the result.
        public int Compare(object x, object y)
        {
          Encoding SJISEncoding = Encoding.GetEncoding("shift_jis");
    
          // Get the Shift_JIS code of each string as an uppercase hex string.
          string SJISs1 = Program.ToHexString(SJISEncoding.GetBytes((string)x));
          string SJISs2 = Program.ToHexString(SJISEncoding.GetBytes((string)y));
    
          // Fixed-width hex digits compare ordinally in the same order as the underlying bytes.
          return string.CompareOrdinal(SJISs1, SJISs2);
        }
      }


    Leo Liu [MSFT]

    • Edited by Leo Liu - MSFT Friday, February 25, 2011 3:22 PM Updated the code snippet.
    • Marked as answer by Leo Liu - MSFT Monday, February 28, 2011 2:08 AM
    Friday, February 25, 2011 2:44 AM
  • Hi,

    I tried your updated code, but it does not sort according to the SJIS table.

    Below is the output of the code.

    ----------------------------------------
    Before Sorting:









    *****************************************
    After Sorting:
    String UTF16 Code
    ■   0x25a0
    □   0x25a1
    ◆   0x25c6
    ◇   0x25c7
    ○   0x25cb
    ◎   0x25ce
    ●   0x25cf
    ☆   0x2606
    ★   0x2605
    ----------------------------------------
    Press any key to continue . . .

    Expected sorting order: (not preserved)

    Thanks

    PraveenM

    Friday, February 25, 2011 5:12 AM
  • In your sorted result, the characters are still in ascending order by UTF-16 code.

    Please make sure you have copied the whole code snippet from my last post into your program.
    There are several critical points to pay attention to:

    Use the overload of Array.Sort that takes a comparer, so that the comparison defined in the CustomComparer class is applied:
    Array.Sort(aa, new CustomComparer());

    I have changed back to the string array aa and shuffled the original sequence.

    This is the output:


    Leo Liu [MSFT]

    Friday, February 25, 2011 7:21 AM
  • Hi

    I have pasted your whole program as it is:

    using System;
    using System.Collections.Generic;
    using System.Text;
    using System.Collections;
    using System.Globalization;

    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main(string[] args)
            {
                //Notice here I have disorganized the sequence.
                string[] aa = new string[]
               {
                "○",
                "★",
                "☆",
                "●",
                "◎",
                "◇",
                "□",
                "◆",
                "■"
               };

                Console.WriteLine("----------------------------------------");
                Console.WriteLine("Before Sorting:");
                foreach (string s in aa)
                {
                    Console.WriteLine(s);
                }
                Console.WriteLine("*****************************************");
                Console.WriteLine("After Sorting:");
                Array.Sort(aa, new CustomComparer());

                // The result table title.
                Console.WriteLine("String UTF16 Code");

                foreach (string s in aa)
                {
                    Console.WriteLine(s + "   0x" + String.Format("{0:x}", (int)s[0]));
                }
                Console.WriteLine("----------------------------------------");
            } 

            class CustomComparer : System.Collections.IComparer
            {
                // Used to compage the Shift_Jis code of the passed in string parameters and return the result.
                public int Compare(object x, object y)
                {
                    Encoding SJISEncoding = Encoding.GetEncoding("Shift_Jis");

                    // Get the Shift_Jis code.
                    string SJISs1 = Encoding.Default.GetString(SJISEncoding.GetBytes((string)x));
                    string SJISs2 = Encoding.Default.GetString(SJISEncoding.GetBytes((string)y));

                    return string.Compare(SJISs1, SJISs2, true);
                }
            }

        }
    }

    I am not getting the output that you are getting.

    Please let me know if I did anything wrong.

    Thanks

    PraveenM

    Friday, February 25, 2011 1:20 PM
  • I made a mistake. SJISEncoding.GetBytes((string)x) does return a byte array containing the Shift_JIS code of x, but Encoding.Default.GetString(SJISEncoding.GetBytes((string)x)) is not the right way to interpret that byte array.
    So weird that it could get the wanted sequence on my working machine using the original code, but fails on my own PC.
    I have added a method that converts the byte array to the hexadecimal Shift_JIS code of x,
    and code that shows the Shift_JIS code in the console window.
    I have edited my last reply that contains code; please find the latest snippet there. Copy the whole snippet again this time, because I changed many parts of the code, such as making the Program class static.

    Tip:
    To get the Shift_JIS encoding, any of the three names "shift_jis", "shift-jis", and "ms_kanji" will work.
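
    The hex-string step is not strictly required. As a hedged alternative sketch (not the code posted in this thread), the Shift_JIS bytes can be compared directly in a generic IComparer<string>, with the encoding looked up once instead of on every comparison. The class name is hypothetical, and the RegisterProvider call is an assumption that only applies to .NET Core and .NET 5+. Characters with no Shift_JIS mapping are encoded as "?" by the default fallback, so they would all compare as equal and need separate handling.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

sealed class ShiftJisByteComparer : IComparer<string>
{
    private readonly Encoding _sjis;

    public ShiftJisByteComparer()
    {
        // Assumption: .NET Core / .NET 5+; remove on older .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        _sjis = Encoding.GetEncoding("shift_jis");
    }

    // Orders strings by their Shift_JIS byte sequences, byte by byte.
    public int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;

        byte[] a = _sjis.GetBytes(x);
        byte[] b = _sjis.GetBytes(y);
        int n = Math.Min(a.Length, b.Length);
        for (int i = 0; i < n; i++)
        {
            if (a[i] != b[i]) return a[i] < b[i] ? -1 : 1;
        }
        // All shared bytes are equal: the shorter sequence sorts first.
        return a.Length.CompareTo(b.Length);
    }
}
```

    It would be used as Array.Sort(aa, new ShiftJisByteComparer()); with the string array from the snippet above.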


    Leo Liu [MSFT]

    Friday, February 25, 2011 3:33 PM
  • Thanks Liu,

    I tried the latest code you posted, and it shows the expected results.

    I am checking some more SJIS table characters and will then let you know the outcome.

    "So weird that it could get the wanted sequence on my working machine using the original code, but fails on my own PC."

    Is that a bug on your side?

     

    Thanks

    PraveenM

     

     

    Monday, February 28, 2011 3:35 AM
  • Maybe it is. LOL. Last Friday the old code produced the expected result on my work machine.
    Anyway, I am glad to know that the new code works on your side.
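
    One possible explanation for the machine-dependent result, which this thread does not confirm: on .NET Framework, Encoding.Default is the system ANSI code page, so decoding Shift_JIS bytes with it round-trips only when that code page is itself Shift_JIS (932); on other machines it yields unrelated characters. The sketch below illustrates the effect with explicit code pages. The class and method names are hypothetical, and the RegisterProvider call is an assumption that only applies to .NET Core and .NET 5+.

```csharp
using System;
using System.Text;

static class DecodeMismatchSketch
{
    // Encodes text as Shift_JIS, then decodes those bytes with another code page,
    // mimicking Encoding.Default.GetString on a machine with that ANSI code page.
    public static string DecodeSjisBytesAs(string text, int codePage)
    {
        // Assumption: .NET Core / .NET 5+; remove on older .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        byte[] sjisBytes = Encoding.GetEncoding("shift_jis").GetBytes(text);
        return Encoding.GetEncoding(codePage).GetString(sjisBytes);
    }
}
```

    With code page 932 the call returns the original text; with 1252 it returns different characters, so any later string comparison no longer reflects Shift_JIS order.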
    Leo Liu [MSFT]

    Monday, February 28, 2011 4:40 AM
  • Thanks Liu,

    I checked other JIS table symbols and found that they sort correctly.

    Thank you very much for your help.

     

    Regards

    PraveenM

     

    • Marked as answer by Praveen_More Monday, February 28, 2011 7:43 AM
    Monday, February 28, 2011 7:43 AM
  • Hi Liu,

    I need help with one more thing.

    I have written C++ code using the STL that should sort according to the JIS table,

    but it does not do so either.

    Here is the code snippet:

    bool SortCompareI18N( CSEString &left, CSEString &right )
    {
          _locale_t lCurLocale = _get_current_locale();
          bool bRetVal = false;
          int lSortCompareVal = 0;

          lSortCompareVal = _tcscoll_l(left.c_str(), right.c_str(), lCurLocale);
          if(lSortCompareVal <= 0)
          {
                bRetVal = true;
          }
          return bRetVal;
    }

    void GetLocalSortedVector(std::vector<CSEString>& sInSortedVec)
    {
          std::sort(sInSortedVec.begin(), sInSortedVec.end(), SortCompareI18N);
    }

    I used std::sort for this. Can you update this code so that it gives me the required result?

     

    Thanks in advance.

    PraveenM

    Monday, March 21, 2011 12:55 PM
  • Sorry for the delay.
    I am actually not familiar with C++.
    You can post your new issue in the
    Visual C++ General Forum. Don't forget to include a link to this thread so that our friends there can use it as a reference. Thanks.


    Leo Liu [MSFT]

    Tuesday, March 22, 2011 7:43 AM
  • OK, I will do that.

    Thanks Liu.

    Tuesday, March 22, 2011 12:05 PM