Convert an Html Code to Unicode

  • Question

  • User1637688157 posted

    Hi,

    I want to convert an HTML or hash code into Unicode.

    e.g.  &#1575;&#1604;&#1575;&#1583;&#1575;&#1585;&#1577; = الادارة

    where "&#1575;&#1604;&#1575;&#1583;&#1575;&#1585;&#1577;" is a hash code and الادارة is its Unicode value.

     

    Please help.

     

    Friday, January 19, 2007 8:28 AM

Answers

  • User113421904 posted

    Hi,

    Here is an example that should make this easier to understand. First of all, I assume we use the ASCII character set.

    The ASCII values of the characters in "Asp" are:

    'A' 65

    's' 115

    'p' 112

    Then the numeric character references for "Asp.net" are "&#65&#115&#112&#46&#110&#101&#116"

    Please note that in my code I put &#-1 at the end of the string to mark where it ends, which keeps the code simple.
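
    As an aside, the opposite direction (text to references) can be sketched as below. `NcrEncoder` and `Encode` are hypothetical names that do not appear in this thread, and the sketch writes the standard form, which ends each reference with ';' (the example string above leaves the semicolons out).

```csharp
using System;
using System.Text;

public static class NcrEncoder
{
    // Hypothetical helper: writes every character of the input as a
    // decimal numeric character reference, e.g. 'A' -> "&#65;".
    public static string Encode(string text)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < text.Length; i++)
        {
            // ConvertToUtf32 combines a surrogate pair into one code point.
            int codePoint = char.ConvertToUtf32(text, i);
            if (char.IsHighSurrogate(text[i])) i++; // skip the low surrogate
            sb.Append("&#").Append(codePoint).Append(';');
        }
        return sb.ToString();
    }
}
```

    Under these assumptions, `NcrEncoder.Encode("Asp.net")` yields the same seven numbers as the example above, each followed by a semicolon.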

     

    Here is my code to demonstrate how to decode it (unfortunately, the Server.HtmlDecode() I suggested in my last post doesn't work for this scenario, so I decided to write a simple decoder myself).

        protected void Page_Load(object sender, EventArgs e)
        {
            // "&#-1" is a sentinel that marks the end of the string; it is not a real character reference.
            string encodedString = "&#65&#115&#112&#46&#110&#101&#116&#-1";

            byte[] bytes = new byte[7]; // hardcoded: the number of characters in the string
            int byteIndex = 0;

            int start = 0;  // where the next search for "&#" begins
            int at = 0;     // index of the most recently found "&#"
            int last = -1;  // index just past the previous "&#", or -1 before the first one

            while ((start < encodedString.Length) && (at > -1))
            {
                at = encodedString.IndexOf("&#", start); // find each "&#" in turn
                if (at == -1) break;
                start = at + 2;

                if (last != -1)
                {
                    // The digits between the previous "&#" and this one ("65", "115", ...)
                    // are converted to a byte and stored in the array.
                    bytes[byteIndex++] = Convert.ToByte(encodedString.Substring(last, at - last));
                }

                last = start;
            }

            Response.Write("<br/><hr><br/>");
            Response.BinaryWrite(bytes); // output the raw bytes
            Response.Write("<br/><hr><br/>");
        }

    Outputs:

    Asp.net

     

    So, 

    (1) &#1575;&#1604;&#1575;&#1583;&#1575;&#1585;&#1577; is a sequence of numeric character references with the decimal values:

    1575 1604 1575 1583 1575 1585 1577

    Each value stands for one character. Since these values are greater than 255, none of them fits in a single byte. In HTML, a numeric character reference gives the character's Unicode code point whatever encoding the page itself uses (1575, for example, is U+0627, ARABIC LETTER ALEF).

    (2) You can customize my code to output the characters, but please note that my code assumes each value fits in one byte.

    (3) Please check where the string comes from. If it comes from an HTML file, which character set does that file use? Please provide more details.
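
    One way to lift the single-byte limit is to treat each number as a Unicode code point and build a string instead of a byte array. The following is only a sketch under that assumption; `NcrDecoder` and `Decode` are hypothetical names, and the pattern deliberately accepts references with or without the trailing ';' so that both strings from this thread work.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NcrDecoder
{
    // Matches decimal references such as "&#1575;". The trailing ';' is optional
    // so the semicolon-free form used in the example above is accepted too.
    private static readonly Regex Reference = new Regex(@"&#([0-9]+);?");

    public static string Decode(string encoded)
    {
        return Reference.Replace(encoded, m =>
        {
            int codePoint;
            // Numbers too large for an int are left untouched.
            if (!int.TryParse(m.Groups[1].Value, out codePoint)) return m.Value;

            // Only valid Unicode scalar values can be converted to text.
            bool valid = codePoint <= 0x10FFFF && (codePoint < 0xD800 || codePoint > 0xDFFF);
            return valid ? char.ConvertFromUtf32(codePoint) : m.Value;
        });
    }
}
```

    With this sketch, `NcrDecoder.Decode("&#1575;&#1604;&#1575;&#1583;&#1575;&#1585;&#1577;")` returns the seven Arabic letters from the question, and no end marker such as &#-1 is needed because the regular expression finds the end of each number by itself.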

     

    Hope it helps!

     

     

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Tuesday, January 23, 2007 12:14 AM
  • User1637688157 posted

    Hi,

    I am very thankful to you. I got the solution from the Server.HtmlDecode() function.

    Just pass your code as the parameter and you get the answer.
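
    For readers who want to try this outside a page, a minimal sketch follows. It assumes .NET 4.0 or later and uses `System.Net.WebUtility.HtmlDecode` instead of `Server.HtmlDecode()`, because the former needs no `Page` or `HttpContext`; the class name `HtmlDecodeDemo` is hypothetical.

```csharp
using System;
using System.Net;

public static class HtmlDecodeDemo
{
    public static string Decode(string encoded)
    {
        // WebUtility.HtmlDecode resolves named and numeric character references
        // and returns an ordinary .NET (UTF-16) string.
        return WebUtility.HtmlDecode(encoded);
    }
}
```

    Inside an ASP.NET page the equivalent call would be `Server.HtmlDecode(encoded)`. Note that the references passed in here end with ';', as in the original question.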

     

    thanks again

     

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Tuesday, January 23, 2007 3:29 AM

All replies

  • User113421904 posted

    Hi,

    &#1575;&#1604;&#1575;&#1583;&#1575;&#1585;&#1577; is not a hash code; it is a sequence of numeric character references for the original characters. Each reference represents one character: &#1575; is the character whose code is 1575 (decimal).

    Have you tried HTMLDecode()?

    Monday, January 22, 2007 8:38 AM
  • User1637688157 posted

    Hi,

     

    No, I have never tried that.

    Can you please explain it to me?

    Do you mean that if I convert this integer to a string, it will produce the desired Unicode character?

     

    Monday, January 22, 2007 11:12 AM
  • User113421904 posted

    Hi,

    I'm glad that you have solved this issue. However, I am wondering why Server.HtmlDecode() doesn't work for me; I thought it should.

     

    Tuesday, January 23, 2007 4:39 AM
  • User1637688157 posted

    Hi,

    I don't understand why it is not working for you; it works fine here.

     

    Wednesday, January 24, 2007 2:26 AM
  • User113421904 posted

     

    Never mind, I'll check this.

    [:D]

    Wednesday, January 24, 2007 5:06 AM