How does Azure Table storage sort Unicode characters? How are 'Cf' characters handled?

    Question

  • I am trying to figure out the range of characters I can use in a property, PartitionKey, or RowKey. The MSDN documentation says a wide variety of characters are allowed in a property name:

    http://msdn.microsoft.com/en-us/library/dd179338.aspx

    So far, I'm using 240 characters, each drawn from this set:

    0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

    Now that I know I can use Unicode in a property name, I have a few questions:

    (1) How do I use this with the Storage Client library? Is anything special required?

    (2) Does a single Unicode character use one character slot, or more than one? In other words, can I use 255 Unicode characters in a property name?

    (3) Does using Unicode put me at risk of service disruption, corruption, or other conflicts internal to how Azure Storage works?

    (4) Which Unicode characters are disallowed? The following article lists ASCII characters, but since the same symbol can appear at several code points in Unicode, it's unclear which one (or all of them) I need to escape.

    http://msdn.microsoft.com/en-us/library/dd179338.aspx 

    (5) How are Cf (format) characters and similar non-display characters handled?

    (6) How does Azure Table sort data internally when Unicode is used? Is the underlying structure Unicode-based?

     

    • Moved by DanielOdievichEditor Tuesday, September 28, 2010 10:20 PM forum migration (From:Windows Azure)
    Sunday, September 19, 2010 3:49 AM

All replies

  • Hello, you can use Unicode characters in keys just as you use ASCII characters, except for '/', '\', '#', and '?', which are not allowed. In addition, use the '%' character in keys with care; we've found some problems with it.
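A minimal sketch of a client-side key check. It assumes the disallowed set documented on the MSDN page linked in the question: '/', '\', '#', '?', and the control characters U+0000 to U+001F and U+007F to U+009F. The function name `check_table_key` is hypothetical, and reporting '%' as a warning rather than an error reflects the advice in this reply, not a documented rule.

```python
DISALLOWED = set("/\\#?")  # forward slash, backslash, number sign, question mark


def _is_control(ch: str) -> bool:
    """True for control characters U+0000-U+001F and U+007F-U+009F."""
    cp = ord(ch)
    return cp <= 0x1F or 0x7F <= cp <= 0x9F


def check_table_key(key: str) -> list[str]:
    """Return the problems found in a PartitionKey/RowKey value (hypothetical helper)."""
    problems = []
    for i, ch in enumerate(key):
        if ch in DISALLOWED:
            problems.append(f"position {i}: {ch!r} is not allowed")
        elif _is_control(ch):
            problems.append(f"position {i}: control character U+{ord(ch):04X} is not allowed")
        elif ch == "%":
            # Allowed, but the reply above reports problems with it.
            problems.append(f"position {i}: '%' is allowed but may cause problems")
    return problems


if __name__ == "__main__":
    print(check_table_key("中文-key"))  # []
    print(check_table_key("a/b"))       # ["position 1: '/' is not allowed"]
```

Non-ASCII letters such as Chinese characters pass; only the listed characters are flagged.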

    A Unicode character may take one or more slots. For example, an English letter takes one slot, but a Chinese character takes two slots.
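What counts as one "slot" depends on how length is measured. This minimal sketch reports three measures for a few characters: code points, UTF-16 code units (what a .NET `String.Length` counts), and UTF-8 bytes. This thread does not say which measure the service applies to its limits, so mapping any of them to "slots" is an assumption. Note that a common CJK character such as 中 is one UTF-16 code unit and three UTF-8 bytes.

```python
def measures(s: str) -> tuple[int, int, int]:
    """Return (code points, UTF-16 code units, UTF-8 bytes) for a string."""
    utf16_units = len(s.encode("utf-16-le")) // 2  # 2 bytes per UTF-16 code unit, no BOM
    return len(s), utf16_units, len(s.encode("utf-8"))


if __name__ == "__main__":
    for ch in ["a", "\u00e9", "\u4e2d", "\U0001F600"]:  # a, é, 中, GRINNING FACE
        points, units, nbytes = measures(ch)
        print(f"U+{ord(ch):04X}: {points} code point, {units} UTF-16 unit(s), {nbytes} UTF-8 byte(s)")
```

Characters outside the Basic Multilingual Plane, such as U+1F600, are the ones that need two UTF-16 code units.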

    When querying or updating data, if the keys must appear in the URL, their Unicode characters must be URL-encoded. The WCF Data Services library takes care of the encoding for you, but if you're calling the REST API directly, you must encode them manually.
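A minimal sketch of encoding keys by hand for the REST API. It assumes the OData entity address form `https://<account>.table.core.windows.net/<table>(PartitionKey='<pk>',RowKey='<rk>')` and OData's rule that a single quote inside a string literal is doubled. `myaccount` and `mytable` are placeholders, `entity_url` is a hypothetical helper, and request signing is omitted entirely. `urllib.parse.quote` percent-encodes the UTF-8 bytes of each non-ASCII or reserved character. Check the exact escaping the service expects for quotes against the REST documentation.

```python
from urllib.parse import quote


def entity_url(account: str, table: str, partition_key: str, row_key: str) -> str:
    """Build an entity URL with both keys URL-encoded (request signing not shown)."""
    def literal(value: str) -> str:
        # Double any single quote (OData string literal), then percent-encode
        # everything except unreserved ASCII characters.
        return quote(value.replace("'", "''"), safe="")
    return (f"https://{account}.table.core.windows.net/{table}"
            f"(PartitionKey='{literal(partition_key)}',RowKey='{literal(row_key)}')")


if __name__ == "__main__":
    print(entity_url("myaccount", "mytable", "中", "row 1"))
    # https://myaccount.table.core.windows.net/mytable(PartitionKey='%E4%B8%AD',RowKey='row%201')
```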

    Table storage sorts Unicode characters in their natural order. For example, \u0000 comes before \u0001.
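A minimal illustration of ordinal ordering, assuming "natural order" means comparing characters by their numeric values one at a time with no locale-aware collation. Python's default string comparison does this by code point. The service might instead compare UTF-16 code units; this thread does not say. The two orders differ only for supplementary characters (U+10000 and up), so both are shown. Under either order, an invisible Cf character such as U+200B ZERO WIDTH SPACE still occupies a position and takes part in the comparison.

```python
import unicodedata


def ordinal_sort(keys: list[str]) -> list[str]:
    """Order strings by code point, one character at a time (no locale-aware collation)."""
    return sorted(keys)


def utf16_sort(keys: list[str]) -> list[str]:
    """Order strings by UTF-16 code unit; big-endian bytes compare like the 16-bit units."""
    return sorted(keys, key=lambda s: s.encode("utf-16-be"))


if __name__ == "__main__":
    print(ordinal_sort(["b", "a", "B", "é"]))  # ['B', 'a', 'b', 'é']
    zwsp = "\u200b"  # ZERO WIDTH SPACE
    print(unicodedata.category(zwsp))  # Cf
    # The invisible U+200B still takes part in comparison: 'b' (U+0062) < U+200B.
    print(ordinal_sort(["a" + zwsp + "b", "ab"])[0] == "ab")  # True
    # The two orders disagree for supplementary characters such as U+1F600.
    pair = ["\uff21", "\U0001F600"]  # FULLWIDTH LATIN CAPITAL LETTER A, GRINNING FACE
    print(ordinal_sort(pair) == pair, utf16_sort(pair) == pair[::-1])  # True True
```

Ordinal order also places every uppercase ASCII letter before every lowercase one.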


    Lante, shanaolanxing This posting is provided "AS IS" with no warranties, and confers no rights.
    Monday, September 20, 2010 2:17 AM
  • I'm still confused by some of the answers. Here is what I got from your reply...

    1) ?

    2) Yes, sometimes... it depends on the character. (Follow-up: how do I know which characters take two slots?)

    3) ??? ... be sure not to use '/', '\', '#', '?', or '%'

    4) ?? How do I translate ASCII to Unicode (and back) to be sure I escape the correct character?

    5) ?? Do formatting characters occupy a slot and affect sorting?  If so how?

    6) Yes, the data is sorted in natural Unicode sort order.


    Monday, September 20, 2010 5:48 PM
  • 1) As mentioned above, there's no special requirements.

    2) When encoded as UTF-8, each character takes 1 to 4 bytes. You can refer to http://en.wikipedia.org/wiki/UTF-8 for more information.

    3) Yes. Be sure not to use '/', '\', '#', '?', or '%'.

    4) The first 128 Unicode characters are the ASCII characters, and UTF-8 encodes them with the same byte values as ASCII. You don't need to do any translation.

    5) Sorry, I don't know much about Cf characters. But since they are Unicode characters, they occupy slots.
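A minimal sketch of the byte counts behind answers 2 and 4, using the UTF-8 ranges: 1 byte for U+0000 to U+007F (exactly the ASCII range, with the same byte values), 2 bytes up to U+07FF, 3 bytes up to U+FFFF, and 4 bytes beyond. It also lists the Cf characters in a string to illustrate answer 5: they are invisible but still count toward the length. The helper names `utf8_len` and `format_chars` are hypothetical.

```python
import unicodedata


def utf8_len(ch: str) -> int:
    """Number of bytes UTF-8 uses for one code point."""
    cp = ord(ch)
    if cp <= 0x7F:
        return 1  # the ASCII range: same byte value as ASCII
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    return 4


def format_chars(s: str) -> list[tuple[int, str]]:
    """Return (position, 'U+XXXX NAME') for every Cf (format) character in s."""
    return [(i, f"U+{ord(c):04X} {unicodedata.name(c, '?')}")
            for i, c in enumerate(s) if unicodedata.category(c) == "Cf"]


if __name__ == "__main__":
    # ASCII text encodes to identical bytes in ASCII and UTF-8.
    assert "RowKey-42".encode("ascii") == "RowKey-42".encode("utf-8")
    for ch in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
        assert utf8_len(ch) == len(ch.encode("utf-8"))
    key = "ab\u200dcd"  # contains ZERO WIDTH JOINER
    print(len(key), format_chars(key))  # 5 [(2, 'U+200D ZERO WIDTH JOINER')]
```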


    Lante, shanaolanxing This posting is provided "AS IS" with no warranties, and confers no rights.
    • Marked as answer by Yi-Lun Luo Friday, September 24, 2010 9:38 AM
    Tuesday, September 21, 2010 2:26 AM