Question marks instead of data

    Question

  • Hello Friends,

    I tried to add the characters スタンダードチャ to a database table, and it looks like it actually stored 8 question marks instead of those characters.

    Can you try to find out what we have to do to allow our database to correctly store these types of characters?

    Thanks.

    Wednesday, July 10, 2013 7:04 PM

Answers

  • Hi,

    Please change the column from varchar(1000) to nvarchar(1000).


    Sebastian Sajaroff Senior DBA Pharmacies Jean Coutu

    Wednesday, July 10, 2013 7:56 PM
  • Yup, as Sebastian said, the field needs to be able to store Unicode characters. These are the types that begin with N: http://msdn.microsoft.com/en-us/library/ms186939.aspx

    Also you will want to:

    INSERT INTO [Table1](Note)
    VALUES (N'スタンダードチャ');   -- notice the string literal is prefixed with N

    Wednesday, July 10, 2013 7:59 PM
  • Hi,

    NCHAR fields usually need about twice as much space as regular CHAR fields.

    Be aware that SQL Server allocates disk space in chunks (8 KB pages), not byte by byte, and that it also needs additional disk space for indexes, statistics and temporary tables.

    Nevertheless, 2x is a valid estimate over a large number of rows.


    Sebastian Sajaroff Senior DBA Pharmacies Jean Coutu

    Thursday, July 11, 2013 5:27 PM

All replies

  • Hi,

    Please, post the destination table DDL.

    Is this particular field defined as CHAR? NCHAR? VARCHAR? NVARCHAR? TEXT? NTEXT? What collation?

    Ideally, your Katakana characters should be stored in a Unicode field (NCHAR, NVARCHAR or NTEXT).


    Sebastian Sajaroff Senior DBA Pharmacies Jean Coutu


    Wednesday, July 10, 2013 7:19 PM
  • Hi Sebastian & Friends,

    Thank you for your input.

    Column: note (varchar)

    DDL:
    CREATE TABLE [dbo].[Table1](
        [noteID] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
        [note] [varchar](1000) NULL    -- this column shows question marks when I try to enter the international characters
        )

    Column has default database collation: SQL_Latin1_General_CP1_CI_AS

    Please help: what change should I make now so we do not see question marks?

    Thanks.


    Wednesday, July 10, 2013 7:54 PM
  • Hi,

    Please change the column from varchar(1000) to nvarchar(1000).
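
    Something along these lines should do it against the DDL you posted (a sketch; it assumes no indexes, constraints or dependent code are tied to that column):

    -- Convert the existing column to Unicode in place; existing rows are preserved.
    ALTER TABLE [dbo].[Table1]
        ALTER COLUMN [note] nvarchar(1000) NULL;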


    Sebastian Sajaroff Senior DBA Pharmacies Jean Coutu

    Wednesday, July 10, 2013 7:56 PM
  • Yup, as Sebastian said, the field needs to be able to store Unicode characters. These are the types that begin with N: http://msdn.microsoft.com/en-us/library/ms186939.aspx

    Also you will want to:

    INSERT INTO [Table1](Note)
    VALUES (N'スタンダードチャ');   -- notice the string literal is prefixed with N

    Wednesday, July 10, 2013 7:59 PM
  • I think you just answered your own question. Latin is not Katakana. I would advise you to use nvarchar instead and consider another collation.

    Collation and Unicode support

    Wednesday, July 10, 2013 8:01 PM
  • Changing from varchar(1000) to nvarchar(1000) did the trick. Yay!

    Do I still have to change the Collation?

    Thanks a lot ...

    Wednesday, July 10, 2013 8:14 PM
  • Hi

    It worked, so I wouldn't change the collation.


    Sebastian Sajaroff Senior DBA Pharmacies Jean Coutu

    Thursday, July 11, 2013 12:00 PM
  • The column collation determines which characters may be stored in char/varchar columns. Collation does not affect which characters can be stored in Unicode columns, so there is no need to change the collation of nchar/nvarchar columns. But note that collation determines the sorting and comparison rules for both Unicode and non-Unicode columns.
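
    For example (a hypothetical pair of values, just to illustrate the comparison behavior):

    -- Same Unicode values; the comparison outcome depends on the collation applied.
    SELECT CASE WHEN N'Peru' = N'Perú' COLLATE Latin1_General_CI_AI
                THEN 'equal' ELSE 'different' END AS accent_insensitive,  -- equal
           CASE WHEN N'Peru' = N'Perú' COLLATE Latin1_General_CI_AS
                THEN 'equal' ELSE 'different' END AS accent_sensitive;    -- different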

    Dan Guzman, SQL Server MVP, http://www.dbdelta.com

    Thursday, July 11, 2013 12:32 PM
  • Hello Dan, Sebastian and friends,

    Question: Is Unicode a different data type? The note column in my table is now nvarchar, so did it become Unicode, or do I have to change something else?

    Question: What collation do I set for my Note column so it can be read correctly?

    Question: Do I specify a collation in my INSERT?

    INSERT INTO [dbo].[Table1] VALUES (???)

    Please advise. (I am asking because the application inputs data from the front end, and in the back end a stored procedure inserts into this column.)

    Thanks.

    Thursday, July 11, 2013 1:06 PM
  • Hi,

    NCHAR (Unicode) is a wider data type: it takes 2 bytes (16 bits) per glyph (written symbol) instead of 1 byte (8 bits) like CHAR does.

    Collations are a natural extension of the older database code pages.

    Code pages are a workaround for the CHAR limitation: there are only 256 characters available to hold the English symbols plus your own.
    The range from 0 to 31 is allocated to control characters, 32 to 127 is reserved for English characters and widespread symbols (dot, comma, brackets, etc.), and the upper 128 slots are left for "locale" characters.

    Which "locale" obviously depends on which code page ("language") we use: Spanish? French? Greek?
    Characters 0 to 127 are always the same, but characters 128 and above vary depending on which code page is set.

    This quickly becomes confusing when you need to transfer text content from one country to another, and it is definitely useless with Far East languages or when you need to provide content in several languages (even Western ones) at the same time.

    During the second half of the 20th century, Far East countries adopted several DBCS (double-byte character sets) to accommodate their own writing, in both traditional and modern scripts, but the diversity of available DBCS added even more complexity to the problem.

    In the end, Unicode was born as an international effort to provide a uniform binary representation of alphanumeric content.

    NCHAR is Microsoft's Unicode data type, giving plenty of room for most human languages and thus removing the code-page restrictions (at the cost of doubling the storage requirements).

    What are collations then?

    On 8-bit fields, collations are code pages plus sorting, accent-sensitivity and case-sensitivity instructions for the DB engine.

    On 16-bit fields, collations are just sorting, accent and case instructions.

    If you define your collation as Latin1_General_CS_AS, then you're intending to use basic Latin characters (English, French, Spanish, German, Italian and Portuguese), and your data will be Case-Sensitive (CS) and Accent-Sensitive (AS).

    So [Robert] will be different from [robert] (CS), and [Perú] will be different from [Peru] (AS).

    You won't be able to insert [ζ], [Ӣ] or [子] because they don't belong to the Latin1_General code page.
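
    A small sketch of that last point (it assumes a database using the SQL_Latin1_General_CP1_CI_AS collation mentioned earlier in this thread):

    -- ζ exists in Unicode but has no slot in the Latin1 code page,
    -- so the varchar variable silently stores a question mark instead.
    DECLARE @n nvarchar(10) = N'ζ';
    DECLARE @v varchar(10) = @n;   -- the variable picks up the database collation (Latin1)
    SELECT @n AS unicode_value, @v AS latin1_value;   -- returns ζ and ?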

    Sorting criteria are important as well.

    For example, in Traditional Spanish, CH is a separate letter, located between C and D.
    So, in a Traditional Spanish dictionary, [Chile] would fall between [Cuba] and [Denmark].

    On the other hand, Modern Spanish splits CH into two letters (as in English).
    So, in that case, [Chile] would be placed between [Canada] and [Cuba].
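
    An easy way to see both orders side by side (a sketch; any database will do, since the collations are specified explicitly):

    -- Under the Traditional Spanish collation, CH sorts as its own letter after C,
    -- so Chile lands after Cuba; under the Modern Spanish collation it sorts as C + H.
    SELECT name FROM (VALUES (N'Canada'), (N'Chile'), (N'Cuba'), (N'Denmark')) AS t(name)
    ORDER BY name COLLATE Traditional_Spanish_CI_AS;   -- Canada, Cuba, Chile, Denmark

    SELECT name FROM (VALUES (N'Canada'), (N'Chile'), (N'Cuba'), (N'Denmark')) AS t(name)
    ORDER BY name COLLATE Modern_Spanish_CI_AS;        -- Canada, Chile, Cuba, Denmark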

    In the case of Unicode NCHAR fields, the code page makes no sense, but sorting, accent and case instructions still do.

    For example, in your NCHAR Note field you can insert a text like [Hello 子!]
    Do you want it to be distinct from [hello 子!] or not?

    Finally, to answer your last two questions: if you don't provide any collation instructions, NCHAR will behave according to the database collation (case-sensitivity, accent-sensitivity and sorting), with the code page obviously being ignored.
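
    For example (a sketch; the explicit COLLATE clause is only there to make the behavior visible, otherwise the database collation decides):

    -- The same two Unicode strings compare as equal under a case-insensitive
    -- collation and as different under a case-sensitive one.
    SELECT CASE WHEN N'Hello 子!' = N'hello 子!' COLLATE Latin1_General_CI_AS
                THEN 'same' ELSE 'different' END AS case_insensitive,   -- same
           CASE WHEN N'Hello 子!' = N'hello 子!' COLLATE Latin1_General_CS_AS
                THEN 'same' ELSE 'different' END AS case_sensitive;     -- different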

    Sebastian Sajaroff Senior DBA Pharmacies Jean Coutu

    Thursday, July 11, 2013 1:58 PM
  • Excellent one Sebastian! Thank you for your valuable explanation.

    Thursday, July 11, 2013 2:28 PM
  • Hello Sebastian,

    After changing to nvarchar(1000), can you determine how much more space a row in that table requires compared to when it was varchar?

    Big Thanks!


    Thursday, July 11, 2013 5:22 PM
  • Hi,

    NCHAR fields usually need about twice as much space as regular CHAR fields.

    Be aware that SQL Server allocates disk space in chunks (8 KB pages), not byte by byte, and that it also needs additional disk space for indexes, statistics and temporary tables.

    Nevertheless, 2x is a valid estimate over a large number of rows.
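
    A rough way to check the 2x figure yourself (a sketch using the sample string from this thread):

    -- DATALENGTH returns the number of bytes used by a value.
    SELECT DATALENGTH(CAST('standard' AS varchar(1000)))  AS varchar_bytes,   -- 8
           DATALENGTH(CAST('standard' AS nvarchar(1000))) AS nvarchar_bytes,  -- 16
           DATALENGTH(N'スタンダードチャ')                 AS katakana_bytes;  -- 16 (8 characters x 2 bytes)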


    Sebastian Sajaroff Senior DBA Pharmacies Jean Coutu

    Thursday, July 11, 2013 5:27 PM
  • Thank you, Sebastian!
    Thursday, July 11, 2013 7:19 PM