A cute question mark keeps appearing at the end of a string, and I no longer like it.

  • Question

  • I have spent a whole day trying to get rid of those cute question marks, and I definitely need help with them.

    Working environment: VS 2010, Oracle 11g.

    The task is to store and retrieve large text from an Oracle DB through a Win32 API into a Windows Form.

    In the database I have a BLOB field for the text. In my form (C#) I have a TextBox for the text.

    The form sends the text to the Win32 API, and the Win32 API has to store it in the database.

    Afterwards, another form (C#) retrieves the text from the database.

    The code goes as follows.

    //Win32 storing the text (C++)
    wstring theText=get_the_text(); //(gets the text from Form)
    string strText=convert_wstring_to_string(theText); //(type conversion)
    store_in_database(new ByteArray((char*)strText.c_str(), strText.length(), true));

    //---------------------------------------------------------------------------------

    //Form retrieving the text (C#)
    String theText=System.Text.Encoding.UTF8.GetString((byte[])get_from_database());
    txtDisplay.Text=theText; //(txtDisplay is a TextBox)

    In the code, every self-explanatory function does what it is supposed to do (get_the_text(), store_in_database(), etc.).

    As a result I get everything I want, except that a question mark appears at the end of my TextBox. Every time I update the text, one more appears at the end.

    So where does the question mark come from? How do I get rid of it?

    Any comment, suggestion or hint is welcome :)

    I don't like them ----> ��������



    Tuesday, June 4, 2013 12:40 PM

Answers

  • [...]

    Sometimes "Hello world" becomes "Hello world�" or "Hello worl�".


    Do you see the wrong text if you inspect it using some database tools (i.e. execute SELECT * FROM … manually)? Or if you watch the values returned by the fixed convert_wstring_to_string in the debugger?

    Show some details about store_in_database and ByteArray. Does ByteArray copy bytes into a new array?

    Maybe you can redesign the database so that Unicode strings will be accepted instead of UTF-8.


    • Edited by Viorel_MVP Wednesday, June 5, 2013 6:48 AM
    • Marked as answer by Naagii Teka Wednesday, June 5, 2013 7:08 AM
    Wednesday, June 5, 2013 6:46 AM

All replies

  • Did you check this function?

    convert_wstring_to_string(theText);


    jdweng

    Tuesday, June 4, 2013 1:07 PM
  • Yes, I did, along with the other self-explanatory functions. They work just fine.

     
    Tuesday, June 4, 2013 1:49 PM
  • If the next fragment helps:

    byte[] b = (byte[])get_from_database();
    string theText = System.Text.Encoding.UTF8.GetString(b, 0, b.Length - 1);

    then the data was probably not stored correctly in the database as UTF-8. Check whether the last bytes of b contain the expected values.

    Tuesday, June 4, 2013 6:44 PM
  • Thanks for the hint. Something is indeed stored incorrectly in the DB.

    When I run the query below, it gives me "ORA-29275: partial multibyte character".

    select utl_raw.cast_to_varchar2(dbms_lob.substr(BLOB_FIELD)) from TABLE_WITH_BLOB;

    ORA-29275: partial multibyte character.

    Cause: The requested read operation could not complete because a partial multibyte character was found at the end of the input.

    Now the question becomes: what could be at the end of the byte array?

    And how do I remove it? (Passing strText.length() - 1 doesn't help.)

    new ByteArray((char*)strText.c_str(), strText.length(), true);
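    One way to answer "what is at the end of the byte array" is to look at the tail of the buffer directly. The following is a minimal sketch, not Oracle's own check: the function name dangling_tail_bytes is hypothetical, it only inspects the last few bytes, and it does not validate the rest of the buffer. It reports how many trailing bytes do not belong to a complete UTF-8 character, which is the situation the "partial multibyte character" error describes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the number of bytes at the end of `buf` that do not belong to a
// complete UTF-8 character. 0 means the buffer ends on a character boundary.
// Only the tail is inspected; earlier bytes are not validated.
std::size_t dangling_tail_bytes(const std::vector<unsigned char>& buf) {
    // Step back over up to three continuation bytes (bit pattern 10xxxxxx).
    std::size_t i = buf.size();
    std::size_t cont = 0;
    while (i > 0 && cont < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        --i;
        ++cont;
    }
    if (i == 0) return cont;  // nothing but continuation bytes (or empty)

    // The byte before them should be a lead byte that announces the length.
    const unsigned char lead = buf[i - 1];
    std::size_t expected = 0;
    if (lead < 0x80) expected = 1;                 // ASCII
    else if ((lead & 0xE0) == 0xC0) expected = 2;  // 110xxxxx
    else if ((lead & 0xF0) == 0xE0) expected = 3;  // 1110xxxx
    else if ((lead & 0xF8) == 0xF0) expected = 4;  // 11110xxx
    else return cont + 1;                          // not a valid lead byte

    const std::size_t have = cont + 1;
    if (have == expected) return 0;     // complete character at the end
    if (have < expected) return have;   // truncated character
    return have - expected;             // stray continuation bytes
}
```

    For example, a buffer ending in 0xE2 0x82 (the first two bytes of a three-byte character) yields 2, while the same buffer with the final 0xAC present yields 0.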



    • Edited by Naagii Teka Wednesday, June 5, 2013 3:06 AM
    Wednesday, June 5, 2013 3:04 AM
  • This is simply because the returned data or text has a null value or unsupported characters at the end.

    Mohammad Saidul Karim

    Wednesday, June 5, 2013 3:05 AM
  • In my case, you mean strText ends with a null or an unsupported character?
    Wednesday, June 5, 2013 3:11 AM
  • Maybe convert_wstring_to_string has some issues. Does it convert to UTF8? Show some details about convert_wstring_to_string.

    Wednesday, June 5, 2013 4:44 AM
  • Let me explain what is going wrong. You have an encoding issue with your input character array, or the character array isn't properly terminated. You also need to check the field type you are using in the database, which may be part of the problem.

    The string class in the .NET library stores text as UTF-16: every char is a two-byte code unit, and characters outside the Basic Multilingual Plane take two such units (a surrogate pair). There is no per-character flag for one-byte versus two-byte storage. So the following line stores three two-byte code units, even though each character would fit in one byte:

    string myString = "abc";

    When you have a byte[] and want to load it into a string, the bytes may encode one-byte characters, two-byte characters, or a mix of both, depending on the encoding. So you could have an array like this:

    byte[] mybyte = {0x31, 0x32, 0x33, 0x34};

    This array could be the one-byte characters '1', '2', '3', '4', or, read as big-endian UTF-16, the two characters U+3132 and U+3334.

    Now, if you are dealing with plain C strings, which are one-byte character arrays terminated with a null, it looks like this:

    byte[] mybyte = {0x31, 0x32, 0x33, 0x34, 0x00};

    An extra 0x00 byte is added to the end of the string to indicate where it ends. When writing your C++ storage code you may need to remove the extra null at the end of the string before you store the data in Oracle. I don't know if the field you are using in Oracle is designed to handle one- or two-byte characters. I also don't know if Oracle is expecting a null at the end of the data.
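    The terminator point can be illustrated with a small sketch. The helper names below (copy_bytes, payload_only, payload_plus_terminator) are hypothetical stand-ins, since the real ByteArray is not shown in this thread. The sketch only demonstrates that std::string::length() excludes the terminating null that c_str() appends, so copying length() bytes gives the payload and copying length() + 1 bytes adds a trailing 0x00.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Copies exactly `count` bytes starting at `data` into a new buffer.
// Hypothetical stand-in for whatever the real ByteArray does.
std::vector<unsigned char> copy_bytes(const char* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    return std::vector<unsigned char>(data, data + count);
}

// Payload only: length() does not count the terminating '\0'.
std::vector<unsigned char> payload_only(const std::string& s) {
    return copy_bytes(s.c_str(), s.length());
}

// Off by one: length() + 1 also copies the terminator behind the payload.
std::vector<unsigned char> payload_plus_terminator(const std::string& s) {
    return copy_bytes(s.c_str(), s.length() + 1);
}
```

    With the string "1234", payload_only returns the four bytes 0x31 to 0x34, while payload_plus_terminator returns five bytes ending in 0x00.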

    wstring theText=get_the_text(); //(gets the text from Form)
    string strText=convert_wstring_to_string(theText); //(type conversion)
    store_in_database(new ByteArray((char*)strText.c_str(), strText.length(), true));

    As I said previously, the function convert_wstring_to_string() is writing too many characters to the database. It may not be checking for the null, may simply be writing too many characters, or may not be clearing old data from the array.

    jdweng

    Wednesday, June 5, 2013 4:52 AM
  • I was told that this function was reliable, so I never looked at it before.

    But the actual function makes me curious.

    string convert_wstring_to_string(const wstring wide)
    {
      //Removed not important sections
      // Calculate necessary buffer size
      int len = wide.size() * 2;
      // Perform actual conversion
      if (len > 0){
    	char* buffer = new char[len + 1];
    	len = WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), wide.size(), buffer, len, NULL, NULL);
    
    	if (len > 0) {
    		string s(buffer, len);
    		delete[] buffer;
    		return s;
    	}
      }
    }
    Anyway, I still can't see where the question mark is.

    Wednesday, June 5, 2013 5:18 AM
  • In UTF-8 there are characters that require more than two bytes, so the estimate 'len = wide.size() * 2' is not always correct. To calculate the required length, try this:

    int len = WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), wide.size(), NULL, 0, NULL, NULL);

    Then add 1 and allocate the buffer. See the documentation about the returned values in case of errors.
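    To see why the estimate can fall short, here is a portable sketch that counts the UTF-8 bytes a UTF-16 string needs. It is a stand-in for the sizing call above, not a replacement for it: the name utf8_length is hypothetical, and it assumes the wide string holds UTF-16, as a wstring does on Windows. A lone surrogate is counted as 3 bytes, the size of U+FFFD.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of UTF-8 bytes needed to encode a UTF-16 string.
// Assumes mostly well-formed input; an unpaired surrogate is counted as
// 3 bytes, the size of the U+FFFD replacement character.
std::size_t utf8_length(const std::u16string& text) {
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char16_t unit = text[i];
        if (unit < 0x80) {
            bytes += 1;  // ASCII
        } else if (unit < 0x800) {
            bytes += 2;  // e.g. accented Latin letters
        } else if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < text.size() &&
                   text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF) {
            bytes += 4;  // surrogate pair: one code point, four UTF-8 bytes
            ++i;         // skip the low surrogate
        } else {
            bytes += 3;  // rest of the BMP, e.g. U+20AC (euro sign)
        }
    }
    return bytes;
}
```

    Three euro signs (U+20AC) have size() 3, so the estimate gives 6, but the encoded text needs 9 bytes. With a buffer that is too small, WideCharToMultiByte fails and returns 0 rather than writing a truncated result.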



    • Edited by Viorel_MVP Wednesday, June 5, 2013 5:33 AM
    Wednesday, June 5, 2013 5:31 AM
  • Don't assume anything and don't believe what other people tell you. This function is definitely the cause of the problem. You are always adding an extra character (char[len + 1]). You also don't know what the function WideCharToMultiByte() is doing.

    It looks like you are storing Unicode (two-byte) characters, then when you pull data from the database you are using UTF8 to convert to a string, which means you are losing all the Unicode characters. Try changing the following:

    From :

    String theText=System.Text.Encoding.UTF8.GetString((byte[])get_from_database());

    To :

    String theText=System.Text.Encoding.Unicode.GetString((byte[])get_from_database());



    jdweng

    Wednesday, June 5, 2013 5:42 AM
  • [...]

    String theText=System.Text.Encoding.Unicode.GetString((byte[])get_from_database());


    With that change the text almost gets translated into Chinese, though: "Hello world" becomes "效汬潗汲�".

    I don't think that is Chinese for "Hello" :D
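    The CJK characters are what you get when UTF-8 bytes are paired up and read as little-endian UTF-16, which is what Encoding.Unicode does. The sketch below shows that pairing under stated assumptions: the helper name as_utf16le is hypothetical, and a trailing odd byte is simply dropped here, whereas a real decoder would typically show a replacement character for it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Reads a byte buffer as little-endian UTF-16 code units.
// A trailing odd byte has no partner and is dropped in this sketch.
std::vector<char16_t> as_utf16le(const std::string& bytes) {
    std::vector<char16_t> units;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const unsigned lo = static_cast<unsigned char>(bytes[i]);
        const unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        units.push_back(static_cast<char16_t>(lo | (hi << 8)));
    }
    return units;
}
```

    For "Hello world" the first two bytes 'H' (0x48) and 'e' (0x65) combine into U+6548, and "ll" (0x6C 0x6C) into U+6C6C, which are the first two characters in the garbled output above. So the stored bytes are single-byte text, not UTF-16.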

    Wednesday, June 5, 2013 6:34 AM
  • [...]

    int len = WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), wide.size(), NULL, 0, NULL, NULL);

    Then add 1 and allocate the buffer. See the documentation about the returned values in case of errors.



    I tried almost every possible way of altering this function and still had no luck.

    Sometimes "Hello world" becomes "Hello world�" or "Hello worl�".

    Wednesday, June 5, 2013 6:36 AM
  • If the next fragment helps:

    byte[] b = (byte[])get_from_database();
    string theText = System.Text.Encoding.UTF8.GetString(b, 0, b.Length - 1);

    then the data was probably not stored correctly in the database as UTF-8. Check whether the last bytes of b contain the expected values.


    Currently, I'm using a similar workaround for the text. But it's not an acceptable solution for me, because other parts of the system that I don't have access to also use the text.
    Wednesday, June 5, 2013 6:48 AM
  • Show some details about store_in_database and ByteArray. Does ByteArray copy bytes into a new array?

    Maybe you can redesign the database so that Unicode strings will be accepted instead of UTF-8.

    Thank you very much. I caught those question mark bastards :D.

    The problem was in the ByteArray function, just like you pointed out.

    It does copy the bytes into a new array, with a question mark :P.

    Best Regards,

    Teka

    Wednesday, June 5, 2013 7:07 AM