Some characters silently getting converted to "??"

  • Question

  • Hi all,

    In our application, we invoke web services and receive the HTTP response. The SOAP envelope is extracted from the HTTP response. The SOAP response has an XML object embedded in it that can contain characters from different languages, such as Swedish, Icelandic, etc.

    In our own wrapper classes we read the response multiple times until all the reading is done. The buffer (fBuffer) is populated in each call using the standard Windows socket API:

      if (vState.DoneReading(ar))
          vState.Done();
      else if (vState.fUseDelimiter)
          vState.Sock.BeginReceive(vState.fBuffer, 0, BufferSize, SocketFlags.None, new AsyncCallback(ReceiveCallback), vState);
      else
          vState.Sock.BeginReceive(vState.fBuffer, 0, Math.Min(BufferSize, vState.fLength - vState.fBytesRead), SocketFlags.None, new AsyncCallback(ReceiveCallback), vState);


    The received buffer is decoded and appended to the string using the recursive function:

      private bool AddToResult(bool aReadResult)
      {
          if (fCurRead > 0)
          {
              if (fReadingInt)
                  fIntValue = BytesToInteger();
              else
              {
                  if (fUseDelimiter)
                  {
                      string s = UTF8Encoding.GetString(fBuffer, 0, fCurRead);
                      fStringValue.Append(s);
                      if (s.Length >= fDelimiter.Length)
                          aReadResult = s.IndexOf(fDelimiter) == -1;
                      else
                          aReadResult = fStringValue.ToString().IndexOf(fDelimiter) == -1;
                  }
                  else
                  {
                      fStringValue.Append(UTF8Encoding.GetString(fBuffer, 0, Math.Min(fLength, fCurRead)));
                  }
              }
          }



    However, in the "GetString()" call, some of the characters are incorrectly converted to "?". One possible reason is that special characters encoded in 3 or more bytes are getting corrupted when we populate the buffer and convert it to a string.
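    The suspected failure can be reproduced in isolation. The sketch below is not the poster's code; the class and method names are hypothetical. It decodes two halves of a byte array separately, the way a per-read GetString() call would, with a 2-byte character ("ö", U+00F6) straddling the boundary. Each half then contains an incomplete sequence, which the default UTF-8 decoder replaces with U+FFFD; that replacement character commonly ends up displayed as "?".

```csharp
using System;
using System.Text;

// Hypothetical demo class, not part of the original application.
public static class SplitCharDemo
{
    // Decodes data[0..splitAt) and data[splitAt..] independently and
    // concatenates the results, mimicking one GetString call per socket read.
    public static string DecodePerChunk(byte[] data, int splitAt)
    {
        var sb = new StringBuilder();
        sb.Append(Encoding.UTF8.GetString(data, 0, splitAt));
        sb.Append(Encoding.UTF8.GetString(data, splitAt, data.Length - splitAt));
        return sb.ToString();
    }
}
```

    With the 6 bytes of "Malm\u00F6" and a split at index 5, the two bytes of "ö" land in different chunks, so the result no longer equals the original string and contains U+FFFD instead.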

    The recommendation on MSDN (http://msdn.microsoft.com/en-us/library/system.text.encoding.aspx) is to use the GetCharCount() and GetChars() APIs for decoding. However, this does not solve our problem.
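    One thing that may be worth checking here: GetCharCount() and GetChars() called on the Encoding object are stateless, just like GetString(), so they would not help with a character split across two reads. The same-named methods on a Decoder obtained from Encoding.UTF8.GetDecoder() keep trailing partial bytes between calls. The following is a minimal sketch under that assumption; ChunkDecoder and its members are hypothetical names, not part of the original wrapper classes.

```csharp
using System;
using System.Text;

// Hypothetical helper: decodes UTF-8 arriving in arbitrary chunks.
public sealed class ChunkDecoder
{
    // One Decoder instance per response; it buffers incomplete byte
    // sequences from one call and completes them on the next.
    private readonly Decoder fDecoder = Encoding.UTF8.GetDecoder();
    private readonly StringBuilder fText = new StringBuilder();

    public void Append(byte[] buffer, int offset, int count)
    {
        // GetCharCount accounts for bytes left over from earlier calls.
        char[] chars = new char[fDecoder.GetCharCount(buffer, offset, count)];
        int written = fDecoder.GetChars(buffer, offset, count, chars, 0);
        fText.Append(chars, 0, written);
    }

    public override string ToString()
    {
        return fText.ToString();
    }
}
```

    In the receive callback this would be called as Append(fBuffer, 0, fCurRead) in place of the per-read GetString() call, with a fresh ChunkDecoder for each response.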

    We always use UTF-8 for the object wrapped in the SOAP payload. Any help on how we can fix the "??" issue would be great.

    Thanks in advance.


    Manish Agarwal
    Monday, July 7, 2008 12:34 PM

Answers

  • The "fStringValue.Append()" call suggests that you are converting partial responses from the network stream. That cannot work properly: you may have received only 2 bytes of a 3-byte UTF-8 encoded character. Don't convert until you've received the full packet. Also make sure that the transmitter is encoding properly and that it is not trying to package a byte[] into a string.
    Hans Passant.
    • Marked as answer by Zhi-Xin Ye Thursday, July 10, 2008 8:01 AM
    Monday, July 7, 2008 1:04 PM
    Moderator
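
    The advice in the answer, to hold off converting until the full packet has arrived, could look roughly like the sketch below. It is an illustration only: ResponseAccumulator and its methods are hypothetical names, and it assumes the receive callback passes in the buffer and the number of bytes read (fBuffer and fCurRead in the question).

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: collects raw bytes and decodes them once at the end.
public sealed class ResponseAccumulator
{
    private readonly MemoryStream fBytes = new MemoryStream();

    // Call once per completed socket read; no decoding happens here.
    public void Append(byte[] buffer, int count)
    {
        fBytes.Write(buffer, 0, count);
    }

    // Call only after the last read, when every multi-byte sequence is complete.
    public string Decode()
    {
        return Encoding.UTF8.GetString(fBytes.GetBuffer(), 0, (int)fBytes.Length);
    }
}
```

    Because the bytes are only decoded after the final read, a character that straddles two reads is whole again by the time GetString() sees it.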