XmlTextReader : Get rid of "0x1F"

  • Question

  • Hello, here's my code in VB.NET.

    I'm trying to read an XML response from the server:

            ' Retrieve XML document
            Dim reader As XmlTextReader = New XmlTextReader("http://web.mirsms.ru/public/http/?user=23276.1&pass=xxx&action=inbox")
    
            ' Skip non-significant whitespace  
            reader.WhitespaceHandling = WhitespaceHandling.Significant
    
            ' Read nodes one at a time  
            reader.Normalization = True
            While (reader.Read())
    
                ' Print out info on node  
                URLResponseText.Text = URLResponseText.Text & vbCrLf & reader.NodeType.ToString() & reader.Name
    
            End While

    But XmlTextReader can't read the response because it starts with the invalid character 0x1F.

    How can I get rid of this character?

    Thank you in advance.

    Tuesday, December 18, 2012 4:41 PM

Answers

  • I finally found out that the web response was actually compressed with gzip.

    The 0x1F at the very beginning was the sign that the response is compressed: a gzip stream starts with the bytes 0x1F 0x8B (I found this information on the internet).

    Here's the working code in VB.NET (in case somebody has the same problem):

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
            Dim UrlDataRequest As String
            Dim Request As System.Net.HttpWebRequest
            Dim UserAgent As String
            UrlDataRequest = "http://web.mirsms.ru/public/http/?user=23276.1&pass=xxx&action=post_sms&message=hello&target=89817477417"
            UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.17 Safari/537.11"
            'Download data
            Dim aString As String
            Try
                ' WebRequest.Create returns a WebRequest, so cast it to reach the HTTP-specific properties
                Request = DirectCast(System.Net.WebRequest.Create(UrlDataRequest), System.Net.HttpWebRequest)
                ' Let the framework decompress gzip/deflate response bodies transparently
                Request.AutomaticDecompression = System.Net.DecompressionMethods.Deflate Or System.Net.DecompressionMethods.GZip
                Request.Headers.Add("Accept-Encoding", "gzip,deflate")
                Request.UserAgent = UserAgent
                ' Using blocks close the response and the reader even if reading fails
                Using Response As System.Net.WebResponse = Request.GetResponse()
                    Using StringStreamReader As New System.IO.StreamReader(Response.GetResponseStream())
                        aString = StringStreamReader.ReadToEnd()
                    End Using
                End Using
                MsgBox(aString) ' finally I've got what I needed
            Catch ex As System.Net.WebException
                ' Rethrow with the original stack trace intact
                Throw
            End Try
        End Sub

    Thanks everybody for the answers anyway. :)


    • Marked as answer by hollow82 Saturday, December 22, 2012 3:43 PM
    • Edited by hollow82 Saturday, December 22, 2012 3:43 PM
    Saturday, December 22, 2012 3:42 PM
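
The gzip check described in this answer can also be done by hand on the raw response bytes. The following is a minimal sketch, not the poster's code: the module and function names (GzipSniffDemo, LooksLikeGzip, DecodeBody) are made up for illustration, the sample XML is invented, and UTF-8 is assumed as the text encoding. It builds a gzip sample in memory so it runs without contacting the server.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module GzipSniffDemo
    ' True when the buffer starts with the gzip magic number 0x1F 0x8B.
    Function LooksLikeGzip(data As Byte()) As Boolean
        Return data IsNot Nothing AndAlso data.Length >= 2 AndAlso data(0) = &H1F AndAlso data(1) = &H8B
    End Function

    ' Returns the payload as text, decompressing it first when it is gzip data.
    ' UTF-8 is an assumption; use the encoding the server actually sends.
    Function DecodeBody(data As Byte()) As String
        If Not LooksLikeGzip(data) Then
            Return Encoding.UTF8.GetString(data)
        End If
        Using source As New MemoryStream(data)
            Using gz As New GZipStream(source, CompressionMode.Decompress)
                Using reader As New StreamReader(gz, Encoding.UTF8)
                    Return reader.ReadToEnd()
                End Using
            End Using
        End Using
    End Function

    Sub Main()
        ' Build a gzip sample locally (hypothetical XML, not the real server response).
        Dim raw As Byte() = Encoding.UTF8.GetBytes("<response><status>ok</status></response>")
        Dim packed As Byte()
        Using sink As New MemoryStream()
            Using gz As New GZipStream(sink, CompressionMode.Compress)
                gz.Write(raw, 0, raw.Length)
            End Using
            ' ToArray still works after the GZipStream has closed the MemoryStream.
            packed = sink.ToArray()
        End Using
        Console.WriteLine(LooksLikeGzip(packed))
        Console.WriteLine(DecodeBody(packed))
    End Sub
End Module
```

With a real response, the byte array would come from something like WebClient.DownloadData rather than DownloadString, because decoding compressed bytes into a string first damages them.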

All replies

  • This happens because the XML file is missing the encoding information.

    The snippet below adds this encoding information:

            Dim wc As New WebClient()
            Dim fileContent As String = wc.DownloadString("http://web.mirsms.ru/public/http/?user=23276.1&pass=xxx&action=inbox")
            fileContent = "<?xml version=""1.0"" encoding=""ISO-8859-9"" ?>" + fileContent
    
            Dim doc As New XmlDocument
            doc.LoadXml(fileContent)

    Now you can iterate through the doc object to find the nodes you want.


    A.m.a.L Hashim
    Microsoft Most Valuable Professional
    My Blog - Dot Net Goodies

    Tuesday, December 18, 2012 5:14 PM
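
Iterating through a loaded XmlDocument, as this reply suggests, might look like the sketch below. The element and attribute names (inbox, sms, id) are hypothetical, since the thread never shows the server's actual XML, and the module name is made up for illustration.

```vbnet
Imports System
Imports System.Xml

Module IterateDocDemo
    Sub Main()
        ' Hypothetical payload; the real element names depend on the server.
        Dim doc As New XmlDocument()
        doc.LoadXml("<inbox><sms id=""1"">hello</sms><sms id=""2"">world</sms></inbox>")

        ' Walk the direct children of the root element.
        For Each node As XmlNode In doc.DocumentElement.ChildNodes
            If node.NodeType = XmlNodeType.Element Then
                Console.WriteLine(node.Name & " " & node.Attributes("id").Value & ": " & node.InnerText)
            End If
        Next
    End Sub
End Module
```

This only works once LoadXml succeeds, so it does not address the 0x1F error itself.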
  • Thank you for the answer, but I get the same error on

    doc.LoadXml(fileContent)

    I don't think the encoding is the cause.
    • Edited by hollow82 Tuesday, December 18, 2012 6:25 PM
    Tuesday, December 18, 2012 6:23 PM
  • If you use the WebClient.DownloadString method instead you could examine the returned data, and replace all occurrences of 0x1F with something like 0x09 as a workaround. Maybe the server is returning XML 1.1 instead of XML 1.0 - can you change the request to ask for XML 1.0 instead?

    Valid characters in XML (Wikipedia)

    --
    Andrew

    • Proposed as answer by Frank L. Smith Tuesday, December 18, 2012 9:57 PM
    Tuesday, December 18, 2012 9:52 PM
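
A sketch of the replacement suggested in this reply, assuming the data really is text that contains a stray U+001F character. The module and function names are made up for illustration. Note that the character has to be written as ChrW(&H1F); the four-character literal "0x1f" is ordinary text and matches nothing in such a string.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module ControlCharDemo
    ' Replaces every U+001F (unit separator) character with a tab (U+0009).
    Function ReplaceUnitSeparator(text As String) As String
        Return text.Replace(ChrW(&H1F), ChrW(&H9))
    End Function

    Sub Main()
        Dim sample As String = "<a>x" & ChrW(&H1F) & "y</a>"
        Dim cleaned As String = ReplaceUnitSeparator(sample)
        ' Prints -1: no unit separator is left after the replacement.
        Console.WriteLine(cleaned.IndexOf(ChrW(&H1F)))
        ' Prints 4: replacing the literal text "0x1f" leaves the control character in place.
        Console.WriteLine(sample.Replace("0x1f", "").IndexOf(ChrW(&H1F)))
    End Sub
End Module
```

If the 0x1F is the first byte of compressed data rather than a stray character in text, this workaround cannot produce valid XML.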
  • Can you please help me understand how to get rid of the invalid characters?

    It's not possible to make the server answer in XML 1.0.

    I found something like this on the internet:

    Public Function stripNonValidXMLCharacters(ByVal textIn As String) As String
            ' Nothing or an empty string yields an empty result.
            If String.IsNullOrEmpty(textIn) Then
                Return String.Empty
            End If
            ' Used to hold the output.
            Dim textOut As New System.Text.StringBuilder()
            ' Used to reference the current character.
            Dim current As Integer
            For i As Integer = 0 To textIn.Length - 1
                current = AscW(textIn(i))
                ' Keep only the characters XML 1.0 allows. AscW of a single Char never exceeds &HFFFF,
                ' so the last range never matches and surrogate halves (&HD800 to &HDFFF) are dropped.
                If (current = &H9 OrElse current = &HA OrElse current = &HD) OrElse ((current >= &H20) AndAlso (current <= &HD7FF)) OrElse ((current >= &HE000) AndAlso (current <= &HFFFD)) OrElse ((current >= &H10000) AndAlso (current <= &H10FFFF)) Then
                    textOut.Append(ChrW(current))
                End If
            Next
            Return textOut.ToString()
        End Function

    This function returns a string, but LoadXml then fails with another error, something like

    Data at the root level is invalid. (translated from Russian)

    I also tried converting "0x1f" to a string and simply replacing "0x1f" with "".

    It didn't work; it seemed not to replace anything at all, and the same error came back.

    I also thought that the downloaded string might be gzipped, so I tried this function:

    Public Shared Function UnZip(compressedText As String) As String
            Dim gzBuffer As Byte() = Convert.FromBase64String(compressedText)
            Using ms As New IO.MemoryStream()
                Dim msgLength As Integer = BitConverter.ToInt32(gzBuffer, 0)
                ms.Write(gzBuffer, 4, gzBuffer.Length - 4)
    
                Dim buffer As Byte() = New Byte(msgLength - 1) {}
    
                ms.Position = 0
                Using zipStream As New System.IO.Compression.GZipStream(ms, System.IO.Compression.CompressionMode.Decompress)
                    zipStream.Read(buffer, 0, buffer.Length)
                End Using
    
                Return System.Text.Encoding.Unicode.GetString(buffer, 0, buffer.Length)
            End Using
        End Function
    But it seems the string is not compressed...

    Which method should I use to get rid of the invalid symbols and get normal XML?


    Friday, December 21, 2012 12:07 PM
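
One way to get well-formed XML out of a gzip-compressed body is to decompress the byte stream before any XML parsing, instead of cleaning up a string afterwards. The sketch below reads nodes one at a time, in the spirit of the loop in the question, but from a GZipStream. It assumes the body is gzip data; the names GzipXmlDemo, ListNodes and Compress are made up for illustration, and the sample XML is hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.IO.Compression
Imports System.Text
Imports System.Xml

Module GzipXmlDemo
    ' Reads gzip-compressed XML from a stream and returns "NodeType Name" for each node.
    Function ListNodes(compressed As Stream) As List(Of String)
        Dim result As New List(Of String)()
        Dim settings As New XmlReaderSettings()
        settings.IgnoreWhitespace = True
        Using gz As New GZipStream(compressed, CompressionMode.Decompress)
            Using reader As XmlReader = XmlReader.Create(gz, settings)
                While reader.Read()
                    result.Add(reader.NodeType.ToString() & " " & reader.Name)
                End While
            End Using
        End Using
        Return result
    End Function

    ' Helper for the demo only: gzip a string in memory.
    Function Compress(text As String) As Byte()
        Dim raw As Byte() = Encoding.UTF8.GetBytes(text)
        Using sink As New MemoryStream()
            Using gz As New GZipStream(sink, CompressionMode.Compress)
                gz.Write(raw, 0, raw.Length)
            End Using
            Return sink.ToArray()
        End Using
    End Function

    Sub Main()
        ' Hypothetical payload standing in for the server response.
        Dim packed As Byte() = Compress("<inbox><sms>hello</sms></inbox>")
        For Each entry As String In ListNodes(New MemoryStream(packed))
            Console.WriteLine(entry)
        Next
    End Sub
End Module
```

With a live request, the stream passed to ListNodes would be the response stream. Setting HttpWebRequest.AutomaticDecompression makes the explicit GZipStream unnecessary, because the framework then hands back an already decompressed stream.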