Parsing an XML document replaces special characters with white space

  • Question

  • When I try to parse an XML document that contains a special character such as &#x9;, it is replaced by white space. I want to prevent this behaviour. How can I achieve this?

    xmlDoc = New XmlDocument
    xmlDoc.XmlResolver = New clsCustomResolver
    xmlDoc.Load(URL)

    XML document

    <elementname att1="&#x9;"/>


    Thursday, October 8, 2009 5:47 AM

Answers

  • Try saving your document through an XmlWriter, e.g.

    Dim xws As New XmlWriterSettings()
    xws.Indent = True
    Using xw As XmlWriter = XmlWriter.Create("output.xml", xws)
      odoc.Save(xw)
    End Using

    That way, I think the XmlWriter takes care of escaping any tab character in the attribute value as &#x9;.
    • Proposed as answer by Martin Honnen Tuesday, October 13, 2009 5:40 PM
    • Marked as answer by Yichun_Feng Wednesday, October 14, 2009 6:36 AM
    Tuesday, October 13, 2009 5:33 PM
  • Thanks. XmlWriter works for me.
    • Marked as answer by raji.esha Wednesday, October 14, 2009 11:41 AM
    Wednesday, October 14, 2009 11:41 AM
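
The difference between the two save paths can be sketched in C#. This is a minimal sketch, not the poster's code: the class and method names are hypothetical, the attribute value mirrors the sample from the thread, and it assumes that XmlDocument.Save to a TextWriter behaves like the file-name overload used in the question.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class TabEscapeDemo
{
    // Build a document whose attribute value starts with a TAB character
    // (hypothetical data modelled on the thread's sample).
    public static XmlDocument BuildDocument()
    {
        var doc = new XmlDocument();
        XmlElement elem = doc.CreateElement("ELEM");
        elem.SetAttribute("VALUE", "\tRaji");
        doc.AppendChild(elem);
        return doc;
    }

    // Save through a writer obtained from XmlWriter.Create,
    // as suggested in the answer above.
    public static string SaveViaXmlWriter(XmlDocument doc)
    {
        var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
        var sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            doc.Save(writer);
        }
        return sb.ToString();
    }

    // Save directly to a TextWriter (assumed here to match Save(fileName)).
    public static string SaveViaTextWriter(XmlDocument doc)
    {
        using (var sw = new StringWriter())
        {
            doc.Save(sw);
            return sw.ToString();
        }
    }

    public static void Main()
    {
        XmlDocument doc = BuildDocument();
        Console.WriteLine(SaveViaXmlWriter(doc));
        Console.WriteLine(SaveViaTextWriter(doc));
    }
}
```

With the default XmlWriterSettings, the first string is expected to contain &#x9; in the attribute value. Comparing it with the second string shows whether the plain Save path writes the tab unescaped on a given framework version.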

All replies

  • Hi,

    This behavior is mandated by the XML specification. An XML parser must report string values with character references already resolved. &#x9; is a character reference to the character with Unicode code point 9, which is the TAB character, so the parser converts it into a single TAB character. The value of the attribute "att1" should therefore be a single TAB character.
    Note that when you save such a document back to an XML text file, the TAB character will be converted back to &#x9;, since the Save method knows that a parser will turn it back into a TAB (it can't write a literal TAB because a parser would normalize that to a single space, as also mandated by the XML spec).

    So, to answer your question: you can't change that behavior. On the other hand, you should not need to change it. If you think you still need to, please let us know what you need the &#x9; value for. (Note that any other XML processing tool or application will also see the TAB character and not the character reference, as the first thing it will do is parse the document with an XML parser.)

    Thanks,
    Vitek Karas [MSFT]
    Thursday, October 8, 2009 10:45 AM
    Moderator
  • I can't reproduce the problem. The following application, run with .NET Framework 3.5:

    string xml = @"<foo bar=""&#x9;""/>";

    XmlDocument doc = new XmlDocument();

    // Parse from a string.
    doc.LoadXml(xml);
    Console.WriteLine("character code: {0}.", (int)doc.DocumentElement.GetAttribute("bar")[0]);

    // Parse from a file path.
    doc.Load(@"..\..\XMLFile1.xml");
    Console.WriteLine("character code: {0}.", (int)doc.DocumentElement.GetAttribute("bar")[0]);

    // Parse through an explicitly created XmlReader.
    doc.Load(XmlReader.Create(@"..\..\XMLFile1.xml"));
    Console.WriteLine("character code: {0}.", (int)doc.DocumentElement.GetAttribute("bar")[0]);

    where the file looks as follows:

    <?xml version="1.0" encoding="utf-8" ?>
    <foo bar="&#x9;"/>
    outputs

    character code: 9.
    character code: 9.
    character code: 9.

    which shows that the tab character reference has been correctly parsed into the character with Unicode code point 9.

    So please provide evidence of the problem you face.
    Thursday, October 8, 2009 10:55 AM
  • While writing a sample program, I found the following:
    When &#x9; is present in an XML file that is read and then written back, &#x9; is not written. Instead a literal tab is written (if a tab is displayed as 4 white spaces, 4 white spaces are inserted). I do not want this to happen. When writing to an XML file, &#x9; should be written as well.

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim xmlDoc, odoc As XmlDocument
        Dim strval As String
        Dim oelem As XmlElement
        Dim oattr As XmlAttribute

        ' Read the attribute value from the source file.
        xmlDoc = New XmlDocument()
        xmlDoc.Load("C:\Rajeshwari\sample.xml")
        strval = xmlDoc.DocumentElement.GetAttribute("VALUE")

        ' Copy the value into a new document and save it.
        odoc = New XmlDocument()
        oelem = odoc.CreateElement("ELEM")
        oattr = odoc.CreateAttribute("VALUE")
        oattr.Value = strval
        oelem.SetAttributeNode(oattr)
        odoc.AppendChild(oelem)
        odoc.Save("C:\Rajeshwari\output.xml")
    End Sub

    SAMPLE.XML
    <ELEM VALUE="&#x9;Raji"/>

    OUTPUT.XML
    <ELEM VALUE="    Raji" />
    Tuesday, October 13, 2009 1:48 PM
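
Why a literal tab in the saved file matters can be sketched in C#, the language of the repro earlier in this thread. This is a minimal sketch under stated assumptions: it reads through XmlReader.Create, which is assumed to apply the attribute-value normalization described in the XML specification, and the class and method names are hypothetical rather than taken from any post.

```csharp
using System;
using System.IO;
using System.Xml;

public static class AttributeTabDemo
{
    // Returns the code point of the first character of the root
    // element's "bar" attribute, as reported after parsing.
    public static int FirstCharCode(string xml)
    {
        var doc = new XmlDocument();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            doc.Load(reader);
        }
        return doc.DocumentElement.GetAttribute("bar")[0];
    }

    public static void Main()
    {
        // Attribute written with a character reference to TAB.
        Console.WriteLine(FirstCharCode("<foo bar=\"&#x9;\"/>"));

        // Attribute written with a literal TAB character (\t in the C# string).
        Console.WriteLine(FirstCharCode("<foo bar=\"\t\"/>"));
    }
}
```

If the reader normalizes as the specification describes, the first call reports 9 and the second reports 32 (a space). That is why the tab has to be written as &#x9; to survive a save and reload.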