How to avoid XML_ENTITY_UNDEFINED when using XmlDocument to load this piece of XML?

    Question

  • <?xml version="1.0"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
    <package >
    <rights>Copyright &copy; 1999 </rights>
    </package >

    I tried different combinations of settings like the ones below, but none of them helps. Any idea how to parse this successfully?

    BTW, a browser loads it successfully, but of course it fails to load if you remove the DTD declaration.

                XmlLoadSettings settings = new XmlLoadSettings()
                {
                    ValidateOnParse = true,
                    ProhibitDtd = false,
                    ResolveExternals = false
                };
                xmldocument.LoadXml(xmlString, settings);


    Friday, October 18, 2013 11:34 PM

Answers

All replies

  • Hello,

    Welcome to this forum.

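    As far as I know, &copy; is not one of the five predefined XML entities (&amp;, &lt;, &gt;, &quot;, &apos;). It is declared in the XHTML DTD, so when the parser does not load that DTD it reports the entity as undefined.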

    If we change this:

      <rights>Copyright &copy; 1999 </rights>

    To this:

      <rights>Copyright &amp;copy; 1999 </rights>
    it will load without error.

    So we need to replace &copy; with &amp;copy; in the program; then we can load it with XmlDocument as shown below:

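    using System.IO;
    using System.Xml;

    class Program
    {
        static void Main(string[] args)
        {
            string path = @"E:\BMX\Lab\SmapleFile\2013-10\Smaple_21.xml";

            // Escape the ampersand so that "&copy;" is read as literal text
            // instead of as a reference to an undefined entity.
            string text = File.ReadAllText(path).Replace("&copy;", "&amp;copy;");

            XmlDocument xDocument = new XmlDocument();
            xDocument.LoadXml(text);
        }
    }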

    The result:

    <?xml version="1.0"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><package><rights>Copyright &amp;copy; 1999 </rights></package>

    Regards.



    Monday, October 21, 2013 7:42 AM
  • The problem is that the XML comes from an external source, so I can't do a string search-and-replace for every &xxx;. I wonder whether XmlDocument can still parse it.

    BTW, a browser loads it successfully without any changes, and it fails to load if you remove the DTD declaration.

    I would like to know how to load it without changing the XML file.

    Thanks!
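
    A possible way to load the file unchanged is to let the parser read a DTD that declares the entity. The following is a minimal sketch, assuming System.Xml on .NET (the XmlDocument used in the reply above) rather than the XmlLoadSettings API shown in the question. The class name XhtmlEntityLoader and the one-line MiniDtd are hypothetical: the stand-in DTD declares only &copy;, so a real document would need declarations for every entity it uses.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Resolvers;

static class XhtmlEntityLoader
{
    // Hypothetical stand-in for the real XHTML 1.1 DTD: it declares only the
    // single entity that the sample document references.
    private const string MiniDtd = "<!ENTITY copy \"&#169;\">";

    public static XmlDocument Load(string xml)
    {
        // Map the DTD's system identifier to the in-memory text above, so the
        // parser does not have to download anything from w3.org.
        var resolver = new XmlPreloadedResolver(XmlKnownDtds.None);
        resolver.Add(new Uri("http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"), MiniDtd);

        var settings = new XmlReaderSettings
        {
            // Read the DTD so entities are declared; this does not turn on validation.
            DtdProcessing = DtdProcessing.Parse,
            XmlResolver = resolver
        };

        var document = new XmlDocument();
        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            document.Load(reader);
        }
        return document;
    }
}
```

    Calling XhtmlEntityLoader.Load(xmlString) with the XML from the question should then succeed, with &copy; expanded to the copyright character in the text of the rights element.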

    Monday, October 21, 2013 4:22 PM
  • >>and I can't do a string grep to replace all &xxx;),

    string text = System.IO.File.ReadAllText(path).Replace("&copy;", "&amp;copy;");
    

    It will only replace "&copy;" with "&amp;copy;".

    Or you can use a regex to match the entities that need to be replaced. For this, please have a look at the link below:

    http://stackoverflow.com/questions/121511/reading-xml-with-an-into-c-sharp-xmldocument-object/121544#121544
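
    The regex idea could look roughly like the sketch below. It is not taken from the linked answer; the class and method names (EntityEscaper, EscapeUnknownEntities) are hypothetical. It escapes every named entity reference except the five predefined XML ones and leaves numeric references such as &#169; alone. Note that it works on the raw text, so it would also rewrite matches inside comments or CDATA sections.

```csharp
using System;
using System.Text.RegularExpressions;

static class EntityEscaper
{
    // Matches "&name;" unless name is one of the five predefined XML entities.
    // Numeric references (&#169;, &#xA9;) do not match because '#' is not a letter.
    private static readonly Regex UnknownEntity =
        new Regex(@"&(?!(?:amp|lt|gt|quot|apos);)([A-Za-z][A-Za-z0-9]*);");

    public static string EscapeUnknownEntities(string xml)
    {
        // "&copy;" becomes "&amp;copy;", which the parser reads as the literal text "&copy;".
        return UnknownEntity.Replace(xml, "&amp;$1;");
    }
}
```

    The escaped string can then be passed to XmlDocument.LoadXml in place of the single Replace call. As with that call, the loaded document contains the literal text "&copy;" rather than the copyright character.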

    Regards.



    Tuesday, October 22, 2013 7:37 AM