How to ignore DTD when loading XML document

    Question

  • Hi,

    I am trying to load an XmlDocument from a URL that returns XML with an embedded DTD declaration pointing to a folder on the external web site via a relative path. The XML DOM treats this as a local path and reports that it can't find the file. I don't really care about the DTD and would happily ignore it if I could find a way to do so, or alternatively get the XML DOM to resolve the DTD reference as a relative path on the server. I have also tried creating the XML reader with settings configured to ignore processing instructions, prohibit DTDs, and offset the line number past the DTD declaration; in all cases it still failed on the DTD. Any help will be much appreciated.

     

    Thanks in advance,

    Brendon.

     

     

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE TRANSACTIONS SYSTEM "..\dtd\transactions.dtd">
    <TRANSACTIONS>
    ...
    </TRANSACTIONS>

     

    string urlPattern = ConfigurationManager.AppSettings["TradesURLpattern"];
    string urlString = string.Format(urlPattern, fund, day.ToString("MM/dd/yyyy"));

    // Set up the security credentials required to access the HTTP web site
    string urlUser = ConfigurationManager.AppSettings["TradesURLUser"];
    string urlPassword = ConfigurationManager.AppSettings["TradesURLPassword"];
    NetworkCredential credential = new NetworkCredential(urlUser, urlPassword);

    // Create XML stream reader from HTTP
    WebRequest req = WebRequest.Create(urlString);
    req.Credentials = credential;
    WebResponse resp = req.GetResponse();
    System.IO.StreamReader textReader = new System.IO.StreamReader(resp.GetResponseStream());

    XmlReaderSettings settings = new XmlReaderSettings();
    //settings.IgnoreProcessingInstructions = true;
    settings.ProhibitDtd = true;
    //settings.LineNumberOffset = 3;
    XmlReader xmlReader = XmlReader.Create(textReader, settings);

    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.Load(xmlReader);

    Monday, June 11, 2007 2:03 PM

All replies

  • Use XmlReaderSettings and set its XmlResolver property to null (C#) or Nothing (VB).
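
    A minimal sketch of this suggestion, assuming .NET 4.0 or later (on .NET 2.0/3.5, settings.ProhibitDtd = false would take the place of the DtdProcessing line). The class name, the inline sample document, and its TRADE element are made up for illustration, and a StringReader stands in for the HTTP response stream.

```csharp
using System;
using System.IO;
using System.Xml;

static class LoadWithoutExternalDtd
{
    // Loads XML into an XmlDocument without fetching the external DTD.
    public static XmlDocument Load(TextReader input)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        // Allow the DOCTYPE declaration to be present (assumption: .NET 4.0+).
        settings.DtdProcessing = DtdProcessing.Parse;
        // With no resolver, the reader never tries to open the external DTD.
        settings.XmlResolver = null;

        XmlDocument doc = new XmlDocument();
        doc.XmlResolver = null;

        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            doc.Load(reader);
        }
        return doc;
    }

    static void Main()
    {
        // Illustrative document; the DTD path does not need to exist.
        string xml =
            "<?xml version=\"1.0\"?>" +
            "<!DOCTYPE TRANSACTIONS SYSTEM \"..\\dtd\\transactions.dtd\">" +
            "<TRANSACTIONS><TRADE id=\"1\"/></TRANSACTIONS>";

        XmlDocument doc = Load(new StringReader(xml));
        // Prints the name of the root element that was loaded.
        Console.WriteLine(doc.DocumentElement.Name);
    }
}
```

    Note that nothing declared in the external DTD (default attributes, entities) is available to the document when it is loaded this way.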
    Monday, June 11, 2007 2:17 PM
  • Thanks - that works.

     

    Actually, I ended up removing the settings and the Create call altogether and setting XmlResolver = null on both the XmlTextReader and the XmlDocument, which also works and is a bit tidier.

    Cheers,

    Brendon.

    // Create XML stream reader from HTTP
    WebRequest req = WebRequest.Create(urlString);
    req.Credentials = credential;
    WebResponse resp = req.GetResponse();
    System.IO.StreamReader textReader = new System.IO.StreamReader(resp.GetResponseStream());
    XmlTextReader xmlReader = new XmlTextReader(textReader);
    xmlReader.XmlResolver = null;

    // Extract and flatten data from the XML doc
    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.XmlResolver = null;
    xmlDoc.Load(xmlReader);

    Monday, June 11, 2007 2:38 PM
  • Hello,

     

    I have the same need to ignore the DTD.

    When I try to validate my XML instance (the same instance that was used to generate the XSD schema) I get the following error, which I have translated into English:

    Warning BEC2004: For security reasons DTD is prohibited in this XML document. To enable DTD processing, set the ProhibitDtd property on XmlReaderSettings to false and pass the settings into the XmlReader.Create method.

    Error BEC2004: The schema DO6713771.xsd could not be validated.

    I am quite new to BizTalk and .NET and have no idea how (or where) to change this property, or how XmlReader settings can be changed.

    I would be very grateful if someone could help me.

    Thank you in advance.

    Naiara

    Wednesday, May 14, 2008 4:16 PM
  • I don't know BizTalk but in .NET 2.0 or later C# code you would use

    XmlReaderSettings settings = new XmlReaderSettings();
    settings.ProhibitDtd = false;
    using (XmlReader reader = XmlReader.Create(@"file.xml", settings))
    {
        // use reader here
    }

    If that does not help then try to find a BizTalk forum.

    Wednesday, May 14, 2008 4:34 PM
  • Thank you very much.

    I will try it.

     

    Thursday, May 15, 2008 7:45 AM
  • Hi, this code helped me load SVG files without DTD validation. Thanks.
    Saturday, August 02, 2008 11:22 AM
  • If you are using XmlDocument.Load to load the XML, set the document's XmlResolver to null before loading:

    XmlDocument doc = new XmlDocument();
    doc.XmlResolver = null;
    doc.Load("xyz.xml");

    Thanks

    Tuesday, April 07, 2009 8:49 PM
  • Hi,

    I have some input XML files quite similar to the one Brendon described in his initial post. They contain an inline DTD reference pointing to DTD files on a local drive.

    I want to extract the XPaths from those XML files, so I wrote a stylesheet that does this.

    Using a small C# console application, I am transforming the XML files by applying the stylesheet to them to generate the XPaths.

    The XPaths are generated successfully. However, not all of them come from the XML files; some do not exist in those files at all. As far as I can tell, the application is generating paths from the file as well as possible XPaths from the DTD. Can anyone shed some light on this?

    Below is the code I used in my app:

    string foldername = args[0];
    string XslPath = args[1];
    DirectoryInfo obj_Directory = new DirectoryInfo(foldername);
    if (obj_Directory.Exists)
    {
        // GetFiles returns an array of FileInfo
        FileInfo[] obj_file = obj_Directory.GetFiles("*.xml");
        foreach (FileInfo obj_Fileinfo in obj_file)
        {
            string str_OutputText = foldername + "Output\\" + obj_Fileinfo.Name + ".txt";
            XslCompiledTransform xsl = new XslCompiledTransform();
            string xslStyleSheet = XslPath + @"\Xpath_StyleSheet";
            xsl.Load(xslStyleSheet);

            // Reader settings: allow the DTD and validate against it
            XmlReaderSettings rdSett = new XmlReaderSettings();
            rdSett.ProhibitDtd = false;
            rdSett.ValidationType = ValidationType.DTD;
            rdSett.CloseInput = true;
            XmlReader reader = XmlReader.Create(obj_Fileinfo.FullName, rdSett);

            // Writer settings: write the output to a text file
            XmlWriterSettings rwSett = new XmlWriterSettings();
            rwSett.ConformanceLevel = ConformanceLevel.Auto;
            XmlWriter writer = XmlWriter.Create(str_OutputText, rwSett);

            xsl.Transform(reader, writer);
            reader.Close();
            writer.Close();
        }
    }

     

    Since I am not sure how the extra XPaths are being generated, I also tried generating the XPaths while ignoring the DTD.

    For this, I set the ProhibitDtd property to true and set the XmlResolver property of XmlReaderSettings to null:

    rdSett.XmlResolver = null;

    But this does not work either. It gives the error "For security reasons DTD is prohibited in this XML document. To enable DTD processing set the ProhibitDtd property on XmlReaderSettings to false and pass the settings into XmlReader.Create method."

    Any ideas on either approach (processing the DTD and getting XPaths only from the XML file, or ignoring the DTD and getting XPaths from the XML file) would be helpful.

     

    Thanx,

    Sid

    Wednesday, June 09, 2010 6:36 PM
  • Please show a minimal but complete XML input document, XSLT stylesheet to demonstrate the problem and also show us the result you want to get from the XSLT stylesheet and the result you actually get.

    I currently don't understand what the problem is or how it is related to reading or ignoring the DTD.


    MVP Data Platform Development My blog
    Friday, June 11, 2010 11:10 AM
  • Hi Martin,

    A sample of the XML I am using looks something like this:

    <!DOCTYPE x:document SYSTEM "D:\DTD\Sub.dtd">
    
    <x:document>
     <x:title>Subjects available in Mechanical Engineering.</x:title>
     <x:subjectID id = "ID">2.303
      <x:subjectname name="Name">Fluid Mechanics</x:subjectname>
     </x:subjectID>
    </x:document>

    The stylesheet I am using simply extracts the XPaths from the XML file.

    The result I get in the text file is:

    x:document/
    x:/document/x:title
    x:/document/x:subjectID
    x:/document/x:subjectID/@id
    x:/document/x:subjectID/@X
    x:/document/x:subjectID/x:subjectname
    x:/document/x:subjectID/x:subjectname/@name
    x:/document/x:subjectID/x:subjectname/@Y

    Note the two XPaths ending in @X and @Y (shown in bold in the original post). They are not present in the input XML file, but they are defined as optional in the DTD. These two XPaths are extra and are not wanted in the output.

    So, as far as I can tell, the problem would be solved if I could ignore the DTD.

    I hope that makes things a bit clearer.

     

    Thanx,

    Sid

    Saturday, June 12, 2010 6:07 PM
  • So your DTD declares default values for some attributes, and that is why you want to ignore the DTD.

    .NET 4.0 has a new setting, DtdProcessing, that allows this. Assume a dtd1.dtd such as

    <?xml version="1.0" encoding="utf-8" ?>
    <!ELEMENT foo EMPTY>
    <!ATTLIST foo
     att1 CDATA #IMPLIED
     att2 CDATA "default">
    

    and an XML sample document XMLFile1.xml as e.g.

    <?xml version="1.0" encoding="utf-8" ?>
    <!DOCTYPE foo SYSTEM "dtd1.dtd">
    <foo att1="value 1"/>
    

    the following stylesheet

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet
     version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
     <xsl:output method="text"/>
    
     <xsl:template match="/">
      <xsl:value-of select="concat('Number of attributes: ', count(//@*), '&#10;')"/>
     </xsl:template>
    </xsl:stylesheet>
    

    when applied with DtdProcessing.Ignore finds one attribute and it finds two attributes with DtdProcessing.Parse.

    Sample C#

          XmlReaderSettings xrs = new XmlReaderSettings()
          {
            DtdProcessing = DtdProcessing.Ignore
          };
    
          XslCompiledTransform proc = new XslCompiledTransform();
          proc.Load(@"..\..\XSLTFile1.xslt");
    
          using (XmlReader xr = XmlReader.Create(@"..\..\XMLFile1.xml", xrs))
          {
            proc.Transform(xr, null, Console.Out);
          }
    
          xrs.DtdProcessing = DtdProcessing.Parse;
    
          using (XmlReader xr = XmlReader.Create(@"..\..\XMLFile1.xml", xrs))
          {
            proc.Transform(xr, null, Console.Out);
          }

    outputs

    Number of attributes: 1
    Number of attributes: 2

    Does that help? Or which .NET version do you target?

    Sunday, June 13, 2010 10:22 AM
  • If you use .NET 3.5 then you can't use the DtdProcessing setting but you can still set the XmlResolver to null to ensure no external resources like an external DTD file are loaded.

    So with .NET 3.5 and XML, DTD and XSLT as shown in my earlier post the following C# code has the same output:

          XmlReaderSettings xrs = new XmlReaderSettings()
          {
            ProhibitDtd = false,
            XmlResolver = null
          };
    
          XslCompiledTransform proc = new XslCompiledTransform();
          proc.Load(@"..\..\XSLTFile1.xslt");
    
          using (XmlReader xr = XmlReader.Create(@"..\..\XMLFile1.xml", xrs))
          {
            proc.Transform(xr, null, Console.Out);
          }
    
          xrs.XmlResolver = new XmlUrlResolver();
    
          using (XmlReader xr = XmlReader.Create(@"..\..\XMLFile1.xml", xrs))
          {
            proc.Transform(xr, null, Console.Out);
          }

    This means that when the XmlResolver is set to null, the attribute with the default value specified in the external DTD is not present in the data model the XSLT stylesheet operates on.

     


    Sunday, June 13, 2010 11:43 AM
  • Hi Martin,

    As I mentioned earlier, my XML document looks something like this:

    <!DOCTYPE x:document SYSTEM "D:\DTD\Sub.dtd">
    
    <x:document>
     <x:title>Subjects available in Mechanical Engineering.</x:title>
     <x:subjectID id = "ID">2.303
     <x:subjectname name="Name">Fluid Mechanics</x:subjectname>
     </x:subjectID>
    </x:document>
    

    I had already tried the solution you describe above with .NET 3.5. The error I was getting was

    'x' is an undeclared namespace

    Even with the code you posted, it gives the same error.

    Can you suggest some other approach?

     

    Thanx,

    Sid

    Monday, June 14, 2010 3:41 AM
  • Well, so far you had stated only that your stylesheet does not give you the output you want; you had not mentioned the error you now describe. It sounds as if the DTD supplies the namespace declaration, so you can't ignore it, because otherwise the markup is not namespace well-formed.

    I am not sure how to solve that easily and cleanly. It looks as if the DTD is essential to the meaning of the document, so any attempt to ignore the DTD will cause trouble.

    You could try to provide your own XmlResolver which, instead of the original Sub.dtd, loads a different DTD that declares only the namespace (and maybe the entities) you need for the document to be namespace well-formed, but does not define the default attribute values you don't want.
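
    One way such a resolver might look, as a rough sketch rather than a tested solution for Sid's files. StubDtdResolver, StubDtdDemo, the namespace URI urn:example:x, and the shortened sample document are all assumptions made for illustration; the real namespace URI would have to be copied from Sub.dtd. The sketch uses DtdProcessing.Parse (.NET 4.0+); on .NET 3.5, ProhibitDtd = false would be the equivalent.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical resolver: answers any request for a *.dtd file with a stub DTD
// held in memory, and defers to XmlUrlResolver for everything else.
class StubDtdResolver : XmlUrlResolver
{
    private readonly string stubDtd;

    public StubDtdResolver(string stubDtd)
    {
        this.stubDtd = stubDtd;
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        if (absoluteUri.AbsolutePath.EndsWith(".dtd", StringComparison.OrdinalIgnoreCase))
        {
            return new MemoryStream(Encoding.UTF8.GetBytes(stubDtd));
        }
        return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
}

static class StubDtdDemo
{
    public static XmlDocument Load(string xml, string stubDtd)
    {
        XmlResolver resolver = new StubDtdResolver(stubDtd);

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse; // assumption: .NET 4.0+
        settings.XmlResolver = resolver;

        XmlDocument doc = new XmlDocument();
        doc.XmlResolver = resolver;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            doc.Load(reader);
        }
        return doc;
    }

    static void Main()
    {
        // Stub DTD: declares only the namespace for prefix "x" (placeholder URI)
        // and none of the default attribute values from the original DTD.
        string stubDtd = "<!ATTLIST x:document xmlns:x CDATA #FIXED \"urn:example:x\">";

        // Shortened stand-in for the document shown in the thread.
        string xml =
            "<!DOCTYPE x:document SYSTEM \"Sub.dtd\">" +
            "<x:document><x:title>Fluid Mechanics</x:title></x:document>";

        XmlDocument doc = Load(xml, stubDtd);
        // Prints the namespace URI the stub DTD supplied for the root element.
        Console.WriteLine(doc.DocumentElement.NamespaceURI);
    }
}
```

    The same settings object could be passed to XmlReader.Create before calling XslCompiledTransform.Transform, so the stylesheet would see the namespace but not the defaulted attributes.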


    Monday, June 14, 2010 10:35 AM
  • Hi Martin,

     

    I had not mentioned the error because I knew I would not be able to process the XML file without processing the DTD. So I was mainly looking for a way to process the DTD and ignore the default attribute values at the same time.

    However, using a different DTD does not seem to be a feasible solution.

    I am still looking for another way.

    Anyway, thanks for your suggestions.

     

    Regards,

    Sid

     

     

    Tuesday, June 15, 2010 7:11 PM
  • Hi,

    But there's no "other" way. Your document is missing the namespace declaration for prefix "x". Assuming your XML is indeed well formed, it must have that declaration in the DTD. So you need to either process the DTD the XML points to, or replace it with some other DTD which will declare the "x" prefix.

    Thanks,


    Vitek Karas [MSFT]
    Tuesday, June 15, 2010 7:25 PM
  • Hi Vitek,

    It looks like there is no other way, because the DTD cannot be replaced. So I need to process the DTD.

    But my problem really occurs when I transform the XML file with the stylesheet in a C# console application.

    As described in my earlier posts, some extra XPaths appear in my output when I use the xsl.Transform() method.

    But when I run the transformation in a tool like Altova XMLSpy or Stylus Studio, no such extra XPaths appear in the output.

    Any ideas on this?

    Thanx,

    Sid

    Tuesday, June 29, 2010 4:48 AM
  • Hi Sid,

    Could you please show us an example of the input, the XSLT you use, and the "extra XPaths" that occur in the output? I must admit I don't know what you mean by "extra XPath"; I assume you mean some characters you didn't expect in the output, right?

    Without a sample repro I can't think of anything common or obvious that would cause such behavior.

    Thanks,


    Vitek Karas [MSFT]
    Tuesday, June 29, 2010 8:26 AM
  • Hi Vitek,

    I have already shown an example of my input XML file and the kind of problem I am seeing in the output.

    Here is the same example again.

    The XML file looks something like this:
    <!DOCTYPE x:document SYSTEM "D:\DTD\Sub.dtd">
    
    <x:document>
     <x:title>Subjects available in Mechanical Engineering.</x:title>
     <x:subjectID id = "ID">2.303
     <x:subjectname name="Name">Fluid Mechanics</x:subjectname>
     </x:subjectID>
    </x:document>
    

    The stylesheet I am using simply extracts the XPaths from the XML file.

    The result I get in the output text file when I apply the stylesheet to the XML file looks like this:

    x:document/
    x:/document/x:title
    x:/document/x:subjectID
    x:/document/x:subjectID/@id
    x:/document/x:subjectID/@X
    x:/document/x:subjectID/x:subjectname
    x:/document/x:subjectID/x:subjectname/@name
    x:/document/x:subjectID/x:subjectname/@Y

    Note the two XPaths ending in @X and @Y (shown in bold in the original post). They are not present in the input XML file, but they are defined as optional in the DTD. These two XPaths are extra and are not wanted in the output.

    I am aware that, since there is no namespace declaration in the XML file, the XML document can only be well-formed if it references the external DTD.

    But when I process the file with the DTD, my output has the extra XPaths shown above.

    Also, it is not possible for me to modify the XML file in any way.

    I hope this makes the situation a bit clearer.

    Thanx,

    Sid

    Wednesday, June 30, 2010 2:05 PM
  • Set XmlReaderSettings.DtdProcessing to Ignore.

    XmlReaderSettings settings = new XmlReaderSettings();
    settings.DtdProcessing = DtdProcessing.Ignore;
    XmlReader reader = XmlReader.Create(@"my.xml", settings);
    XmlDocument document = new XmlDocument();
    document.Load(reader);


    Thursday, April 17, 2014 5:49 PM