How to ignore DTD when loading XML document
-
Monday, June 11, 2007 2:03 PM
Hi,
I am trying to load an XmlDocument from a URL that returns XML with an embedded DTD declaration pointing to a folder on the external web site via a relative path. The XML DOM treats this as a local path and reports that it can't find the file. I don't really care about the DTD and would happily ignore it if I could find a way to do so, or alternatively get the XML DOM to treat the DTD reference as a relative path on the server. I have also tried creating the XML reader with settings, configuring it to ignore processing instructions, prohibit DTDs, and offset the line number past the DTD directive; in all cases it still failed on the DTD. Any help will be much appreciated.
Thanks in advance,
Brendon.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE TRANSACTIONS SYSTEM "..\dtd\transactions.dtd">
<TRANSACTIONS...
</TRANSACTIONS>
string urlPattern = ConfigurationManager.AppSettings["TradesURLpattern"];
string urlString = string.Format(urlPattern, fund, day.ToString("MM/dd/yyyy"));
//Set up the security credentials required to access the HTTP web site
string urlUser = ConfigurationManager.AppSettings["TradesURLUser"];
string urlPassword = ConfigurationManager.AppSettings["TradesURLPassword"];
NetworkCredential credential = new NetworkCredential(urlUser, urlPassword);
//Create xml stream reader from HTTP
WebRequest req = WebRequest.Create(urlString);
req.Credentials = credential;
WebResponse resp = req.GetResponse();
System.IO.StreamReader textReader = new System.IO.StreamReader(resp.GetResponseStream());
XmlReaderSettings settings = new XmlReaderSettings();
//settings.IgnoreProcessingInstructions = true;
settings.ProhibitDtd = true;
//settings.LineNumberOffset = 3;
XmlReader xmlReader = XmlTextReader.Create(textReader, settings);
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(xmlReader);
All Replies
-
Monday, June 11, 2007 2:17 PM
Use XmlReaderSettings and set its XmlResolver property to null (C#) or Nothing (VB).
-
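The suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions: the sample XML string and the `no-such-folder/transactions.dtd` path are made-up stand-ins for the HTTP response in the question, and the class and method names are hypothetical. It uses `DtdProcessing.Parse`, which exists from .NET 4.0 onward; on .NET 2.0/3.5 the equivalent would be `ProhibitDtd = false`.

```csharp
using System;
using System.IO;
using System.Xml;

static class IgnoreExternalDtd
{
    // Hypothetical stand-in for the XML returned by the web site.
    // The DTD path deliberately points at a file that does not exist.
    public const string SampleXml =
        "<?xml version=\"1.0\"?>" +
        "<!DOCTYPE TRANSACTIONS SYSTEM \"no-such-folder/transactions.dtd\">" +
        "<TRANSACTIONS><TRADE id=\"1\"/></TRANSACTIONS>";

    public static XmlDocument Load(TextReader input)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        // Allow the DOCTYPE declaration to be present in the input...
        settings.DtdProcessing = DtdProcessing.Parse;
        // ...but never fetch the external DTD it points to.
        settings.XmlResolver = null;

        XmlDocument doc = new XmlDocument();
        doc.XmlResolver = null; // same precaution on the document itself
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            doc.Load(reader);
        }
        return doc;
    }

    static void Main()
    {
        XmlDocument doc = Load(new StringReader(SampleXml));
        Console.WriteLine(doc.DocumentElement.Name);
    }
}
```

With the resolver set to null the reader should skip the external subset instead of trying to open the file, so the missing DTD no longer causes the load to fail.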
Monday, June 11, 2007 2:38 PM
Thanks - that works.
Actually, I ended up removing the settings and the Create call altogether and setting XmlResolver = null on the XmlTextReader and on the document, which also works and is a bit tidier.
Cheers,
Brendon.
//Create xml stream reader from HTTP
WebRequest req = WebRequest.Create(urlString);
req.Credentials = credential;
WebResponse resp = req.GetResponse();
System.IO.StreamReader textReader = new System.IO.StreamReader(resp.GetResponseStream());
XmlTextReader xmlReader = new XmlTextReader(textReader);
xmlReader.XmlResolver = null;
//extract and flatten data from the xml doc
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.XmlResolver = null;
xmlDoc.Load(xmlReader);
-
Wednesday, May 14, 2008 4:16 PM
Hello,
I have the same need to ignore the DTD.
When I try to validate my XML instance (the same instance that was used to generate the XSD schema) I get this error (I have translated it into English):
Warning BEC2004: For security reasons DTD is prohibited in this XML document. To enable DTD processing, set the ProhibitDtd property on XmlReaderSettings to false and pass the settings into the XmlReader.Create method.
Error BEC2004: the schema DO6713771.xsd could not be validated.
I am quite new to BizTalk and .NET and have no idea how (or where) I could change this property or how the XmlReader settings can be changed.
I would be very grateful if someone could help me.
Thank you in advance.
Naiara
-
Wednesday, May 14, 2008 4:34 PM
I don't know BizTalk but in .NET 2.0 or later C# code you would use
XmlReaderSettings settings = new XmlReaderSettings();
settings.ProhibitDtd = false;
using (XmlReader reader = XmlReader.Create(@"file.xml", settings))
{
// use reader here
}
If that does not help then try to find a BizTalk forum.
-
Thursday, May 15, 2008 7:45 AM
Thank you very much.
I will try it.
-
Saturday, August 02, 2008 11:22 AM
Hi, this code helped me to load the SVG files without DTD validation. Thanks
-
Tuesday, April 07, 2009 8:49 PM
If you are using XmlDocument.Load to load the XML,
XmlDocument doc = new XmlDocument();
doc.XmlResolver = null;
doc.Load("xyz.xml");
Thanks
-
Wednesday, June 09, 2010 6:36 PM
Hi,
I have some input XML files quite similar to what Brendon described in his initial post. They contain an inline DTD reference pointing to DTD files on one of the local drives.
I wanted to extract the XPaths from those XML files, so I wrote a stylesheet that extracts the XPaths from the XML files.
Using a small C# console application, I am trying to transform the XML files by applying the stylesheet to them and generate the XPaths.
I am able to generate the XPaths successfully. However, not all of the XPaths come from the XML files; some of them don't even exist in those files. As far as I can tell, the application is generating the paths from the file as well as probable XPaths from the DTD. Can anyone shed some light on this?
Below is the code i used in my app:
string foldername = args[0];
string XslPath = args[1];
DirectoryInfo obj_Directory = new DirectoryInfo(foldername);
if (obj_Directory.Exists)
{
    FileInfo[] obj_file = obj_Directory.GetFiles("*.xml");
    foreach (FileInfo obj_Fileinfo in obj_file)
    {
        string str_OutputText = foldername + "Output\\" + obj_Fileinfo.Name + ".txt";
        XslCompiledTransform xsl = new XslCompiledTransform();
        string xslStyleSheet = XslPath + @"\Xpath_StyleSheet";
        xsl.Load(xslStyleSheet);
        //Code to read XML file
        XmlReaderSettings rdSett = new XmlReaderSettings();
        rdSett.ProhibitDtd = false;
        rdSett.ValidationType = ValidationType.DTD;
        rdSett.CloseInput = true;
        //Code to write output to a text file
        XmlReader reader = XmlReader.Create(obj_Fileinfo.FullName, rdSett);
        XmlWriterSettings rwSett = new XmlWriterSettings();
        rwSett.ConformanceLevel = ConformanceLevel.Auto;
        XmlWriter writer = XmlWriter.Create(str_OutputText, rwSett);
        xsl.Transform(reader, writer);
        reader.Close();
        writer.Close();
    }
}
Since I am not sure how the extra XPaths are getting generated, I also tried to generate the XPaths while ignoring the DTD.
For this, I set the ProhibitDtd property to true and set the XmlResolver property of XmlReaderSettings to null:
rdSett.XmlResolver = null;
But this also doesn't work. It gives the error "For security reasons DTD is prohibited in this XML document. To enable DTD processing set the ProhibitDtd property on XmlReaderSettings to false and pass the settings into XmlReader.Create method."
Any ideas on either way (processing DTD and getting Xpath only from Xml file / Ignoring DTD and getting Xpath from Xml file) will be helpful.
Thanx,
Sid
-
Friday, June 11, 2010 11:10 AM
Please show a minimal but complete XML input document, XSLT stylesheet to demonstrate the problem and also show us the result you want to get from the XSLT stylesheet and the result you actually get.
I currently don't understand what the problem is or how it is related to reading or ignoring the DTD.
MVP Data Platform Development
-
Saturday, June 12, 2010 6:07 PM
hi Martin,
A sample XML file that I am using looks something like this:
<!DOCTYPE x:document SYSTEM "D:\DTD\Sub.dtd">
<x:document>
  <x:title>Subjects available in Mechanical Engineering.</x:title>
  <x:subjectID id = "ID">2.303
    <x:subjectname name="Name">Fluid Mechanics</x:subjectname>
  </x:subjectID>
</x:document>
The stylesheet I am using simply extracts the XPaths from the XML file.
The result I am getting in the text file is:
x:document/
x:/document/x:title
x:/document/x:subjectID
x:/document/x:subjectID/@id
x:/document/x:subjectID/@X
x:/document/x:subjectID/x:subjectname
x:/document/x:subjectID/x:subjectname/@name
x:/document/x:subjectID/x:subjectname/@Y
Now, notice the two XPaths ending in @X and @Y (shown in bold in the original post). They are not present in the input XML file, but both are declared as optional in the DTD. These two XPaths are extra and are not required in the output.
So as far as I can tell, the problem can be solved if I am able to ignore the DTD.
Hope it made things a bit clear.
Thanx,
Sid
-
Sunday, June 13, 2010 10:22 AM
So your DTD declares some default values for attributes, that is why you want to ignore the DTD.
With .NET 4.0 there is a new setting DtdProcessing that allows that. Assuming a dtd1.dtd of e.g.
<?xml version="1.0" encoding="utf-8" ?>
<!ELEMENT foo EMPTY>
<!ATTLIST foo
  att1 CDATA #IMPLIED
  att2 CDATA "default">
and an XML sample document XMLFile1.xml as e.g.
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE foo SYSTEM "dtd1.dtd">
<foo att1="value 1"/>
the following stylesheet
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:value-of select="concat('Number of attributes: ', count(//@*), ' ')"/>
  </xsl:template>
</xsl:stylesheet>
when applied with DtdProcessing.Ignore finds one attribute and it finds two attributes with DtdProcessing.Parse.
Sample C#
XmlReaderSettings xrs = new XmlReaderSettings() { DtdProcessing = DtdProcessing.Ignore };
XslCompiledTransform proc = new XslCompiledTransform();
proc.Load(@"..\..\XSLTFile1.xslt");
using (XmlReader xr = XmlReader.Create(@"..\..\XMLFile1.xml", xrs))
{
    proc.Transform(xr, null, Console.Out);
}
xrs.DtdProcessing = DtdProcessing.Parse;
using (XmlReader xr = XmlReader.Create(@"..\..\XMLFile1.xml", xrs))
{
    proc.Transform(xr, null, Console.Out);
}
outputs
Number of attributes: 1 Number of attributes: 2
Does that help? Or which .NET version do you target?
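A file-free variant of the same comparison, as a minimal sketch: it assumes the DTD declarations are placed in an internal subset (so no dtd1.dtd is needed) and counts attributes with `XmlDocument.SelectNodes` instead of a stylesheet. The class and method names are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

static class DtdDefaultAttributes
{
    // Same declarations as dtd1.dtd, moved into an internal subset
    // so the example does not depend on any files.
    public const string SampleXml =
        "<!DOCTYPE foo [" +
        "<!ELEMENT foo EMPTY>" +
        "<!ATTLIST foo att1 CDATA #IMPLIED att2 CDATA \"default\">" +
        "]>" +
        "<foo att1=\"value 1\"/>";

    // Loads the XML with the given DtdProcessing mode and counts
    // every attribute node in the resulting document.
    public static int CountAttributes(string xml, DtdProcessing mode)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = mode;

        XmlDocument doc = new XmlDocument();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            doc.Load(reader);
        }
        return doc.SelectNodes("//@*").Count;
    }

    static void Main()
    {
        Console.WriteLine(CountAttributes(SampleXml, DtdProcessing.Ignore));
        Console.WriteLine(CountAttributes(SampleXml, DtdProcessing.Parse));
    }
}
```

With `DtdProcessing.Ignore` only the attribute written in the document is seen; with `DtdProcessing.Parse` the defaulted `att2` is added as well.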
-
Sunday, June 13, 2010 11:43 AM
If you use .NET 3.5 then you can't use the DtdProcessing setting but you can still set the XmlResolver to null to ensure no external resources like an external DTD file are loaded.
So with .NET 3.5 and XML, DTD and XSLT as shown in my earlier post the following C# code has the same output:
XmlReaderSettings xrs = new XmlReaderSettings() { ProhibitDtd = false, XmlResolver = null };
XslCompiledTransform proc = new XslCompiledTransform();
proc.Load(@"..\..\XSLTFile1.xslt");
using (XmlReader xr = XmlReader.Create(@"..\..\XMLFile1.xml", xrs))
{
    proc.Transform(xr, null, Console.Out);
}
xrs.XmlResolver = new XmlUrlResolver();
using (XmlReader xr = XmlReader.Create(@"..\..\XMLFile1.xml", xrs))
{
    proc.Transform(xr, null, Console.Out);
}
meaning when the XmlResolver is set to null the attribute with the default value specified in the external DTD is not present in the data model the XSLT stylesheet operates on.
-
Monday, June 14, 2010 3:41 AM
Hi Martin,
As I mentioned earlier, my XML document is something like this:
<!DOCTYPE x:document SYSTEM "D:\DTD\Sub.dtd">
<x:document>
  <x:title>Subjects available in Mechanical Engineering.</x:title>
  <x:subjectID id = "ID">2.303
    <x:subjectname name="Name">Fluid Mechanics</x:subjectname>
  </x:subjectID>
</x:document>
I had already tried the solution you describe above with .NET 3.5. The error I was getting was
'x' is an undeclared namespace
And even with the code you have mentioned, it gives the same error.
Can you suggest some other approach?
Thanx,
Sid
-
Monday, June 14, 2010 10:35 AM
Well so far you stated that your stylesheet does not give you the output you want, you did not state the error you now describe. It sounds as if the DTD defines the namespace declaration so you can't ignore it as otherwise the markup is not namespace well-formed.
I am not sure how to solve that easily and cleanly, it looks as if the DTD is essential to the meaning of the document so any attempts to ignore the DTD will cause troubles.
You could try to provide your own XmlResolver which then, instead of the original Sub.dtd loads a different DTD that only declares the namespace and maybe the entities you need for the document to be namespace well-formed but does not define the default attribute values you don't want.
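A minimal sketch of such a resolver, under stated assumptions: the replacement DTD text, the namespace URI `http://example.com/x`, the `Sub.dtd` system identifier, and all class names are hypothetical, since the real Sub.dtd and its namespace are not shown in this thread. The resolver simply returns the replacement DTD for every external entity request.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Xml;

// Returns a fixed replacement DTD for every external entity request,
// regardless of which URI the document asked for.
sealed class ReplacementDtdResolver : XmlResolver
{
    private readonly byte[] dtdBytes;

    public ReplacementDtdResolver(string dtdText)
    {
        dtdBytes = Encoding.UTF8.GetBytes(dtdText);
    }

    // No credentials are needed; the setter must be overridden on .NET Framework.
    public override ICredentials Credentials { set { } }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        return new MemoryStream(dtdBytes);
    }
}

static class ReplacementDtdDemo
{
    // Hypothetical replacement DTD: declares only the namespace for prefix "x"
    // and no other attribute defaults.
    public const string ReplacementDtd =
        "<!ATTLIST x:document xmlns:x CDATA #FIXED \"http://example.com/x\">";

    // Hypothetical input; the prefix "x" is not declared in the document itself.
    public const string SampleXml =
        "<!DOCTYPE x:document SYSTEM \"Sub.dtd\">" +
        "<x:document><x:title>Subjects</x:title></x:document>";

    public static XmlDocument Load(string xml)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse; // ProhibitDtd = false before .NET 4.0
        settings.XmlResolver = new ReplacementDtdResolver(ReplacementDtd);

        XmlDocument doc = new XmlDocument();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            doc.Load(reader);
        }
        return doc;
    }

    static void Main()
    {
        XmlDocument doc = Load(SampleXml);
        Console.WriteLine(doc.DocumentElement.NamespaceURI);
    }
}
```

The idea is that the reader still parses a DTD, so the prefix gets its declaration, but the unwanted attribute defaults from the original DTD never reach the data model.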
Marked As Answer by Vitek Karas - MSFT (Microsoft Employee, Moderator), Thursday, June 24, 2010 7:50 AM
-
Tuesday, June 15, 2010 7:11 PM
Hi Martin,
I had not stated the error because I knew I would not be able to process the XML file without processing the DTD. So I was mainly looking for a way to process the DTD and ignore the default attribute values at the same time.
However, using a different DTD does not seem a feasible solution.
I am still looking for another way.
Anyways, thanx for your suggestions.
Regards,
Sid
-
Tuesday, June 15, 2010 7:25 PM (Moderator)
Hi,
But there's no "other" way. Your document is missing the namespace declaration for prefix "x". Assuming your XML is indeed well formed, it must have that declaration in the DTD. So you need to either process the DTD the XML points to, or replace it with some other DTD which will declare the "x" prefix.
Thanks,
Vitek Karas [MSFT]
Marked As Answer by Vitek Karas - MSFT (Microsoft Employee, Moderator), Thursday, June 24, 2010 7:50 AM
-
Tuesday, June 29, 2010 4:48 AM
Hi Vitek,
It looks like there is no other way, because the DTD cannot be replaced. So I need to process the DTD.
But my problem really occurs when I transform the XML file with the stylesheet using the C# console application.
If you have read my earlier posts, there are some extra XPaths in my output when I use the xsl.Transform() function.
But when I do the transformation using a tool like Altova XMLSpy or Stylus Studio, no such extra XPaths appear in the output.
Any ideas on this?
Thanx,
Sid
-
Tuesday, June 29, 2010 8:26 AM (Moderator)
Hi Sid,
Could you please show us an example of the input, the XSLT you use and the "Extra xpaths" that occur in the output? I must admit I don't know what you mean by "extra xpath", I assume you mean some characters you didn't expect in the output, right?
Without a sample repro I can't think of anything common/obvious which would cause such behavior.
Thanks,
Vitek Karas [MSFT]
-
Wednesday, June 30, 2010 2:05 PM
Hi Vitek,
I have already shown an example of my input XML file and the kind of problem I am facing in the output.
I am putting the same example down here again for you.
The XML file looks something like this:
<!DOCTYPE x:document SYSTEM "D:\DTD\Sub.dtd">
<x:document>
  <x:title>Subjects available in Mechanical Engineering.</x:title>
  <x:subjectID id = "ID">2.303
    <x:subjectname name="Name">Fluid Mechanics</x:subjectname>
  </x:subjectID>
</x:document>
The stylesheet I am using simply extracts the XPaths from the XML file.
The result I am getting in the output text file when I apply the stylesheet to the XML file is something like this:
x:document/
x:/document/x:title
x:/document/x:subjectID
x:/document/x:subjectID/@id
x:/document/x:subjectID/@X
x:/document/x:subjectID/x:subjectname
x:/document/x:subjectID/x:subjectname/@name
x:/document/x:subjectID/x:subjectname/@Y
Notice the two XPaths ending in @X and @Y (shown in bold in the original post). They are not present in the input XML file, but both are declared as optional in the DTD. These two XPaths are extra and are not required in the output.
Now, I am aware that since there is no namespace declaration in the XML file, the XML document can only be well-formed if it has a reference to the external DTD.
But when I process the file with the DTD, my output has those extra XPaths as shown above.
Also, it is not possible for me to modify the XML file in any way.
I hope this makes the situation a bit clearer.
Thanx,
Sid
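A minimal sketch of one way to process the DTD yet tell defaulted attributes apart from those written in the document, assuming the listing can be produced in C# rather than in XSLT: `XmlAttribute.Specified` is false for attributes that were supplied by a DTD default. The sample XML, element name, and class name below are hypothetical, with the DTD inlined so the example needs no files.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class SpecifiedAttributesOnly
{
    // Hypothetical input: "id" is written in the document,
    // "X" only exists as a default declared in the DTD.
    public const string SampleXml =
        "<!DOCTYPE subject [" +
        "<!ATTLIST subject id CDATA #IMPLIED X CDATA \"dflt\">" +
        "]>" +
        "<subject id=\"ID\"/>";

    // Parses the DTD, then returns only the names of attributes
    // that were explicitly present on the root element.
    public static List<string> ExplicitAttributeNames(string xml)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse; // ProhibitDtd = false before .NET 4.0

        XmlDocument doc = new XmlDocument();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            doc.Load(reader);
        }

        List<string> names = new List<string>();
        foreach (XmlAttribute attr in doc.DocumentElement.Attributes)
        {
            if (attr.Specified) // false for attributes defaulted from the DTD
            {
                names.Add(attr.Name);
            }
        }
        return names;
    }

    static void Main()
    {
        foreach (string name in ExplicitAttributeNames(SampleXml))
        {
            Console.WriteLine(name);
        }
    }
}
```

XSLT 1.0 itself has no way to ask whether an attribute was defaulted, so a filter like this would have to run before or instead of the transform.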

