LINQ to XML API and Schema validation

    Question

  • Hi!

    I'm trying to validate an XML file against a schema. The root element of the file looks like this:
    <?xml version="1.0" encoding="utf-8" ?>
    <Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="./mySchema.xsd">
      <!-- child elements follow here --> 
    </Root> 
     

    Using the LINQ to XML API from C#, I try:
            private XmlSchemaSet ocdSchema = new XmlSchemaSet(); 
            private XDocument myDoc = XDocument.Load("Path to instance");  
      
            //... 
            this.ocdSchema.Add(null, "Path to Schema"); // null: use the targetNamespace declared in the schema
            this.myDoc.Validate(this.ocdSchema, new ValidationEventHandler(myValidationEventHandler)); 
     

    This "works" but I have 3 questions about it:
    • Why do I have to add the schema via the Add() method? I'd like it to be inferred from the noNamespaceSchemaLocation attribute...
    • How do I get the line number of a validation event in the ValidationEventHandler?
    • I define some optional attributes with default values in the schema. After I validate an instance of the schema and access such an attribute via element.Attribute("myOptionalAttribute").Value, a NullReferenceException is thrown if the attribute is not present in the instance. Is there a way to get the default value instead of an exception in such a case?
    Cheers & thanks for any tips
    berntie


    Monday, December 15, 2008 3:27 PM

Answers

  • If all you want to do is validate an XML document against the schema specified in its xsi:noNamespaceSchemaLocation attribute, then you don't have to use LINQ to XML at all. In that case an XmlReader created with the proper XmlReaderSettings suffices, e.g.
                XmlReaderSettings settings = new XmlReaderSettings();
                settings.ValidationType = ValidationType.Schema;
                // ProcessSchemaLocation makes the reader load the schema named in the document.
                settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings | XmlSchemaValidationFlags.ProcessSchemaLocation;
                settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs vargs)
                {
                    Console.WriteLine("{0}: {1} Line: {2}", vargs.Severity, vargs.Message, vargs.Exception.LineNumber);
                };
                using (XmlReader reader = XmlReader.Create(@"..\..\XMLFile1.xml", settings))
                {
                    // Validation happens as a side effect of reading through the document.
                    while (reader.Read()) {}
                }

    Does that help? Or do you need to use LINQ to XML and do the validation against the LINQ to XML XDocument tree?
    MVP XML
    Monday, December 15, 2008 4:26 PM
  • berntie said:

    Thanks for your tips Martin,

    I don't have to use LINQ, but I'd like to---if possible. So, if there's a reasonable way to do what I want with LINQ, I'd stick with it.

    Also, regarding my first two questions, I'm really curious why I have to use the Add() method and how one can obtain line numbers. :-)

    Greetings
    b.

    Well, noNamespaceSchemaLocation (like schemaLocation) is only a hint. With DTDs you were supposed to validate against the DTD declared in the XML document, but with schemas you are no longer bound to the schema(s) named in the document; instead you can validate against your own (trusted) schemas. So APIs generally let you choose the schema(s) you want to validate against. As for the LINQ to XML API and its Validate method, I don't know of a way to use it without explicitly passing in an XmlSchemaSet to which you have added schemas, and I don't think there is a method or setting that makes it automatically use the schema(s) named in noNamespaceSchemaLocation/schemaLocation attributes. On the other hand, once you have a LINQ to XML tree you can read out those attributes and use the values found to add schemas to your schema set.

    As for line numbers, if you want the LINQ to XML object model to store line numbers you need to use a special overload of the Load method and set a flag to do that: http://msdn.microsoft.com/en-us/library/bb538371.aspx

    Here is an example doing that:

                XNamespace xsiNs = "http://www.w3.org/2001/XMLSchema-instance";

                // SetLineInfo makes the tree record line numbers while loading.
                XDocument doc = XDocument.Load(@"XMLFile1.xml", LoadOptions.SetLineInfo);

                // Use the document's own schema hint, if present, to fill the schema set.
                XmlSchemaSet schemaSet = new XmlSchemaSet();
                XAttribute schemaHint = doc.Root.Attribute(xsiNs + "noNamespaceSchemaLocation");
                if (schemaHint != null)
                {
                    schemaSet.Add(null, schemaHint.Value);
                }

                // The sender is the node being validated; it exposes the recorded line info.
                // The final argument (true) is addSchemaInfo.
                doc.Validate(schemaSet, delegate(object sender, ValidationEventArgs vargs)
                {
                    IXmlLineInfo lineInfo = sender as IXmlLineInfo;
                    Console.WriteLine("{0}: {1}; Line: {2}", vargs.Severity, vargs.Message, lineInfo.LineNumber);
                }, true);

    If the XML is as follows:

    1 <?xml version="1.0" encoding="utf-8" ?> 
    2 <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="XMLSchema1.xsd">  
    3   <foo>1</foo> 
    4   <foo>a</foo> 
    5 </root> 

    and the schema as follows:

    <?xml version="1.0" encoding="utf-8"?>  
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">  
      <xs:element name="root">  
        <xs:complexType> 
          <xs:sequence maxOccurs="unbounded">  
            <xs:element name="foo" type="xs:int"/>  
          </xs:sequence> 
        </xs:complexType> 
      </xs:element> 
    </xs:schema> 

    then the output is 

    Error: The 'foo' element is invalid - The value 'a' is invalid according to its
    datatype 'http://www.w3.org/2001/XMLSchema:int' - The string 'a' is not a valid
    Int32 value.; Line: 4

    which, I think, reports the correct line number for the error.
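
    The example above reads only noNamespaceSchemaLocation. For documents that use namespaces, xsi:schemaLocation holds whitespace-separated pairs of namespace URI and schema location, so a similar approach might look like the sketch below. The class name SchemaLocationHints, the method Read and the sample URIs are made up for illustration. It also only looks at the root element, although these attributes may appear on other elements too.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

static class SchemaLocationHints
{
    static readonly XNamespace Xsi = "http://www.w3.org/2001/XMLSchema-instance";

    // Collects (targetNamespace, location) pairs from the root element's xsi hints.
    // A null key stands for "no target namespace".
    public static List<KeyValuePair<string, string>> Read(XDocument doc)
    {
        var hints = new List<KeyValuePair<string, string>>();

        // The explicit string conversion yields null when the attribute is missing.
        string noNs = (string)doc.Root.Attribute(Xsi + "noNamespaceSchemaLocation");
        if (noNs != null)
        {
            hints.Add(new KeyValuePair<string, string>(null, noNs));
        }

        string pairs = (string)doc.Root.Attribute(Xsi + "schemaLocation");
        if (pairs != null)
        {
            // A null separator array splits on any whitespace.
            string[] tokens = pairs.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            for (int i = 0; i + 1 < tokens.Length; i += 2)
            {
                hints.Add(new KeyValuePair<string, string>(tokens[i], tokens[i + 1]));
            }
        }

        return hints;
    }

    static void Main()
    {
        // Hypothetical instance document with two namespace/location pairs.
        XDocument doc = XDocument.Parse(
            "<r xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' " +
            "xsi:schemaLocation='urn:a a.xsd  urn:b b.xsd'/>");

        foreach (KeyValuePair<string, string> hint in Read(doc))
        {
            Console.WriteLine("{0} -> {1}", hint.Key, hint.Value);
        }
    }
}
```

    Each pair could then be passed to XmlSchemaSet.Add(hint.Key, hint.Value) before calling Validate. Since the locations come from the document itself, it is probably wise to load them only for documents you trust.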




    MVP XML
    Tuesday, December 16, 2008 2:08 PM

All replies

  • As for the default attributes, if you want them with LINQ to XML, then I think you need to use this overload of the Validate method, http://msdn.microsoft.com/en-us/library/bb354954.aspx, and pass true as the third argument (addSchemaInfo).
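
    A minimal sketch of that, assuming a made-up schema in which the element item has an optional attribute color with the default value red (both names are hypothetical). The schema and the instance are built from strings so that the example is self-contained. As far as I know, validating with addSchemaInfo set to true adds schema defaults to the tree, so the attribute should be readable afterwards.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

class DefaultAttributeDemo
{
    static void Main()
    {
        // Hypothetical schema: <item> has an optional "color" attribute defaulting to "red".
        string xsd =
            @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
                <xs:element name='item'>
                  <xs:complexType>
                    <xs:attribute name='color' type='xs:string' default='red'/>
                  </xs:complexType>
                </xs:element>
              </xs:schema>";

        XmlSchemaSet schemaSet = new XmlSchemaSet();
        schemaSet.Add(null, XmlReader.Create(new StringReader(xsd)));

        // The instance omits the optional attribute.
        XDocument doc = XDocument.Parse("<item/>");

        // Before validation Attribute("color") returns null, which is why .Value throws.
        Console.WriteLine("Present before validation: {0}", doc.Root.Attribute("color") != null);

        // Third argument true = addSchemaInfo.
        doc.Validate(schemaSet, (sender, e) => Console.WriteLine(e.Message), true);

        // The explicit string conversion returns null instead of throwing
        // if the attribute is still missing.
        string color = (string)doc.Root.Attribute("color");
        Console.WriteLine("After validation: {0}", color ?? "(not present)");
    }
}
```

    Independently of validation, casting the attribute with (string) rather than reading .Value avoids the NullReferenceException, because the conversion yields null for a missing attribute.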
    MVP XML
    Monday, December 15, 2008 4:42 PM
  • Thanks for your tips Martin,

    I don't have to use LINQ, but I'd like to---if possible. So, if there's a reasonable way to do what I want with LINQ, I'd stick with it.

    Also, regarding my first two questions, I'm really curious why I have to use the Add() method and how one can obtain line numbers. :-)

    Regarding question #3: Is there anything else I have to do? Call a method or anything like that? Or is it sufficient just to pass "true" as parameter?

    Greetings
    b.
    Monday, December 15, 2008 9:53 PM
  • Thank you very much! All my questions are answered, and I'm happy :-)
    Wednesday, December 17, 2008 7:50 AM