Whitespace handling in validation of xsd:token

  • Question

  • Hello, I get a validation error for a token containing only spaces. I would expect it to pass, because the validator should collapse the value to an empty string (as the whiteSpace facet of xs:token requires). In fact, other validators (e.g. XmlSpy) accept the same XML.

    Here's the code to reproduce the behavior. Setting IgnoreWhitespace to true on XmlReaderSettings makes the XML pass validation, but I can't understand why this setting affects validation.

            using System.IO;
            using System.Xml;
            using System.Xml.Schema;
            using Microsoft.VisualStudio.TestTools.UnitTesting;

            [TestClass]
            public class TokenWhitespaceTests // class name is illustrative
            {
                // Validates xml against xsd; returns "" on success, or the first schema error message.
                static string Validate(string xsd, string xml, bool ignoreWhitespace)
                {
                    var xmlSchemaSet = new XmlSchemaSet();
                    using (var schemaReader = XmlReader.Create(new StringReader(xsd)))
                    {
                        xmlSchemaSet.Add(XmlSchema.Read(schemaReader, null));
                    }
                    var settings = new XmlReaderSettings();
                    settings.ValidationType = ValidationType.Schema;
                    settings.Schemas = xmlSchemaSet;
                    settings.IgnoreWhitespace = ignoreWhitespace; // default is false
                    using (var reader = XmlReader.Create(new StringReader(xml), settings))
                    {
                        try
                        {
                            // With no ValidationEventHandler attached, the first validation error is thrown
                            // as an XmlSchemaValidationException, which derives from XmlSchemaException.
                            while (reader.Read()) { }
                            return string.Empty;
                        }
                        catch (XmlSchemaException e)
                        {
                            return e.Message;
                        }
                    }
                }

                [TestMethod]
                public void TokenWithSpaces()
                {
                    var xsd = @"
            <xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' 
                elementFormDefault='qualified' attributeFormDefault='unqualified'>
                <xs:element name = 'e' type='xs:token' />
            </xs:schema >";

                    var test1 = Validate(xsd, "<e> </e>", ignoreWhitespace: false);
                    var test2 = Validate(xsd, "<e> </e>", ignoreWhitespace: true);

                    var errorMessage = @"The 'e' element is invalid - The value ' ' is invalid according to its datatype 'http://www.w3.org/2001/XMLSchema:token' - line-feed (#xA) or tab (#x9) characters, leading or trailing spaces and sequences of one or more spaces (#x20) are not allowed in 'xs:token'.";

                    // Observed behavior: the whitespace-only token is rejected by default...
                    Assert.AreEqual(errorMessage, test1);
                    // ...but accepted when IgnoreWhitespace is true.
                    Assert.AreEqual(string.Empty, test2);
                }
            }
    

    Monday, July 27, 2015 9:25 AM
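For reference, here is a minimal C# sketch of the whiteSpace="collapse" rule that XML Schema Part 2 defines for xs:token: tab, line feed and carriage return become spaces, runs of spaces are squeezed to a single space, and leading and trailing spaces are dropped. The class and method names (XsdWhitespace, Collapse, IsCollapsed) are hypothetical, and this is not how System.Xml implements it. It shows that the lexical value " " collapses to the empty string, which is a legal xs:token, while the raw " " fails an "already collapsed" check, which is roughly the condition the error message in the question appears to describe.

```csharp
using System.Text;

// Hypothetical helper mirroring the XSD whiteSpace="collapse" rule; not the System.Xml implementation.
public static class XsdWhitespace
{
    // Replace #x9, #xA, #xD with #x20, collapse runs of #x20 to one, trim leading/trailing #x20.
    public static string Collapse(string value)
    {
        var sb = new StringBuilder(value.Length);
        bool pendingSpace = false;
        foreach (char c in value)
        {
            bool isSpace = c == ' ' || c == '\t' || c == '\n' || c == '\r';
            if (isSpace)
            {
                // Remember a separator only if something non-space was already emitted
                // (this drops leading whitespace).
                pendingSpace = sb.Length > 0;
            }
            else
            {
                // Emit at most one space between non-space characters.
                if (pendingSpace) sb.Append(' ');
                pendingSpace = false;
                sb.Append(c);
            }
        }
        // A pending space at the end is never emitted, which drops trailing whitespace.
        return sb.ToString();
    }

    // True if the value is already in collapsed form: no tab/LF/CR,
    // no leading or trailing space and no run of two or more spaces.
    public static bool IsCollapsed(string value) => value == Collapse(value);
}
```

For example, Collapse(" ") returns "" and Collapse("  a\t b \n") returns "a b", while IsCollapsed(" ") is false.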


All replies

  • Hello,

    The XmlReaderSettings.IgnoreWhitespace property indicates whether to ignore insignificant white space. This setting does not affect white space between markup in a mixed content mode, or white space that occurs within the scope of an xml:space='preserve' attribute. Does that mean it affects white space inside an element? For more info, see https://msdn.microsoft.com/en-us/library/system.xml.xmlreadersettings.ignorewhitespace(v=vs.110).aspx

    An easy workaround is to handle the empty-string case separately.

    Regards,



    Tuesday, July 28, 2015 6:07 AM
    Moderator
  • OK, but I'm still a bit confused. I thought white space in "<e> </e>" was significant for the XML processor. Anyway my core concern is with validation: I expected this xml to be considered valid (as it happens with other validators) because values of type xsd:token should be collapsed, and this implies that leading and trailing spaces should be ignored.

    Regards,

    Giacomo

    Tuesday, July 28, 2015 8:10 AM
  • Hi Giacomo,

    >> I expected this xml to be considered valid (as it happens with other validators) because values of type xsd:token should be collapsed, and this implies that leading and trailing spaces should be ignored.

    The White Space documentation says the xml:space attribute indicates how white space in elements and elsewhere in an XML document should be handled; it seems you could use this attribute to achieve your goal. Please take a look.

    Regards,



    Wednesday, July 29, 2015 2:43 AM
    Moderator
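To see why IgnoreWhitespace changes the outcome, it helps to look at what a plain (non-validating) XmlReader reports for `<e> </e>`. The minimal sketch below uses hypothetical names (WhitespaceNodes, NodeTypes) and only records node types. With IgnoreWhitespace = false, the single space is reported as an XmlNodeType.Whitespace node; with true, that node is skipped, so a validator reading through the same settings presumably sees empty content, which is a valid xs:token. Inside an xml:space='preserve' scope, the space is reported as SignificantWhitespace instead, and IgnoreWhitespace does not remove it. This describes reader behavior only, not how the .NET schema validator treats the Whitespace node internally.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: lists the node types an XmlReader reports, without schema validation.
public static class WhitespaceNodes
{
    public static List<XmlNodeType> NodeTypes(string xml, bool ignoreWhitespace)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = ignoreWhitespace };
        var types = new List<XmlNodeType>();
        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            // Attributes (such as xml:space) are not visited by Read(), so only
            // element, text/whitespace and end-element nodes are recorded here.
            while (reader.Read())
            {
                types.Add(reader.NodeType);
            }
        }
        return types;
    }
}
```

For example, NodeTypes("<e> </e>", false) yields Element, Whitespace, EndElement, whereas NodeTypes("<e> </e>", true) yields only Element, EndElement.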