Problem validating Xml against Xsd with regex as string pattern...

    Question

  • Hi,

    I'm using XmlReader and XmlReaderSettings to validate some XML against an XSD file. However, one of the XSD string patterns defines a regular expression that contains the character "<". That character is not valid there in an XSD file, so all I could think of was to save it HTML-encoded in the XSD file. Of course this doesn't work, but is it really impossible to have such a regex in an XSD file?

    Thanks,

    Monday, June 30, 2008 7:32 AM

All replies

  • Hi,

    I'm not familiar with XmlReaderSettings, but could you simply express the angle brackets using their hex codes?

    E.g. the two patterns below produce the same output:

        using System;
        using System.Text.RegularExpressions;

        string input = "apple<banana>cherry";
        string pattern1 = @"<.*>";        // literal angle brackets
        string pattern2 = @"\x3C.*\x3E";  // the same characters as hex escapes
        Console.WriteLine(Regex.Match(input, pattern1).Value);
        Console.WriteLine(Regex.Match(input, pattern2).Value);

    output:
    <banana>
    <banana>

    Good luck,

    John
    Monday, June 30, 2008 8:27 AM
  • Hi,

    Well no - while it seems valid for text values, it is not valid when the "<" is part of the RegEx syntax, like this:
    (?\x3C=^|\s) 

    But thanks for the suggestion.
    Monday, June 30, 2008 9:40 AM
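
A minimal sketch of the distinction made in the reply above, using the .NET Regex class directly rather than XSD validation. `\x3C` is an escape for a literal "<" to be matched, so it should not be able to replace the "<" that belongs to the `(?<=...)` lookbehind syntax. The sample strings and variable names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// As data to match, \x3C is equivalent to a literal '<'.
bool literalWorks = Regex.IsMatch("a<b", @"a\x3Cb");
Console.WriteLine(literalWorks); // True

// As part of the grouping syntax it is not: the parser reads "(?\" and
// does not recognise it as the start of a lookbehind.
bool syntaxRejected = false;
try
{
    _ = new Regex(@"(?\x3C=^|\s)\w+");
}
catch (ArgumentException)
{
    // Regex reports pattern parse errors as ArgumentException.
    syntaxRejected = true;
}
Console.WriteLine(syntaxRejected);
```

The escape is resolved by the regex engine after the pattern has already been parsed into groups, which is too late for it to act as syntax.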
  • Well I must admit I didn't think of that but...   The regular expression syntax in an .XSD could be that defined by the W3C (or whoever defined the XML Schema) rather than the 'normal' .Net syntax.  Looking at the "XML Schema Regular Expressions Reference Chart" I don't see angle brackets being used.

    Good luck,

    john
    Monday, June 30, 2008 11:39 AM
  • The less-than and greater-than symbols are reserved characters in XML and therefore should be replaced with their entity counterparts for use in text/values. That means that '<' becomes &lt; and likewise '>' becomes &gt;. If I'm not mistaken, when it comes time to evaluate the regex, those entities should be converted automatically to their character representations.
    Monday, June 30, 2008 12:38 PM
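
To illustrate the point about entities, here is a small sketch that loads a made-up schema fragment with `XDocument` and prints the pattern value the XML parser hands back. The type name `token` and the pattern itself are assumptions for the example. The `&lt;` exists only in the file text; once parsed, the attribute holds a plain "<".

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical schema fragment. On disk the '<' of the lookbehind is written
// as &lt; so that the file stays well-formed XML.
const string xsd = @"<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:simpleType name=""token"">
    <xs:restriction base=""xs:string"">
      <xs:pattern value=""(?&lt;=^|\s)\w+"" />
    </xs:restriction>
  </xs:simpleType>
</xs:schema>";

XNamespace xs = "http://www.w3.org/2001/XMLSchema";

// The parser resolves the entity while loading, before any regex engine
// gets to see the pattern.
string pattern = XDocument.Parse(xsd)
    .Descendants(xs + "pattern")
    .Single()
    .Attributes("value")
    .Single()
    .Value;

Console.WriteLine(pattern); // prints (?<=^|\s)\w+
```

This is why the entity form can carry regex syntax while the `\x3C` form cannot: the entity is resolved at the XML level, the hex escape at the regex level.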
  • Hi Werner,

    Sorry! I pasted the wrong link in my previous post; it should have been: XML Schema Regular Expressions Reference Chart

    As you know, there are many flavours of regular expression syntax. The one used by the Regex class in the .NET library is one; others include Perl, Java, egrep, etc.

    My assertion, which could be wrong, is that the flavour used in an .XSD is NOT the same flavour of regex that's used by the .NET Regex class.

    The sample you cite, (?<=^|\s), appears to be a positive lookbehind. This does not appear to be a feature supported by the flavour of regex used in .XSD.

    Sorry for the confusion,

    John


    Monday, June 30, 2008 1:01 PM
  • Thanks both. While it seems that lookarounds aren't supported (looking at the link from rtizan), they are... if I enter the encoded value (&lt;) for "<", it validates just fine.

    So I'm a very happy regex noob :)

    Tuesday, July 01, 2008 10:38 AM
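
For anyone wanting to check this locally, here is a minimal sketch of the setup described in the question. It assumes an inline schema with a hypothetical element `word` and the lookbehind pattern from the thread; the helper name `Validate` is invented here. Lookbehind is not part of the W3C XML Schema regex syntax, so this appears to rely on .NET handing the pattern to its own regex engine, as the result reported above suggests. Other validators may well reject the same schema.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical schema: the '<' of the lookbehind is stored as &lt;.
const string xsd = @"<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""word"">
    <xs:simpleType>
      <xs:restriction base=""xs:string"">
        <xs:pattern value=""(?&lt;=^|\s)\w+"" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>";

// Returns the validation messages raised while reading xml against the schema.
List<string> Validate(string xml)
{
    var errors = new List<string>();
    var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
    settings.Schemas.Add(null, XmlReader.Create(new StringReader(xsd)));
    settings.ValidationEventHandler += (sender, e) => errors.Add(e.Message);

    using (var reader = XmlReader.Create(new StringReader(xml), settings))
    {
        while (reader.Read()) { } // validation happens as nodes are read
    }
    return errors;
}

// Print how many validation messages each sample document produces.
Console.WriteLine(Validate("<word>hello</word>").Count);
Console.WriteLine(Validate("<word>hello!</word>").Count);
```

Collecting messages through `ValidationEventHandler` instead of letting the reader throw makes it easy to compare several documents in one run.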