XmlReaderSettings ValidationHandler validates expression as False - but it is True. Bug?

  • Wednesday, October 28, 2009 10:29 AM Werner Clausen
     
    Hi,

    I have a simple string that looks like this: "1-1-12 34". I evaluate it against this expression:

    ((?<=^|\s)\d+-\d+-.+(?=$|\s)(?<=(?:^|\s)[^\s]{5,16}))
     
    Using System.Text.RegularExpressions, this returns "True". However, if I put the same pattern in an Xsd like this:

    <xs:pattern value="((?&lt;=^|\s)\d+-\d+-.+(?=$|\s)(?&lt;=(?:^|\s)[^\s]{5,16}))"/>
     
    and run it through an XmlReader with an XmlReaderSettings ValidationEventHandler, it fails with "The attribute is invalid - The value '1-1-12 34' is invalid according to its datatype 'TestType' - The Pattern constraint failed"

    The problem is the whitespace in the tested string. If I specify "1-1-1234" (no whitespace in the '1234'), it validates as True. Is this a bug in the Xsd validation, or am I missing something?

    --

All Replies

  • Wednesday, October 28, 2009 11:38 AM Werner Clausen
     
    Just some more information, since I can see this validation error in VS2008 as well. If I create an Xsd file and an Xml file, the VS editor also validates them and shows the problem:

    Xsd:
    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema targetNamespace="http://www.myorg.org" elementFormDefault="qualified" xmlns="http://www.myorg.org" xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="SomeRQ">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Service" maxOccurs="unbounded">
              <xs:complexType>
                <xs:attribute name="Code" type="CodeType" use="required">
                </xs:attribute>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    
      <xs:simpleType name="CodeType">
        <xs:annotation>
          <xs:documentation xml:lang="en">digit-digit-anything.</xs:documentation>
        </xs:annotation>
        <xs:restriction base="xs:string">
          <xs:pattern value="((?&lt;=^|\s)\d+-\d+-.+(?=$|\s)(?&lt;=(?:^|\s)[^\s]{5,16}))"/>
        </xs:restriction>
      </xs:simpleType>
    
    </xs:schema>
    
     
    Xml:
    <?xml version="1.0" encoding="utf-8"?>
    <SomeRQ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.myorg.org">
      <Service Code="1-1-12 34" />
    </SomeRQ>
    
    
     
    VS2008's IntelliSense then flags "1-1-12 34" as invalid. Why?
    It isn't invalid when running this code:
    // Check code format
    var pattern = @"((?<=^|\s)\d+-\d+-.+(?=$|\s)(?<=(?:^|\s)[^\s]{5,16}))";
    
    Regex exp = new Regex(pattern);
    var success = exp.Match("1-1-12 34").Success;
    
     
    The above code returns "true". Which one is right?
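
For comparison with the Regex snippet, the XmlReader side of the test could be reproduced with something like the following minimal sketch. The class name `XsdPatternCheck` and the method `Validate` are hypothetical, not from the thread. It assumes the Xsd and Xml shown above are passed in as strings, and that the schema should be registered under its own targetNamespace (hence the null first argument to `Schemas.Add`).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class XsdPatternCheck
{
    // Hypothetical helper: reads the XML against the schema and returns
    // every message raised through the ValidationEventHandler.
    public static List<string> Validate(string xsd, string xml)
    {
        var messages = new List<string>();

        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;

        // null = use the targetNamespace declared in the schema itself.
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(xsd)));

        // Collect errors and warnings instead of throwing on the first one.
        settings.ValidationEventHandler +=
            (sender, e) => messages.Add(e.Severity + ": " + e.Message);

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            // Validation happens as a side effect of reading each node.
            while (reader.Read()) { }
        }

        return messages;
    }
}
```

An empty list means the reader raised no validation events for the document; a failed pattern facet shows up as an error message like the one quoted in the first post.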

  • Wednesday, October 28, 2009 12:16 PM E McElroy
     
    As a suggestion which might be a pointer in the right direction:

    Your regular expression specifies a sequence of digits and hyphens with no spaces between 5 and 16 characters long. The sequence "1-1-12 34" provides a positive result because the sequence "1-1-12" satisfies the requirements. The entire sequence does not match the pattern.

    While I know little about XML's use of regular expressions, a cursory glance at some of the documentation gives the impression that in XML the entire value must match the pattern. Your entire test sequence does not meet this requirement either in XML or in normal regular expressions. Since you know more about XML than I, you're better able to pursue this further to see if my hunch is correct.

    This page is what gave me the impression that the entire value must meet the pattern in XML, although it did not specifically state that:

    http://msdn.microsoft.com/en-us/library/ms256481.aspx


    Ed McElroy
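
The difference between "matches somewhere in the string" and "matches the entire value" can be checked directly in System.Text.RegularExpressions. The sketch below is only an illustration: the class and method names are hypothetical, and wrapping the pattern in `\A(?:...)\z` is one assumed way to mimic a whole-value requirement, not necessarily what the Xsd validator does internally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeValueCheck
{
    // The pattern from the original question, unchanged.
    public const string Pattern =
        @"((?<=^|\s)\d+-\d+-.+(?=$|\s)(?<=(?:^|\s)[^\s]{5,16}))";

    // True if the pattern matches any substring of the input.
    public static bool MatchesSomewhere(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }

    // True only if one match covers the input from first to last character.
    // \A and \z anchor at the absolute start and end of the string.
    public static bool MatchesWholeValue(string input)
    {
        return Regex.IsMatch(input, @"\A(?:" + Pattern + @")\z");
    }

    public static void Main()
    {
        foreach (string s in new[] { "1-1-12 34", "1-1-1234" })
        {
            Console.WriteLine("{0}: somewhere={1}, whole={2}",
                s, MatchesSomewhere(s), MatchesWholeValue(s));
        }
    }
}
```

Printing `Regex.Match(input, Pattern).Value` is a quick way to see which part of the string actually produced the match.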
  • Wednesday, October 28, 2009 12:22 PM Werner Clausen
     
    Yes, I see. I didn't look at it the right way. But I need to evaluate the entire sequence "1-1-12 34" and not just a part of it. Is that possible using System.Text.RegularExpressions?

    What I'm trying to do is better shown in this thread. Your answer apparently answered my initial question :) 

  • Wednesday, October 28, 2009 2:17 PM E McElroy
     
    There are usually several ways of doing things in regular expressions, if they can be done at all. I took a quick pass at making some changes to your pattern. One of the things I wanted to do was to ignore the leading and trailing spaces and just focus on the content between the first and last significant character. I moved the length check up front, and the details check went into the lookbehind. It correctly rejected a string that was too long. I didn't do exhaustive testing, so if you find a sequence where it doesn't work, let me know.

    Normally, I would use identifying labels (named groups) for the expression I want to retrieve, but I'm not sure whether you can do that in XML, so I didn't do it here. The pattern accepts strings with any number of leading and trailing spaces, as long as the content in between is valid. That means that when you process the input, you must strip the leading and trailing spaces programmatically.

    Ed McElroy

    // Requires: using System; using System.Text.RegularExpressions;
    string TargetStr_1 = "          1-1-12 34 5252 b            ";
    string TargetStr_2 = "          1-1-12 34 525258389230 b            ";
    // Length check first (digit + 3..14 chars + non-space), details check in the lookbehind.
    string PatternStr = @"(^\s*)((\d(.{3,14})[\S])(\s*$))(?<=(\1)(\d+-\d+-.*?[\S])(\s*$))";
    
    Regex TheRegex = new Regex(PatternStr);
    
    // Content of valid length: print each match and its length.
    MatchCollection MatchCol = TheRegex.Matches(TargetStr_1);
    foreach (Match m in MatchCol)
    {
        Console.WriteLine(m.Value);
        Console.WriteLine("Length is {0}", m.Value.Length);
    }
    
    // Content that is too long: report when nothing matches.
    MatchCol = TheRegex.Matches(TargetStr_2);
    if (MatchCol.Count > 0)
    {
        foreach (Match m in MatchCol)
        {
            Console.WriteLine(m.Value);
            Console.WriteLine("Length is {0}", m.Value.Length);
        }
    }
    else
    {
        Console.WriteLine("No match on test string {0}", TargetStr_2);
    }
    


  • Wednesday, October 28, 2009 2:57 PM Werner Clausen
     
    @E McElroy:

    Thanks for your reply. Your sample code seems to work, and I get a match on "1-1-12 34", for example. However, I still get a validation error in Xsd using "<xs:pattern value="(^\s*)((\d(.{3,14})[\S])(\s*$))(?&lt;=(\1)(\d+-\d+-.*?[\S])(\s*$))"/>" - in fact, I can't construct a string that validates as true at all.

    Any clues why? In any case, I can use this as a basis for further investigation. With a correct pattern in C#, it must be very close to usable in Xsd validation...
  • Wednesday, October 28, 2009 3:22 PM E McElroy
     Answer
    I'm glad it worked for you in C#. I took a quick look at the XML regular expression documentation on MSDN, or more correctly, what passes for documentation on MSDN, and I found the experience unrewarding. Since you were able to use your original pattern to get a valid response from XML on a string without an embedded space, I compared my pattern to the one you used, and the only syntax item that looks new to me is the numbered backreference serving as an anchor. Whether XML recognizes that, I don't know.

    Regrettably, I don't think there's anything further I can contribute, since XML is not my game. Perhaps someone more experienced with the idiosyncrasies of XML regular expressions can pick up the thread at this point.

    Good luck.

    Ed McElroy
  • Wednesday, October 28, 2009 6:18 PM Werner Clausen
     
    Thanks McElroy. 

    Does anyone know why McElroy's pattern validates as False in Xml/Xsd but as True in RegEx code?
  • Wednesday, October 28, 2009 8:10 PM Ahmad Mageed
     Answer
    Not all regex flavors are equal. The pattern doesn't work because XML Schema regex is very limited compared to other flavors and does not support lookarounds (the "(?<=" portion of the pattern) or backreferences (the "(\1)" portion). Furthermore, anchoring is implicit (the pattern must match the entire value), so using ^ and $ isn't required. For more information, refer to XML Schema Regular Expressions and Regular Expression Flavor Comparison.

    Without these features your pattern is non-trivial. Generally you would need to find a simpler pattern. Given your dynamic requirements for length with none of the elements having fixed lengths, you would have to come up with all possible combinations and you'll quickly end up with an explosion of unwieldy OR'ed patterns.

    Document my code? Why do you think it's called "code"?
  • Thursday, October 29, 2009 7:52 AM Werner Clausen
     
    OK, that makes sense (looking at my troubles). However, would it also mean that validating, for example, "\d+-\d+-.+" while at the same time making sure that the whole string isn't longer than 16 chars isn't possible using XML regex?

    Of course, one could use a combination of restrictions to accomplish this - but...

     
  • Thursday, October 29, 2009 8:06 AM Gregory Adam
     Answer
    Werner,

    Any possibility of using a simple pattern (\d+-\d+-[\w\d\s]+) combined with minLength=5 and maxLength=16?
  • Thursday, October 29, 2009 9:01 AM Werner Clausen
     
    Yes that is also my conclusion. Thanks.
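
A sketch of how that combination of restrictions might look, reusing the CodeType name from the schema posted earlier in the thread. Two assumptions to note: the second digit group is written as `\d+`, and the length facets apply to the whole attribute value, so the 5 to 16 limit counts embedded spaces too (unlike the original lookbehind, which counted a run of non-whitespace characters).

```xml
<xs:simpleType name="CodeType">
  <xs:annotation>
    <xs:documentation xml:lang="en">digit-digit-anything, 5 to 16 characters in total.</xs:documentation>
  </xs:annotation>
  <xs:restriction base="xs:string">
    <!-- Length of the whole value, replacing the {5,16} lookbehind -->
    <xs:minLength value="5"/>
    <xs:maxLength value="16"/>
    <!-- No ^, $, lookarounds or backreferences: the pattern must match the entire value -->
    <xs:pattern value="\d+-\d+-[\w\d\s]+"/>
  </xs:restriction>
</xs:simpleType>
```

If "anything" should also allow punctuation after the second hyphen, the character class could be swapped for `.+`, at the cost of accepting more input.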
  • Monday, November 02, 2009 7:34 PM Leonid Ganeline MVP
     
    Hi Werner,

    If the XML code is under your control, you can try using a CDATA section to pass the RegEx pattern through the XML transformations unchanged.
    See http://msdn.microsoft.com/en-us/library/ms256076.aspx
    and http://msdn.microsoft.com/en-us/library/system.xml.linq.xcdata_members.aspx (if you use LINQ to XML).
    Leonid Ganeline [BizTalk MVP] My BizTalk blog