.NET 4.0 XML validation performance

  • Question

  • When I validate an XML document (using System.Xml.XmlReader) against its corresponding schema, performance can be really slow, depending on the value of certain elements. Below you will find the VB.NET code as well as the XSD schema and a sample XML file.

    XML validation takes approx. 10 minutes on a 64-bit Windows 7 workstation with an i5-2400 CPU @ 3.1GHz! The performance bottleneck in this case appears to be the validation of the value ‘This is a large Transaction ID!’ of the XML element /Document/SctiesSttlmTxInstr/TxId.

    <Document xmlns="urn:iso:std:iso:20022:tech:xsd:sese.023.001.03">
      <SctiesSttlmTxInstr>
        <TxId>This is a large Transaction ID!</TxId>
        <SttlmTpAndAddtlParams>
          <SctiesMvmntTp>RECE</SctiesMvmntTp>
          <Pmt>FREE</Pmt>
        </SttlmTpAndAddtlParams>
        <TradDtls>
          <TradDt>
            <Dt>
              <Dt>2015-01-01</Dt>
            </Dt>
          </TradDt>
          <SttlmDt>
            <Dt>
              <Dt>2015-01-31</Dt>
            </Dt>
          </SttlmDt>
        </TradDtls>
        <FinInstrmId>
          <ISIN>GR0133001140</ISIN>
        </FinInstrmId>
        <QtyAndAcctDtls>
          <SttlmQty>
            <Qty>
              <FaceAmt>1000000.00</FaceAmt>
            </Qty>
          </SttlmQty>
          <SfkpgAcct>
            <Id>BOGS10080001015137</Id>
          </SfkpgAcct>
        </QtyAndAcctDtls>
        <SttlmParams>
          <Prty>
            <Nmrc>0003</Nmrc>
          </Prty>
          <SctiesTxTp>
            <Cd>COLI</Cd>
          </SctiesTxTp>
          <SttlmTxCond>
            <Cd>NOMC</Cd>
          </SttlmTxCond>
          <PrtlSttlmInd>NPAR</PrtlSttlmInd>
          <ModCxlAllwd>
            <Ind>true</Ind>
          </ModCxlAllwd>
          <SctiesSubBalTp>
            <Id>9100</Id>
            <Issr>T2S</Issr>
            <SchmeNm>RT</SchmeNm>
          </SctiesSubBalTp>
        </SttlmParams>
        <DlvrgSttlmPties>
          <Dpstry>
            <Id>
              <AnyBIC>BNGRGRAASSS</AnyBIC>
            </Id>
          </Dpstry>
          <Pty1>
            <Id>
              <AnyBIC>ETHNGRGRAAX</AnyBIC>
            </Id>
            <SfkpgAcct>
              <Id>BOGS100800010160</Id>
            </SfkpgAcct>
          </Pty1>
        </DlvrgSttlmPties>
      </SctiesSttlmTxInstr>
    </Document>

    XSD schema: http://www.bundesbank.de/4zb/download/v1.2.1/securitiessettlementtransactioninstruction/sese.023.001.03_T2S.xsd

    Imports System.Collections.Generic
    Imports System.Diagnostics
    Imports System.IO
    Imports System.Xml
    Imports System.Xml.Schema
    
    Public Class XmlValidator
    
        Private _xmlPath As String
        Private _xsdPath As String
    
        Private _validationTime As TimeSpan
        ReadOnly Property ValidationTime As TimeSpan
            Get
                Return Me._validationTime
            End Get
        End Property
    
        Private _vea As List(Of ValidationEventArgs)
        ReadOnly Property ValidationEvents As List(Of ValidationEventArgs)
            Get
                Return Me._vea
            End Get
        End Property
    
        Public Sub New(ByVal xmlPath As String, ByVal xsdPath As String)
            Me._xmlPath = xmlPath
            Me._xsdPath = xsdPath
            Me._vea = New List(Of ValidationEventArgs)
        End Sub
    
        Public Sub validate()
    
            Dim sw As New Stopwatch()
            Me._vea.Clear()
    
            Using stream As FileStream = File.OpenRead(Me._xsdPath)
    
                Dim settings As New XmlReaderSettings
                Dim schema As XmlSchema = XmlSchema.Read(stream, AddressOf xsdValidationEventHandler)
                settings.ValidationType = ValidationType.Schema
                settings.Schemas.Add(schema)
                AddHandler settings.ValidationEventHandler, AddressOf xmlValidationEventHandler
    
                ' Time only the validating read; Read() raises validation events as it goes.
                sw.Start()
                Using reader As XmlReader = XmlReader.Create(Me._xmlPath, settings)
                    While reader.Read()
                    End While
                End Using
                sw.Stop()
    
            End Using
    
            Me._validationTime = sw.Elapsed
    
        End Sub
    
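        Private Sub xsdValidationEventHandler(ByVal sender As Object, ByVal args As ValidationEventArgs)
            ' Fail fast on schema errors, keeping the original message and exception.
            If args.Severity = XmlSeverityType.Error Then
                Throw New XmlSchemaException("Invalid XML Schema file: " & args.Message, args.Exception)
            End If
        End Sub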
    
        Private Sub xmlValidationEventHandler(ByVal sender As Object, ByVal args As ValidationEventArgs)
            Me._vea.Add(args)
        End Sub
    
    End Class
    

    Thursday, May 15, 2014 1:31 PM

Answers

  • The XML validation itself isn't the problem; the problem is a regular expression that's used in that xsd file:

    ([0-9a-zA-Z\-\?:\(\)\.,'\+ ](?>[0-9a-zA-Z\-\?:\(\)\.,'\+ ]*([0-9a-zA-Z\-\?:\(\)\.,'\+ ])?)*)

    You can reproduce this with the following simple program (sorry, C#, I don't know much VB):

    Regex.IsMatch(
        "This is a large Transaction ID!", 
        @"^(([0-9a-zA-Z\-\?:\(\)\.,'\+ ]([0-9a-zA-Z\-\?:\(\)\.,'\+ ]*([0-9a-zA-Z\-\?:\(\)\.,'\+ ])?)*))$");
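
To see how quickly the cost grows, the same pattern can be run on shorter inputs that fail only at the last character. This is a minimal sketch, not part of the original reply: the class name `BacktrackingGrowth`, the helper `IsValid`, and the inputs (runs of 'a' followed by '!') are made up for illustration. Timings depend on the machine and runtime, so none are quoted here.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class BacktrackingGrowth
{
    // Character class repeated throughout the pattern quoted above.
    const string C = @"[0-9a-zA-Z\-\?:\(\)\.,'\+ ]";

    // Same anchored pattern as in the repro, assembled from the class.
    static readonly Regex Pattern =
        new Regex("^((" + C + "(" + C + "*(" + C + ")?)*))$");

    // Hypothetical helper: true when the whole input matches the pattern.
    public static bool IsValid(string input)
    {
        return Pattern.IsMatch(input);
    }

    static void Main()
    {
        for (int n = 10; n <= 18; n += 2)
        {
            // n characters from the class, then '!' which is not in it,
            // so the match can only fail after trying every split.
            string input = new string('a', n) + "!";
            Stopwatch sw = Stopwatch.StartNew();
            bool ok = IsValid(input);
            sw.Stop();
            Console.WriteLine("{0} chars: valid={1}, {2} ms",
                input.Length, ok, sw.ElapsedMilliseconds);
        }
    }
}
```

Every input is rejected. The point of the loop is only to watch the elapsed time climb as characters are added, which is what makes a 31-character value so expensive.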
    
    I think you can avoid the problem by disabling backtracking in the regular expression, but that means you'll need to modify the xsd file.
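
As a rough illustration of what "disabling backtracking" could look like, the sketch below wraps the inner repetition in an atomic group `(?>...)` and, as an alternative, collapses the pattern to a single repeated character class. The class and field names are hypothetical. Two assumptions to keep in mind: the simplification is equivalent only for the pattern as quoted in this thread, and whether the .NET schema validator accepts `(?>...)` inside an `xs:pattern` facet has not been checked here.

```csharp
using System;
using System.Text.RegularExpressions;

static class AtomicPatternSketch
{
    // Character class repeated throughout the quoted pattern.
    const string C = @"[0-9a-zA-Z\-\?:\(\)\.,'\+ ]";

    // Inner repetition made atomic: once an iteration has matched,
    // the engine does not go back and retry other ways to split it.
    public static readonly Regex Atomic =
        new Regex("^(" + C + "(?>" + C + "*(" + C + ")?)*)$");

    // One or more class characters; accepts the same strings as the
    // quoted pattern (assumption: the pattern is exactly as quoted).
    public static readonly Regex Simplified =
        new Regex("^" + C + "+$");

    static void Main()
    {
        string[] samples =
        {
            "This is a large Transaction ID!",
            "This is a large Transaction ID"
        };

        foreach (string s in samples)
        {
            Console.WriteLine("\"{0}\": atomic={1}, simplified={2}",
                s, Atomic.IsMatch(s), Simplified.IsMatch(s));
        }
    }
}
```

Both variants reject the value ending in '!' and accept the same value without it, and neither needs to explore alternative splits before failing. The simplified form uses only plain XSD regular expression syntax, so it may be the safer candidate if the schema can be edited.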

    Thursday, May 15, 2014 2:51 PM

All replies

  • Is a virus checker or firewall causing the delay? Open Task Manager while the delay is occurring and check the CPU usage. If you open the XML file with Notepad, do you get a similar delay?

    jdweng

    Thursday, May 15, 2014 1:46 PM