XML Shred Issues

  • Question

  •  We use BizTalk to call a stored procedure and insert records into a table. One of the parameters is XML, so we are shredding it into a table variable and then loading from the table variable. This seems to be faster than reading the XML directly in the insert. We tried OpenXML and XQuery and found that OpenXML was much faster. Here's the problem: when BizTalk has about 60,000 messages queued up and then starts firing them at SQL, we sometimes get errors about a primary key violation. There are no duplicates being passed in the XML variable. Are there any memory issues with OpenXML? Anyone have any ideas?
    Edit: We are on SQL Server 2008 SP2 on Windows Server 2003
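
    For reference, the two shredding approaches under comparison look roughly like this (sketch only; the XML shape, table and column names are invented, since the original message wasn't posted):

    ```sql
    DECLARE @x xml = N'<Orders><Order Id="1" Qty="5"/><Order Id="2" Qty="3"/></Orders>';
    DECLARE @OrderTable TABLE (Id int PRIMARY KEY, Qty int);

    -- OPENXML: parse the document into an internal DOM, shred, then free the handle
    DECLARE @iDoc int;
    EXEC sp_xml_preparedocument @iDoc OUTPUT, @x;

    INSERT INTO @OrderTable (Id, Qty)
    SELECT Id, Qty
    FROM OPENXML(@iDoc, '/Orders/Order', 1)
    WITH (Id int '@Id', Qty int '@Qty');

    EXEC sp_xml_removedocument @iDoc;  -- always release the DOM

    -- XQuery equivalent: nodes()/value(), no DOM handle to clean up
    DELETE @OrderTable;
    INSERT INTO @OrderTable (Id, Qty)
    SELECT o.n.value('@Id', 'int'), o.n.value('@Qty', 'int')
    FROM @x.nodes('/Orders/Order') AS o(n);
    ```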

    • Moved by Tom PhillipsModerator Wednesday, January 19, 2011 7:14 PM TSQL question (From:SQL Server Database Engine)
    Wednesday, January 19, 2011 5:03 PM


All replies

  • There are memory issues with OPENXML, but more around it being greedy:

    Stop Using OPENXML (Please...)

    You should know it's possible to write bad XQuery as well as bad SQL.  Can you post a small piece of sample XML, your OPENXML solution and comparable XQuery solution?  It may be possible to optimize the XQuery solution.

    Also have a look through these great articles.

    XML Indexes in SQL Server 2005

    Performance Optimizations for the XML Data Type in SQL Server 2005


    • Proposed as answer by Naomi NModerator Thursday, January 20, 2011 6:50 PM
    • Marked as answer by KJian_ Tuesday, January 25, 2011 9:57 AM
    Thursday, January 20, 2011 10:10 AM
  • I also found this article relating to SQL 2000 but it still applies:

    "Avoid OPENXML over Large XML Documents
    Be aware that there are limitations to the amount of memory that is available to the OPENXML construct over an XML document operation. This operation builds a Document Object Model (DOM) in the SQL buffer space that can be much larger than the original document size. Also, this operation is limited to one eighth of the buffer space, and large XML documents may consume this memory fairly quickly and cause an out-of-memory error on the server. Do not create large systems based on this functionality without conducting significant load testing. You might also want to use the XML bulk load option if possible.

    Avoid Large Numbers of Concurrent OPENXML Statements over XML Documents
    You also have to consider the issue with OPENXML when you use OPENXML to batch inserts. This is a fairly common operation because it is an effective way to issue a group of inserts with one statement. Issuing a group of inserts reduces the overhead of multiple insert statements and multiple round trips. However, be aware that this approach may not be very scalable because of the aforementioned memory limitations."
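
    One mitigation worth noting: every sp_xml_preparedocument call allocates that DOM, and it is only freed by sp_xml_removedocument or when the connection resets. A defensive pattern (a sketch only; @xmlParam and the commented-out shred stand in for your procedure's actual logic) is to free the handle even when the insert fails:

    ```sql
    DECLARE @iDoc int;
    EXEC sp_xml_preparedocument @iDoc OUTPUT, @xmlParam;

    BEGIN TRY
        -- ... INSERT ... FROM OPENXML(@iDoc, ...) WITH (...) goes here ...
        EXEC sp_xml_removedocument @iDoc;
    END TRY
    BEGIN CATCH
        -- Release the DOM even on failure; a leaked handle keeps
        -- consuming memory until the connection resets.
        EXEC sp_xml_removedocument @iDoc;
        -- Re-raise on SQL 2008 (THROW is 2012+)
        DECLARE @msg nvarchar(2048) = ERROR_MESSAGE();
        RAISERROR(@msg, 16, 1);
    END CATCH;
    ```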


    Thursday, January 20, 2011 10:16 AM
  • The first link doesn't seem to work.

    For every expert, there is an equal and opposite expert. - Becker's Law

    My blog

    Thursday, June 14, 2012 6:54 PM
  • Yes, I think that article has been removed.  Not sure if that's because it's been aged out of the cache ;) or because the author thinks the point is no longer valid.  What's still valid is that OPENXML can (although not always) take up to 1/8th of your server memory when you use it, so if you bear that in mind, what does one query going faster mean?  Also, OPENXML can only ever work against one XML document, so it encourages row-based processing.  You simply can't apply it against a table as you can with the xml datatype methods ( eg nodes ).

    There are a number of pros and cons to both approaches which are covered elsewhere.

    The answer as always is it depends - try it and see with your data.  I personally prefer the XML datatype and methods as I find them far more flexible and I don't have to worry about the guys from the server team coming to see me about some of the queries I've been running and about what it means to be a good neighbour on a SQL Server.
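
    To illustrate the point about the xml datatype methods working set-based over a table (the table and element names here are invented for the example):

    ```sql
    -- Shred an xml column across every row of a table in one statement;
    -- OPENXML works on a single prepared document handle, so it cannot do this.
    DECLARE @Messages TABLE (MsgId int, Body xml);
    INSERT INTO @Messages VALUES
        (1, N'<Items><Item Sku="A" Qty="2"/></Items>'),
        (2, N'<Items><Item Sku="B" Qty="7"/><Item Sku="C" Qty="1"/></Items>');

    SELECT m.MsgId,
           i.n.value('@Sku', 'varchar(10)') AS Sku,
           i.n.value('@Qty', 'int')         AS Qty
    FROM @Messages AS m
    CROSS APPLY m.Body.nodes('/Items/Item') AS i(n);
    ```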


    Thursday, June 14, 2012 10:57 PM
  • I've leveraged OpenXML for very large enterprise systems (billions of rows) over the past 10 years and have never experienced significant performance issues. As a craftsman, it is your responsibility to know how and when to utilize your tools, and how to use them effectively.

    I really dislike comments like these that dissuade people from specific technologies with no real explanation.

    The fact that OpenXML loads the entire DOM can be a positive factor, and again, I have several real-time transaction processing systems in place (each processing 500k+ rows per day) that utilize OpenXML, and there have never been any issues with performance.

    By all means use whatever fits your needs, and OpenXML has its own set of drawbacks (whitespace handling, no support for XPath 2.0), but seriously, some of the banter on here is a bit sensational.

    • Proposed as answer by Jdubya Tuesday, February 4, 2014 3:28 PM
    • Edited by Jdubya Tuesday, February 4, 2014 3:29 PM
    Tuesday, February 4, 2014 3:28 PM
  • QUOTE: Also, OPENXML can only ever work against one XML so encourages row-based processing. You simply can't apply it against a table as you can with the xml datatype methods ( eg nodes ).

    This is a completely false statement. Why post about topics that you don't understand? The statement doesn't even make sense.

    INSERT INTO #RandomTempTable
    SELECT
        CASE FeePaying
            WHEN 'True' THEN 1
            ELSE 0
        END,
        CAST(Amount AS float),
        EffectiveDate,
        Pool,
        AllocationAccount,
        TaxLine,
        AllocationScheme
    FROM OPENXML(@iDoc, '//fn:FundIncome', 3)
    WITH (
        FeePaying varchar(5) 'fn:Fee_Paying',
        Amount varchar(50) 'fn:Amount',
        EffectiveDate varchar(50) 'fn:Effective_Date',
        Pool varchar(20) 'fn:Pool',
        AllocationAccount varchar(50) 'fn:Allocation_Account',
        TaxLine varchar(20) 'fn:Tax_x0020_Line',
        AllocationScheme varchar(50) 'fn:Allocation_Scheme'
    )

    -- Free preprocessor memory
    EXEC sp_xml_removedocument @iDoc

    I wish I had caught this comment when it was originally posted. 

    Wednesday, February 21, 2018 1:44 PM