locked
OPENXML Query Help .... RRS feed

  • Question

  • Folks:

    I need help with writing a OPENXML query. I have the below mentioned query which shreds the XML data from column 'ResultXML' from table 'XmlDataTable'. Becuase of the size of the XML data this query takes a long time. I read that using OPENXML it would be faster and wanted any help on writing the OPENXML query.

    SELECT 
    				nodeEntry.value('(./Params/PName)[1]','VARCHAR(250)') AS PName,
    				nodeEntry.value('(./Params/PGrpId)[1]','INT') AS PGrpId,
    				nodeEntry.value('(./Params/AsOf)[1]','DATETIME') AS AsOf,
    				nodeEntry.value('(./SummStat/TrackError)[1]','FLOAT') AS TrackError,
    				nodeEntry.value('(./SummStat/S_V_95)[1]','FLOAT') AS V_95,
    				nodeEntry.value('(./SummStat/S_V_99)[1]','FLOAT') AS V_99,
    				nodeEntry.value('(./SummStat/PValue)[1]','FLOAT') AS Value,
    				D2.TracingDetails,
    				D2.SysDate,
    				D2.Name
    FROM			XmlDataTable D1
    OUTER APPLY		D1.ResultXML.nodes('./SinglePVar') nodeslist(nodeEntry)
    LEFT OUTER JOIN tblData1 D2
    ON				D1.PartyName = D2.PartyName
    WHERE			D1.RunId = 542016
    AND				D1.SysDate = '08/28/2011'
    


    Here is the XML data in ResultXML column (1 record):

    <SinglePVar>
     <Params>
     <PName>BIS_Model</PName>
     <PGrpId>224</PGrpId>
     <AsOf>2003-12-31T00:00:00</AsOf>
     <MName>Cash</MName>
     <StdevPList>Window</StdevPList>
     </Params>
     <SummStat>
     <UnTrackError>1.194670829272935e-001</UnTrackError>
     <TrackError>4.138461149242335e-001</TrackError>
     <Un_V_95>1.965233514153978e-001</Un_V_95>
     <Un_V_99>3.077472056207081e-001</Un_V_99>
     <S_V_95>6.807768590503640e-001</S_V_95>
     <S_V_99>1.066067592044826e+000</S_V_99>
     <SFact>1.200000000000000e+001</SFact>
     <PValue>4.098631071641542e+008</PValue>
     <ConMBAgent>6.541920628018216e-001</ConMBAgent>
     </SummStat>
     <FContribTV>
     <FContribTVEntry>
     <Fact>Agency</Fact>
     <VContrib>-6.551961080087127e-003</VContrib>
     <Multiplier>5.000000000000000e-001</Multiplier>
     <MVTE>3.243186331733051e+001</MVTE>
     </FContribTVEntry>
     </FContribTV>
    </SinglePVar>
    

     

    Thanks !






    Wednesday, August 31, 2011 8:46 PM

Answers

  • A few quotes from Books Online which indicate OPENXML can indeed be "dangerous":

    • "A parsed document is stored in the internal cache of SQL Server. The MSXML parser uses one-eighth the total memory available for SQL Server. To avoid running out of memory, run sp_xml_removedocument to free up the memory.

    • "The XML parser invoked by sp_xml_preparedocument can parse internal DTDs and entity declarations. Because maliciously constructed DTDs and entity declarations can be used to perform a denial of service attack, we strongly recommend that users not directly pass XML documents from untrusted sources to sp_xml_preparedocument."

    Without wanting to exaggerate the danger, simply put, neither of these are risks when using the XML datatype and its methods.

    Regarding OPENXML operating on multiple rows, I don't think this is possible without a cursor.  Therefore if you are recommending OPENXML you are proposing a cursor based, row-by-row approach instead of a set-based approach.  Jeff, can you clarify your comment "you can shred multiuple rows using OPENXML and it does not only work against scalars"?  Do you have an example of this?  If you mean using a cursor then this is OPENXML operating against scalars, as Dmitri indicated.

    In this example, the XML is stored in a table so is a great candidate for XML Indexing and typing ( with XML SCHEMA COLLECTION ).  Have a look through these great articles before giving up on the nodes method:

    Performance Optimizations for the XML Data Type in SQL Server 2005
    http://msdn.microsoft.com/en-us/library/ms345118.aspx

    XML Indexes in SQL Server 2005
    http://msdn.microsoft.com/en-us/library/ms345121(SQL.90).aspx

     

    Have you really tried Hunchback's query and got poor performance?  I created some test data of 200,000 similar records and couldn't get it to run in more than 1 second.  A few other questions, how large are your tables and how large is the XML in the table?  Do the tables themselves have appropriate covering indexes for this query?  eg something like:

    CREATE INDEX idx1 ON XmlDataTable ( PartyName, RunId, sysDate ) INCLUDE ( ResultXML )
    CREATE INDEX idx2 ON tblData1 ( PartyName ) INCLUDE ( TracingDetails, SysDate, Name )
    
    

    If you give us a bit more information, we might be able to get a result for you, either with nodes or even OPENXML, because as so often with SQL Server, it depends!

     

    • Proposed as answer by Naomi N Friday, September 2, 2011 2:49 AM
    • Marked as answer by KJian_ Thursday, September 8, 2011 9:02 AM
    Friday, September 2, 2011 1:48 AM

All replies

  • Speaking of OpenXML - this is the dangerous route. Even if it's faster than XQUERY, it uses 1/8th of sql server memory. Besides that I'm not sure it's even possible to shred multiple rows in the single query with OPENXML. It rather works against "scalars". Are you sure that your query is slow because of XML? Can you post the execution plan here? Also did you try to create xml indexes to speed up the shredding process?

    Thank you!

    My blog: http://aboutsqlserver.com

    Thursday, September 1, 2011 12:39 AM
  • In addition, you need to use CROSS APPLY (not OUTER APPLY) - since you're using filters, your outer apply will be converted to cross apply anyway and also if you shred your XML more you'll get much better performance. I have a query saved that demonstrates the technique - first query performs better than second:

     

    DECLARE @x XML
    SET @x = '<root>
     <BookingPassengers>
     <BookingPassenger>
     <Name>R</Name>
     <PassengerID>12</PassengerID>
     <PassengerFees>
     <SSRCode>PBAB</SSRCode>
     <LegKey>
     <flightnumber>28</flightnumber>
     </LegKey>
     </PassengerFees>
     <PassengerFees>
     <SSRCode>IMNX</SSRCode>
     <LegKey>
     <flightnumber>28</flightnumber>
     </LegKey>
     </PassengerFees>
     </BookingPassenger>
     <BookingPassenger>
     <Name>K</Name>
     <PassengerID>32</PassengerID>
     <PassengerFees>
     <SSRCode>VGMX</SSRCode>
     <LegKey>
     <flightnumber>28</flightnumber>
     </LegKey>
     </PassengerFees>
     </BookingPassenger>
     </BookingPassengers>
    </root>'
    
    SELECT 
    	p.c.value('(PassengerID/text())[1]', 'INT') AS PassengerID,
    	f.c.value('(SSRCode/text())[1]', 'VARCHAR(10)') AS SSRCode,
    	l.c.value('(flightnumber/text())[1]', 'VARCHAR(10)') AS flightnumber
    FROM @x.nodes( '/root/BookingPassengers/BookingPassenger' ) p(c)
    	CROSS APPLY p.c.nodes('PassengerFees') f(c)
    		CROSS APPLY f.c.nodes('LegKey') l(c)
    
    -- Above is neater and more performant than this
    SELECT 
    	p.c.value('(../../PassengerID/text())[1]', 'INT') AS PassengerID,
    	p.c.value('(../SSRCode/text())[1]', 'VARCHAR(10)') AS SSRCode,
    	p.c.value('(flightnumber/text())[1]', 'VARCHAR(10)') AS flightnumber
    FROM @x.nodes( '/root/BookingPassengers/BookingPassenger/PassengerFees/LegKey' ) p(c)
    
    

    I myself also recently found the same situation - see this thread and referenced thread. So, I advise to adjust your query to use the same pattern as the first query.
    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    Thursday, September 1, 2011 12:48 AM
  • Could you try this suggestion before jumping into using OPENXML?

     

    with rs as (
    SELECT
    				D1.ResultXML,
    				D2.TracingDetails,
    				D2.SysDate,
    				D2.Name
    FROM			XmlDataTable D1
    LEFT OUTER JOIN tblData1 D2
    ON				D1.PartyName = D2.PartyName
    WHERE			D1.RunId = 542016
    AND				D1.SysDate = '08/28/2011'
    )
    select
    	P.x.value('(PName/text())[1]', 'varchar(250)') as PName,
    	P.x.value('(PGrpId/text())[1]', 'int') as PGrpId,
    	P.x.value('(AsOf/text())[1]', 'datetime') as AsOf,
    	S.x.value('(TrackError/text())[1]', 'float') as TrackError,
    	S.x.value('(Un_V_95/text())[1]', 'float') as Un_V_95,
    	S.x.value('(S_V_99/text())[1]', 'float') as S_V_99,
    	S.x.value('(PValue/text())[1]', 'float') as PValue,
    	R.TracingDetails,
    	R.SysDate,
    	R.Name
    from
    	rs as R
    	outer apply
    	R.ResultXML.nodes('SinglePVar[1]') as T(x)
    	outer apply
    	T.x.nodes('Params[1]') as P(x)
    	outer apply
    	T.x.nodes('SummStat[1]') as S(x);
    go
    

    If the xml column is un-typed, then using text() node test, instead just the name of the element, could improve the performance.

     

     


    AMB

    Some guidelines for posting questions...

    Thursday, September 1, 2011 1:22 AM
  • Speaking of OpenXML - this is the dangerous route. Even if it's faster than XQUERY, it uses 1/8th of sql server memory. Besides that I'm not sure it's even possible to shred multiple rows in the single query with OPENXML. It rather works against "scalars". Are you sure that your query is slow because of XML? Can you post the execution plan here? Also did you try to create xml indexes to speed up the shredding process?

    Thank you!

    My blog: http://aboutsqlserver.com


    Firstly, your claims about using OPENXML is a "dangerous route" are unfounded and I challenge you to produce evidence that using such technology causes problems within SQL Server.  So it uses 1/8th of the SQL Server memory.  So what!  All you need to remember to do is execute sp_remove as soon as you've finished shredding your XML.  BTW, XQUERY uses quite a large amount of memory when trying to shred large XML data and this is why it's recommended that you use OPENXML INSTEAD of XQUERY for shredding large documents.

    Secondly, you can shred multiuple rows using OPENXML and it does not only work against scalars.

    Thirdly, you can't create XML indexes against dynamic XML being shredded with XQUERY or OPENXML.


    Jeff Wharton
    MSysDev (C.Sturt), MDbDsgnMgt (C.Sturt), MCT, MCPD, MCITP, MCDBA
    Blog: Mr. Wharty's Ramblings
    MC ID: Microsoft Transcript

    Please mark answered if I've answered your question and vote for it as helpful to help other user's find a solution quicker

    Thursday, September 1, 2011 2:19 AM
  • Tried with the above given query as suggested but performance is a issue. If anybody can help with OPENXML query that would be great. I know that OPENXML and memory is a debatable issue but I'm fine with OPENXML it because once all the data from XML is shredded that is it.

    I have written this OPENXML query and it works to shred the XML data but I don't know how to get the data from the other columns in the 'select' list which has joins (D2.TracingDetails,D2.SysDate,D2.Name) using this query.

     

    DECLARE @hdoc int
    DECLARE @inputXML XML
    
    SET @inputXML = (SELECT CAST(ResultXML AS VARCHAR(MAX)) from tblXmlData where RunId = 526014)
    
    exec sp_xml_preparedocument 
    @hdoc OUTPUT,
    @inputXml
    
    
    SELECT 		PName,
    		PGrpId,
    		AsOf,
    		TrackError,
    		S_V_95,
    		S_V_99,
    		PValue
    FROM OPENXML(@hdoc, './SinglePVar/Params',2)
    WITH 	( PName VARCHAR(100),
    	 PGrpId INT,
    	 AsOf DATETIME)
    CROSS APPLY OPENXML(@hdoc,'./SinglePVar/SummStat',2)
    WITH ( TrackError FLOAT,
     	S_V_95 FLOAT,
    	S_V_99 FLOAT,
    	PValue FLOAT)
    
    
    exec sp_xml_removedocument @hdoc
    

    Thursday, September 1, 2011 7:27 PM
  • A few quotes from Books Online which indicate OPENXML can indeed be "dangerous":

    • "A parsed document is stored in the internal cache of SQL Server. The MSXML parser uses one-eighth the total memory available for SQL Server. To avoid running out of memory, run sp_xml_removedocument to free up the memory.

    • "The XML parser invoked by sp_xml_preparedocument can parse internal DTDs and entity declarations. Because maliciously constructed DTDs and entity declarations can be used to perform a denial of service attack, we strongly recommend that users not directly pass XML documents from untrusted sources to sp_xml_preparedocument."

    Without wanting to exaggerate the danger, simply put, neither of these are risks when using the XML datatype and its methods.

    Regarding OPENXML operating on multiple rows, I don't think this is possible without a cursor.  Therefore if you are recommending OPENXML you are proposing a cursor based, row-by-row approach instead of a set-based approach.  Jeff, can you clarify your comment "you can shred multiuple rows using OPENXML and it does not only work against scalars"?  Do you have an example of this?  If you mean using a cursor then this is OPENXML operating against scalars, as Dmitri indicated.

    In this example, the XML is stored in a table so is a great candidate for XML Indexing and typing ( with XML SCHEMA COLLECTION ).  Have a look through these great articles before giving up on the nodes method:

    Performance Optimizations for the XML Data Type in SQL Server 2005
    http://msdn.microsoft.com/en-us/library/ms345118.aspx

    XML Indexes in SQL Server 2005
    http://msdn.microsoft.com/en-us/library/ms345121(SQL.90).aspx

     

    Have you really tried Hunchback's query and got poor performance?  I created some test data of 200,000 similar records and couldn't get it to run in more than 1 second.  A few other questions, how large are your tables and how large is the XML in the table?  Do the tables themselves have appropriate covering indexes for this query?  eg something like:

    CREATE INDEX idx1 ON XmlDataTable ( PartyName, RunId, sysDate ) INCLUDE ( ResultXML )
    CREATE INDEX idx2 ON tblData1 ( PartyName ) INCLUDE ( TracingDetails, SysDate, Name )
    
    

    If you give us a bit more information, we might be able to get a result for you, either with nodes or even OPENXML, because as so often with SQL Server, it depends!

     

    • Proposed as answer by Naomi N Friday, September 2, 2011 2:49 AM
    • Marked as answer by KJian_ Thursday, September 8, 2011 9:02 AM
    Friday, September 2, 2011 1:48 AM
  • Proposed XML indexes:

     

    CREATE PRIMARY XML INDEX xidxResultXML ON XmlDataTable( ResultXML )
    GO
    CREATE XML INDEX pthResultXML  ON XmlDataTable(ResultXML )
    USING XML INDEX xidxResultXML FOR PATH
    

    And simple XML SCHEMA COLLECTION, created in Visual Studio 2008 by just placing one of the XML samples in an xml file, opening it in Visual Studio and using the XML, 'Create Schema' option:

     

     

    IF NOT EXISTS ( SELECT * FROM sys.xml_schema_collections WHERE name= N'xsc_singlePVar' )
    CREATE XML SCHEMA COLLECTION xsc_singlePVar AS '<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
    	<xs:element name="SinglePVar">
    		<xs:complexType>
    			<xs:sequence>
    				<xs:element name="Params">
    					<xs:complexType>
    						<xs:sequence>
    							<xs:element name="PName" type="xs:string" />
    							<xs:element name="PGrpId" type="xs:unsignedByte" />
    							<xs:element name="AsOf" type="xs:dateTime" />
    							<xs:element name="MName" type="xs:string" />
    							<xs:element name="StdevPList" type="xs:string" />
    						</xs:sequence>
    					</xs:complexType>
    				</xs:element>
    				<xs:element name="SummStat">
    					<xs:complexType>
    						<xs:sequence>
    							<xs:element name="UnTrackError" type="xs:float" />
    							<xs:element name="TrackError" type="xs:float" />
    							<xs:element name="Un_V_95" type="xs:float" />
    							<xs:element name="Un_V_99" type="xs:float" />
    							<xs:element name="S_V_95" type="xs:float" />
    							<xs:element name="S_V_99" type="xs:float" />
    							<xs:element name="SFact" type="xs:float" />
    							<xs:element name="PValue" type="xs:float" />
    							<xs:element name="ConMBAgent" type="xs:float" />
    						</xs:sequence>
    					</xs:complexType>
    				</xs:element>
    				<xs:element name="FContribTV">
    					<xs:complexType>
    						<xs:sequence>
    							<xs:element name="Fact" type="xs:string" />
    							<xs:element name="VContrib" type="xs:float" />
    							<xs:element name="Multiplier" type="xs:float" />
    							<xs:element name="MVTE" type="xs:float" />
    						</xs:sequence>
    					</xs:complexType>
    				</xs:element>
    			</xs:sequence>
    		</xs:complexType>
    	</xs:element>
    </xs:schema>'
    

    Give them a try!

     


    • Edited by wBob Friday, September 2, 2011 9:48 AM
    Friday, September 2, 2011 9:45 AM