locked
Best Way to Shred XML document and insert the result set into a table in SQL Server RRS feed

  • Question

  • What is the best way to shred an XML document? 

    xPath or FLOWR? .....

    I want to insert from XML document to a SQL Server table and automate same....

    Performance is my main concern?  It is killing me bros.... pls do help me on this....

    Assume the XML document is about 5 MB and the xml has n depth like

    <clusters>
     <cluster id="">
      <name> xyz</name>
      <cluster>
       <cluster>
        <cluster id= "1,2,3,4,5,">
         <name> xyz</name>
         <cluster>
         </cluster>
        </cluster>
       </cluster>
      </cluster>
     </cluster>
     <cluster id="">
      <name> xyz</name>
      <cluster>
       <cluster>
        <cluster id= "1,2,3,4,5,">
         <name> xyz</name>
         <cluster>
         </cluster>
        </cluster>
       </cluster>
      </cluster>
     </cluster>

    </clusters>

     


    ebro

    Friday, July 26, 2013 2:30 PM

Answers

  • Hi,

    As explained here already on how to shred your XML - http://gallery.technet.microsoft.com/scriptcenter/SQL-Server-ImportShred-XML-7f3baa40/view/Discussions#content

    As far as my knowledge , you can shred XML into table using below methods

    1.) Using OPENXML or nodes() - http://www.mssqltips.com/sqlservertip/1609/replacing-openxml-with-the-xml-nodes-function-in-sql-server-2005/

    2.) SSIS - XML Source task - http://blogs.msdn.com/b/mattm/archive/2007/12/11/using-xml-source.aspx

    3.) Powershell  - http://sathyadb.blogspot.in/2013/03/reading-xml-file-using-powershell.html

    4.)Using VB/C# script (Script task in SSIS) http://technet.microsoft.com/en-us/magazine/2008.10.heyscriptingguy.aspx

    Here are few links for understanding OPENXML and XQUERY Optimisation tips :

    http://blog.wharton.com.au/2011/05/28/part-3-xquery-please-dont-use-openxml/

    http://blog.wharton.com.au/2011/06/13/part-5-openxml-and-xquery-optimisation-tips/

    I haven't answered your question but i have given you many choices ,try it out and choose best suitable method that gives better performance :)


    sathya --------- Mark as answered if my post solved your problem and Vote as helpful if my post was useful.

    • Marked as answer by ebrolove Monday, July 29, 2013 3:07 PM
    Saturday, July 27, 2013 1:29 PM
  • One of the fastest ways to load xml, particularly larger files, is using the SQL XML Bulkload API.  This can also use recursive relationships.  Follow this example:

    Create a file: c:\temp\temp.xml

    <clusters>
    	<cluster id="clusterA">
    		<name>folder1</name>
    		<cluster id ="016031495_00001, 00011700_00000">
    			<cluster id ="016031495_00002, 00011700_00002">
    				<name>fold1</name>
    			</cluster>
    			<cluster id="116031495_00001, 10011700_00000">
    				<name>fold2</name>
    			</cluster>
    			<cluster id="216031495_00001, 20011700_00000">
    				<name>fold3</name>
    			</cluster>
    			<cluster id="316031495_00001, 30011700_00000">
    				<name>fold4</name>
    			</cluster>
    
    			<cluster id="116031495_00003, 10011700_00000">
    				<name>fold2</name>
    			</cluster>
    			<cluster id="216031495_00002, 20011700_00000">
    				<name>fold3</name>
    			</cluster>
    			<cluster id="316031495_00002, 30011700_00000">
    				<name>fold4</name>
    			</cluster>
    		</cluster>
    	</cluster>
    
    	<cluster id="clusterB">
    		<name>folder1</name>
    		<cluster id ="016031495_00003, 00011700_00000">
    			<name>fold1</name>
    		</cluster>
    		<cluster id="116031495_00004, 10011700_00000">
    			<name>fold2</name>
    		</cluster>
    		<cluster id="216031495_00003, 20011700_00000">
    			<name>fold3</name>
    		</cluster>
    		<cluster id="316031495_00003, 30011700_00000">
    			<name>fold4</name>
    		</cluster>
    	</cluster>
    	
    </clusters>

    Create a file: c:\temp\temp.xsd

    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
    	<xsd:annotation>
    		<xsd:appinfo>
    			<sql:relationship name="parentClusterToCluster"
    						parent="cluster"
    						parent-key="id"
    						child="cluster"
    						child-key="parentId" />
    		</xsd:appinfo>
    	</xsd:annotation>
    
    	<xsd:element name="cluster"
    					 type="clusterType"
    					 sql:relation="cluster"
    					 sql:key-fields="id"
    					 sql:limit-field="parentId" />
    	<xsd:complexType name="clusterType">
    		<xsd:sequence>
    			<xsd:element name="cluster"
    						 type="clusterType"
    						 sql:relation="cluster"
    						 sql:key-fields="id"
    						 sql:relationship="parentClusterToCluster"
    						 sql:max-depth="6"/>
    			<xsd:element name="name" />
    		</xsd:sequence>
    
    		<xsd:attribute name="id" type="xsd:string" />
    		<xsd:attribute name="name" type="xsd:string"/>
    
    		<xsd:attribute name="parentId" type="xsd:string" />
    	</xsd:complexType>
    
    </xsd:schema>

    Create a file: c:\temp\run and gen.vbs

    Set objBL = CreateObject("SQLXMLBulkLoad.SQLXMLBulkload.4.0")
    objBL.ConnectionString = "provider=SQLOLEDB;data source=.\sql2008r2;database=tempdb;integrated security=SSPI"
    objBL.ErrorLogFile = "error.log"
    objBL.SchemaGen = True
    objBL.SGDropTables = True
    
    objBL.Execute "test.xsd", "test.xml"
    Set objBL = Nothing

    Set the data source text to your SQL Server name. In my example, my SQL Server name is SQL2008R2.  Now, run the VBScript file by double-clicking it in Windows Explorer. (NB you could also code this in .net.)

    I get these results:

    See if you can adapt this example to your real data.

    • Marked as answer by ebrolove Monday, July 29, 2013 3:08 PM
    Sunday, July 28, 2013 6:34 PM
    Answerer

All replies

  • use OPENXML , see the below example

    DECLARE @idoc int, @doc varchar(1000); 
    SET @doc ='
    <ROOT>
    <Customer CustomerID="VINET" ContactName="Paul Henriot">
       <Order OrderID="10248" CustomerID="VINET" EmployeeID="5" 
               OrderDate="1996-07-04T00:00:00">
          <OrderDetail ProductID="11" Quantity="12"/>
          <OrderDetail ProductID="42" Quantity="10"/>
       </Order>
    </Customer>
    <Customer CustomerID="LILAS" ContactName="Carlos Gonzlez">v
       <Order OrderID="10283" CustomerID="LILAS" EmployeeID="3" 
               OrderDate="1996-08-16T00:00:00">
          <OrderDetail ProductID="72" Quantity="3"/>
       </Order>
    </Customer>
    </ROOT>'; 
    
    --Create an internal representation of the XML document.
    EXEC sp_xml_preparedocument @idoc OUTPUT, @doc; 
    
    -- SELECT stmt using OPENXML rowset provider
    SELECT *
    FROM   OPENXML (@idoc, '/ROOT/Customer/Order/OrderDetail',2) 
             WITH (OrderID       int         '../@OrderID', 
                   CustomerID  varchar(10) '../@CustomerID', 
                   OrderDate   datetime    '../@OrderDate', 
                   ProdID      int         '@ProductID', 
                   Qty         int         '@Quantity');


    Thanks and Regards, Prajesh Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    • Proposed as answer by Prajesh Friday, July 26, 2013 3:22 PM
    Friday, July 26, 2013 3:22 PM
  • What structure do you want to shred this into?


    Russel Loski, MCT, MCSA SQL Server 2012, 2008, MCITP Business Intelligence Developer and Database Developer 2008 Twitter: @sqlmovers; blog: www.sqlmovers.com

    Friday, July 26, 2013 5:17 PM
  • @Prajesh

    Can you use my xml above. I see the one you posted on books online as well. And I guess the xml structure of the one I posted above and the one in the BOL is different. I may be mistaken on this and you may correct me. I do not know if the nature of the xml structure would have any bearing on the XML Path solution.

    @Russ

    I do not get you body. Perhaps you may give an example for every possible structure that I assume would not be more than 3 structure.

    Thanks


    ebro


    • Edited by ebrolove Friday, July 26, 2013 7:02 PM revise
    Friday, July 26, 2013 7:00 PM
  • DECLARE @idoc int, @doc varchar(1000); 
    SET @doc ='
    <clusters>
     <cluster id="">
      <name> xyz</name>
      <cluster>
       <cluster>
        <cluster id= "1,2,3,4,5,">
         <name> xyz</name>
         <cluster>
         </cluster>
        </cluster>
       </cluster>
      </cluster>
     </cluster>
     <cluster id="">
      <name> xyz</name>
      <cluster>
       <cluster>
        <cluster id= "1,2,3,4,5,">
         <name> xyz</name>
         <cluster>
         </cluster>
        </cluster>
       </cluster>
      </cluster>
     </cluster>
    </clusters>'; 
    --Create an internal representation of the XML document.
    EXEC sp_xml_preparedocument @idoc OUTPUT, @doc; 
    SELECT * FROM   OPENXML (@idoc, '/clusters/cluster',2)  WITH (name       varchar(100)         './@id');
    SELECT * FROM   OPENXML (@idoc, '/clusters/cluster',2)  WITH (name       varchar(100)         './name');
    SELECT * FROM   OPENXML (@idoc, '/clusters/cluster/cluster/cluster/cluster',2)  WITH (name       varchar(100)         './@id');

    You can insert these values in any sql db tables as per your requirements.

    Do let me know if you have further questions ?


    Thanks and Regards, Prajesh Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.



    • Proposed as answer by Prajesh Friday, July 26, 2013 7:09 PM
    • Edited by Prajesh Friday, July 26, 2013 7:10 PM better
    Friday, July 26, 2013 7:03 PM
  • Can you use my xml above. I see the one you posted on books online as well. And I guess the xml structure of the one I posted above and the one in the BOL is different. I may be mistaken on this and you may correct me. I do not know if the nature of the xml structure would have any bearing on the XML Path solution.

    ebro

    What can I say?  It has a bearing in that I need to know what table like structure you want.  It is like saying that you need a query on the AdventureWorks database.  End of statement.  That is ridiculous.  I need to know what it is that you want out of the AdventureWorks database.


    Russel Loski, MCT, MCSA SQL Server 2012, 2008, MCITP Business Intelligence Developer and Database Developer 2008 Twitter: @sqlmovers; blog: www.sqlmovers.com

    Friday, July 26, 2013 7:03 PM
  • @Russ,

    Sorry sir, if I am being ridiculous. I am a stranger in XML and do not even know the language of the subject.

    Any way, know that I am very appreciative of the clues I am getting from you and that I am indebted to you all.

    Respect to Masters/Seniors

    ebro

    @Pra,

    Thank you body. I will try that solution and see if that would solve mu issue.

     

    Respect to Masters/Seniors

    ebro

      


    ebro

    Friday, July 26, 2013 7:51 PM
  • Hi,

    As explained here already on how to shred your XML - http://gallery.technet.microsoft.com/scriptcenter/SQL-Server-ImportShred-XML-7f3baa40/view/Discussions#content

    As far as my knowledge , you can shred XML into table using below methods

    1.) Using OPENXML or nodes() - http://www.mssqltips.com/sqlservertip/1609/replacing-openxml-with-the-xml-nodes-function-in-sql-server-2005/

    2.) SSIS - XML Source task - http://blogs.msdn.com/b/mattm/archive/2007/12/11/using-xml-source.aspx

    3.) Powershell  - http://sathyadb.blogspot.in/2013/03/reading-xml-file-using-powershell.html

    4.)Using VB/C# script (Script task in SSIS) http://technet.microsoft.com/en-us/magazine/2008.10.heyscriptingguy.aspx

    Here are few links for understanding OPENXML and XQUERY Optimisation tips :

    http://blog.wharton.com.au/2011/05/28/part-3-xquery-please-dont-use-openxml/

    http://blog.wharton.com.au/2011/06/13/part-5-openxml-and-xquery-optimisation-tips/

    I haven't answered your question but i have given you many choices ,try it out and choose best suitable method that gives better performance :)


    sathya --------- Mark as answered if my post solved your problem and Vote as helpful if my post was useful.

    • Marked as answer by ebrolove Monday, July 29, 2013 3:07 PM
    Saturday, July 27, 2013 1:29 PM
  • One of the fastest ways to load xml, particularly larger files, is using the SQL XML Bulkload API.  This can also use recursive relationships.  Follow this example:

    Create a file: c:\temp\temp.xml

    <clusters>
    	<cluster id="clusterA">
    		<name>folder1</name>
    		<cluster id ="016031495_00001, 00011700_00000">
    			<cluster id ="016031495_00002, 00011700_00002">
    				<name>fold1</name>
    			</cluster>
    			<cluster id="116031495_00001, 10011700_00000">
    				<name>fold2</name>
    			</cluster>
    			<cluster id="216031495_00001, 20011700_00000">
    				<name>fold3</name>
    			</cluster>
    			<cluster id="316031495_00001, 30011700_00000">
    				<name>fold4</name>
    			</cluster>
    
    			<cluster id="116031495_00003, 10011700_00000">
    				<name>fold2</name>
    			</cluster>
    			<cluster id="216031495_00002, 20011700_00000">
    				<name>fold3</name>
    			</cluster>
    			<cluster id="316031495_00002, 30011700_00000">
    				<name>fold4</name>
    			</cluster>
    		</cluster>
    	</cluster>
    
    	<cluster id="clusterB">
    		<name>folder1</name>
    		<cluster id ="016031495_00003, 00011700_00000">
    			<name>fold1</name>
    		</cluster>
    		<cluster id="116031495_00004, 10011700_00000">
    			<name>fold2</name>
    		</cluster>
    		<cluster id="216031495_00003, 20011700_00000">
    			<name>fold3</name>
    		</cluster>
    		<cluster id="316031495_00003, 30011700_00000">
    			<name>fold4</name>
    		</cluster>
    	</cluster>
    	
    </clusters>

    Create a file: c:\temp\temp.xsd

    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
    	<xsd:annotation>
    		<xsd:appinfo>
    			<sql:relationship name="parentClusterToCluster"
    						parent="cluster"
    						parent-key="id"
    						child="cluster"
    						child-key="parentId" />
    		</xsd:appinfo>
    	</xsd:annotation>
    
    	<xsd:element name="cluster"
    					 type="clusterType"
    					 sql:relation="cluster"
    					 sql:key-fields="id"
    					 sql:limit-field="parentId" />
    	<xsd:complexType name="clusterType">
    		<xsd:sequence>
    			<xsd:element name="cluster"
    						 type="clusterType"
    						 sql:relation="cluster"
    						 sql:key-fields="id"
    						 sql:relationship="parentClusterToCluster"
    						 sql:max-depth="6"/>
    			<xsd:element name="name" />
    		</xsd:sequence>
    
    		<xsd:attribute name="id" type="xsd:string" />
    		<xsd:attribute name="name" type="xsd:string"/>
    
    		<xsd:attribute name="parentId" type="xsd:string" />
    	</xsd:complexType>
    
    </xsd:schema>

    Create a file: c:\temp\run and gen.vbs

    Set objBL = CreateObject("SQLXMLBulkLoad.SQLXMLBulkload.4.0")
    objBL.ConnectionString = "provider=SQLOLEDB;data source=.\sql2008r2;database=tempdb;integrated security=SSPI"
    objBL.ErrorLogFile = "error.log"
    objBL.SchemaGen = True
    objBL.SGDropTables = True
    
    objBL.Execute "test.xsd", "test.xml"
    Set objBL = Nothing

    Set the data source text to your SQL Server name. In my example, my SQL Server name is SQL2008R2.  Now, run the VBScript file by double-clicking it in Windows Explorer. (NB you could also code this in .net.)

    I get these results:

    See if you can adapt this example to your real data.

    • Marked as answer by ebrolove Monday, July 29, 2013 3:08 PM
    Sunday, July 28, 2013 6:34 PM
    Answerer
  • Unbelievable!

    This is simply fabulous. No words sir. You know the language of the subject and the substance too. I will update you my situation. 

    WITH THE GREATEST REVERENCE,

    ebro


    ebro

    Monday, July 29, 2013 3:11 PM