SqlBulkCopy

  • Question

  • I have a query that dumps data from one table and puts it into another.

    SELECT b.PartNumID AS RemedyPartNumID, b.UnitNumID AS RemedyUnitNumID, d.*
    INTO RemedyPerformance
    FROM RemedyParts b
    INNER JOIN PartPerformanceNeeds c ON b.PartNumID = c.BOPartNumID
    INNER JOIN SimPerformanceAll d ON b.Sim = d.Sim
        AND c.PartNumID = d.PartNumID
        AND c.UnitNumID = d.UnitNumID

    The issue is that this table can have up to 400+ million records.

    Is there a faster way? I was thinking of SqlBulkCopy.


    John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com
    Thursday, July 16, 2009 10:28 PM

Answers

  • At the very least, using INSERT INTO or a bulk insert technique will allow you to insert the data in chunks instead of doing all 400 million as an all-or-nothing operation...
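    One way to sketch the chunked INSERT INTO approach in T-SQL is below. This is only a sketch: it assumes RemedyPerformance has already been created with the same column layout as the SELECT (for example, via the SELECT ... INTO from the original post with a WHERE 1 = 0 filter), and it assumes (RemedyPartNumID, PartNumID, UnitNumID) is enough to tell whether a source row has already been copied.

    -- Sketch only: the batch size and the "already copied" test are assumptions.
    DECLARE @BatchSize int
    SET @BatchSize = 100000

    WHILE 1 = 1
    BEGIN
        -- Copy the next batch of rows that are not yet in the destination table.
        INSERT INTO RemedyPerformance
        SELECT TOP (@BatchSize)
               b.PartNumID AS RemedyPartNumID, b.UnitNumID AS RemedyUnitNumID, d.*
        FROM RemedyParts b
        INNER JOIN PartPerformanceNeeds c ON b.PartNumID = c.BOPartNumID
        INNER JOIN SimPerformanceAll d ON b.Sim = d.Sim
            AND c.PartNumID = d.PartNumID
            AND c.UnitNumID = d.UnitNumID
        WHERE NOT EXISTS (SELECT 1 FROM RemedyPerformance r
                          WHERE r.RemedyPartNumID = b.PartNumID
                            AND r.PartNumID = d.PartNumID
                            AND r.UnitNumID = d.UnitNumID)

        -- Stop once a batch inserts nothing.
        IF @@ROWCOUNT = 0 BREAK
    END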
    • Marked as answer by JohnGrove Friday, July 17, 2009 12:34 AM
    Thursday, July 16, 2009 11:56 PM

All replies

  • Haven't tried this yet, but I saw an ADO.NET example by Sahil Malik, which I customized.

    ' Example (in VB.NET); assumes Imports System.Text and Imports System.Data.SqlClient,
    ' and that connectionString is defined elsewhere.

    Dim sb As New StringBuilder()
    sb.Append("SELECT b.PartNumID AS RemedyPartNumID, b.UnitNumID AS RemedyUnitNumID, d.* ")
    sb.Append("FROM RemedyParts b ")
    sb.Append("INNER JOIN PartPerformanceNeeds c ON b.PartNumID = c.BOPartNumID ")
    sb.Append("INNER JOIN SimPerformanceAll d ON b.Sim = d.Sim ")
    sb.Append("AND c.PartNumID = d.PartNumID ")
    sb.Append("AND c.UnitNumID = d.UnitNumID")

    Using firstConnection As New SqlConnection(connectionString)
       firstConnection.Open()
       Dim cmd As SqlCommand = firstConnection.CreateCommand()
       cmd.CommandText = sb.ToString()
       Dim dr As SqlDataReader = cmd.ExecuteReader()
       Using secondConnection As New SqlConnection(connectionString)
         secondConnection.Open()
         Dim bc As New SqlBulkCopy(secondConnection)
         ' Stream the rows from the reader straight into the destination table.
         bc.DestinationTableName = "RemedyPerformance"
         bc.WriteToServer(dr)
         bc.Close()
         dr.Close()
       End Using
    End Using
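
    Two properties on the SqlBulkCopy object are probably worth setting for a copy this large (they are left at their defaults above): BulkCopyTimeout defaults to 30 seconds, so it will likely need to be raised (0 means no time limit), and BatchSize, together with the SqlBulkCopyOptions.UseInternalTransaction option, makes each batch commit in its own transaction instead of the whole copy being a single operation.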


    John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com
    Thursday, July 16, 2009 11:02 PM
  • Don't know about bulk copy, but it has been my experience that SELECT INTO is quite fast.  Since it creates a new table, where the undo operation involves just dropping the table, the transaction log can deal with it very efficiently.  Contrast this with bulk operations, which can be transaction-log optimized only when the database is set to the bulk-logged recovery model.

    One downside of SELECT INTO is that the natural next step, altering the new table to define a clustered primary key, can be slower than one would like.  Nonetheless, overall performance can still be significantly better than INSERT INTO.
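
    For illustration, that follow-up step might look like the sketch below; the key columns are only a guess, since the actual key for RemedyPerformance isn't stated in the thread.

    -- Sketch only: define a clustered primary key after the SELECT INTO completes.
    -- The choice of key columns here is an assumption.
    ALTER TABLE RemedyPerformance
        ADD CONSTRAINT PK_RemedyPerformance
        PRIMARY KEY CLUSTERED (RemedyPartNumID, PartNumID, UnitNumID)

    Building the clustered index after the load means the 400+ million rows get sorted once, which is the step that can be slow.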



    Thursday, July 16, 2009 11:49 PM
  • Well, inserting the 400+ million records is drastically slow, so this is worth a try.
    John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com
    Thursday, July 16, 2009 11:51 PM
  • At the very least, using INSERT INTO or a bulk insert technique will allow you to insert the data in chunks instead of doing all 400 million as an all-or-nothing operation...
    • Marked as answer by JohnGrove Friday, July 17, 2009 12:34 AM
    Thursday, July 16, 2009 11:56 PM
  • Thanks Jason. I am glad to see you are a moderator. You deserve it after all your years of devotion to helping others.
    John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com
    Friday, July 17, 2009 12:35 AM
  • Thanks, John.  I'd like to hear how the performance of the .NET bulk copy you show stacks up against SELECT INTO.

    Also, be sure to see "Optimizing Bulk Insert Performance" and the numerous subtopics in the documentation: http://msdn.microsoft.com/en-us/library/ms190421.aspx
    Friday, July 17, 2009 2:37 AM
  • I'll let you know; the SELECT INTO in this particular scenario is slow. One of the main issues is that whoever constructed this database used GUIDs rather than ordinal numbers as primary keys. So, as you can imagine, 400+ million rows where each primary key is a GUID is going to be slow regardless.

    I may also try SSIS if I can ever figure out how to use it.
    John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com
    Friday, July 17, 2009 3:25 AM