Performance Impact of SqlDataReader with a single SqlConnection

  • Question


    Hi All,

     

    I've been tasked with removing data from live database systems and loading it into a local database so we can create a MOLAP cube.

     

    My original plan was to create a single SqlConnection to the remote database, query the remote tables (sequentially) and read 10000 records at a time into a DataTable that I then bulk load into the local DB.

     

    code:

    cmd.CommandText = "SELECT * FROM TABLE";

    int rowCount = 0;

    reader = cmd.ExecuteReader();

    while (reader.Read())
    {
        // load DataTable with values from the current row
        if ((++rowCount % 10000) == 0)
        {
            // bulk load to the local table
            // clear the DataTable
        }
    }

    // bulk load any remaining rows

     

    So far so good?

     

     

    Here are some simple questions:

     

    1) Performance-wise, is it better to have one and only one SqlConnection to the remote host, or is it better to create a new connection for each SELECT statement?

    2) When I execute the SQL query through ExecuteReader and read one row at a time, where is the memory being consumed ... on the remote DB or locally? Or, in other words, once I issue the ExecuteReader, does the remote DB perform the entire query and store the results on the remote server? Or do the results get stored on the local machine since that's where the reader variable exists?

     

     

    Now let me throw a wrench in the works.

     

    I also have a class that monitors various PerfMon counters on the remote system and if it detects that certain thresholds have been crossed it needs to pause the data from loading from the remote database.

     

    Some tougher questions:

     Let's say that midway through the loading of one table I notice that a PerfMon threshold has been crossed and need to pause processing. If I stop the mass exporting and loading midway and Sleep until the PerfMon counters are within an acceptable range,

     

    1) What are the performance implications of leaving the reader open while I sleep, as opposed to closing it and reopening it again later? If I leave it open, will it negatively impact the remote DB? The local DB? Both? Neither?

    2) what about leaving the SqlConnection open?

     

     

    Any help in this matter or advice on other methods to approach this problem would be greatly appreciated.

     

    Jake

    Thursday, August 14, 2008 7:28 PM

Answers

  • 1 - You should have only one SqlConnection; there is no benefit to closing and reopening.

    2 - The performance impact of closing and reopening would be relatively low anyway, if you are using connection pooling (as is the default).

     

    For your scenario (no multiple concurrent connections), it shouldn't make a difference.

     

    Where the results are stored depends on the cursor type selected. Also, you should use SequentialAccess on your reader, since you are reading the results in a single pass. This also prevents data from being kept in memory after it is consumed.
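    As a sketch of that suggestion (the connection string and table name here are placeholders), CommandBehavior.SequentialAccess is passed to ExecuteReader like this:

    ```csharp
    using System.Data;
    using System.Data.SqlClient;

    using (var conn = new SqlConnection("Server=remote;Database=Source;Integrated Security=true"))
    using (var cmd = new SqlCommand("SELECT * FROM TABLE", conn))
    {
        conn.Open();

        // SequentialAccess streams each row column-by-column instead of
        // buffering the entire row, which helps for single-pass reads.
        using (var reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
        {
            while (reader.Read())
            {
                // With SequentialAccess, columns must be read in ordinal order;
                // once you move past a column you cannot go back to it.
                int id = reader.GetInt32(0);
                string name = reader.GetString(1);
            }
        }
    }
    ```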

     

    An additional suggestion: change your 10000 to a variable, so you can tune the application to the optimal number of rows, which will vary depending on your client/server RAM, processor speed, network, and other applications running.
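    For example (the "BatchSize" setting name is illustrative), the batch size could be read from configuration so it can be tuned without recompiling:

    ```csharp
    using System.Configuration;

    // Fall back to 10000 if the setting is missing or not a number.
    int batchSize;
    if (!int.TryParse(ConfigurationManager.AppSettings["BatchSize"], out batchSize))
    {
        batchSize = 10000;
    }

    // ...inside the read loop, use the variable instead of the literal:
    // if ((++rowCount % batchSize) == 0) { /* bulk load, clear DataTable */ }
    ```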

     

    Finally, you could consider using SSIS instead of writing your own bulk copy methods. But the approach you are following is, in general, the recommended pattern when you write the code yourself.

     

    Bruno

    Monday, August 18, 2008 11:33 PM

All replies

  • This would likely be much faster if you separated the remove and copy logic. TRUNCATE TABLE is a very fast way to remove data from a table, and SqlBulkCopy works best when it has large streams of data to operate on. Have you tried aligning your project more with the example for SqlBulkCopy?

    http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx

    It has something like:
    bulkCopy.WriteToServer(reader);
    for the actual bulk copy operation.
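    Along those lines, here is a minimal sketch of that pattern (connection strings and table names are placeholders): the reader is streamed straight into SqlBulkCopy, so no intermediate DataTable is needed.

    ```csharp
    using System.Data.SqlClient;

    using (var source = new SqlConnection("Server=remote;Database=Source;Integrated Security=true"))
    using (var dest = new SqlConnection("Server=(local);Database=Local;Integrated Security=true"))
    {
        source.Open();
        dest.Open();

        using (var cmd = new SqlCommand("SELECT * FROM TABLE", source))
        using (var reader = cmd.ExecuteReader())
        using (var bulkCopy = new SqlBulkCopy(dest))
        {
            bulkCopy.DestinationTableName = "dbo.LocalTable";
            bulkCopy.BatchSize = 10000;      // commit every 10000 rows
            bulkCopy.BulkCopyTimeout = 0;    // no timeout for large copies
            bulkCopy.WriteToServer(reader);  // streams rows directly from the reader
        }
    }
    ```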

    Thanks,

    John
    Thursday, August 14, 2008 10:00 PM