locked
select performance for very large tables RRS feed

  • Question

  • In my data base I generate a huge amount of data as an output from scientific experiments. Some tables contain hundreds of millions of rows and require tens of gigabytes of disk space. My selections, (which include some views and joins and table valued functions) are awfully slow sometimes. I need to improve performance. I do have some mitigating circumstances:


    1. I am the only user of the database, and although every now and then I run concurrent selections from different applications - I can control it

    2. I therefore also don’t care about security
    3. The data is almost never updated or deleted - only inserted and selected and tables may be dropped
    4. The insertions are run with BCP and rarely at the same time as the selections
    5. I don't care about backups - roll backs etc... it’s cheap to rerun these experiments and generate equivalent data - so no catastrophe if a table gets corrupted

    6. The selections are run from a C# application which does further processing on the data. I can leave it all night to run (I sometimes do…)

     

    My question:

     

    Given the above scenario – what are the tweaks I can do in the server, database, tables, indexes, etc..  in order to improve performance?

    I’m looking for advice along the lines of

    “Use simple mode”

    “Rebuild your indexes every time you add new data to a table”

    Etc…

     

    Tuesday, February 2, 2010 2:44 PM

Answers

  • To increase performance, you need to reduce the number of pages to be read to satisfy your query.

    Consider partitioning your tables, especially if you normally select for a given time period.
    You should probably create a clustered index on your tables so that they can be read in "order."
    You should then look at creating non-clustered indexes to satisfy the query.

    For instance, if you have a wide table that is used in the middle of a join, but none of the columns are present in the SELECT statement, you will want to create a non-clustered index on only the join columns, as opposed to relying on the clustered index.  Since the clustered index stores all of the data for the table, it will create many, many more page reads than a non-clustered index would on the join columns.

    Keep your join columns lightweight - ie. integers if possible.

    Don't use functions on the left side of a join/constraint.  ie. don't do: substring(A.[Code],3,4) = B.CodeType.  Instead, for that example, consider creating a computed column that stores the substring result, then create an index on that column.  "LEFT([Name],1) = A" can be rewritten as "[Name] like 'A%'"  You get the idea.

    Ensure the columns you are joining together have the same data type.

    That should get you started.

    Phil Brammer | http://www.ssistalk.com | Twitter: http://twitter.com/PhilBrammer
    • Proposed as answer by SandeepM_ Wednesday, February 3, 2010 7:56 AM
    • Marked as answer by Brian TkatchEditor Tuesday, February 28, 2012 9:09 PM
    Tuesday, February 2, 2010 3:21 PM

All replies

  • Howsabout ask this in the right forum? :)

    Methinks the SQL Server Database Engine is a much better forum for this question. They are quite good there.
    Tuesday, February 2, 2010 3:12 PM
    Answerer
  • To increase performance, you need to reduce the number of pages to be read to satisfy your query.

    Consider partitioning your tables, especially if you normally select for a given time period.
    You should probably create a clustered index on your tables so that they can be read in "order."
    You should then look at creating non-clustered indexes to satisfy the query.

    For instance, if you have a wide table that is used in the middle of a join, but none of the columns are present in the SELECT statement, you will want to create a non-clustered index on only the join columns, as opposed to relying on the clustered index.  Since the clustered index stores all of the data for the table, it will create many, many more page reads than a non-clustered index would on the join columns.

    Keep your join columns lightweight - ie. integers if possible.

    Don't use functions on the left side of a join/constraint.  ie. don't do: substring(A.[Code],3,4) = B.CodeType.  Instead, for that example, consider creating a computed column that stores the substring result, then create an index on that column.  "LEFT([Name],1) = A" can be rewritten as "[Name] like 'A%'"  You get the idea.

    Ensure the columns you are joining together have the same data type.

    That should get you started.

    Phil Brammer | http://www.ssistalk.com | Twitter: http://twitter.com/PhilBrammer
    • Proposed as answer by SandeepM_ Wednesday, February 3, 2010 7:56 AM
    • Marked as answer by Brian TkatchEditor Tuesday, February 28, 2012 9:09 PM
    Tuesday, February 2, 2010 3:21 PM
  • Howsabout ask this in the right forum? :)

    Methinks the SQL Server Database Engine is a much better forum for this question. They are quite good there.

    Hmm...  It seems that this could also be a database design question.
    Phil Brammer | http://www.ssistalk.com | Twitter: http://twitter.com/PhilBrammer
    Tuesday, February 2, 2010 3:22 PM
  • I totally agree with Phil to make it partition of Table to make it possible.

    Adding to Phil we can also redesign the Table as Phil suggested.

     

    I have also few other suggestion adding to Phil.

     

    1.       If you have Text type column try to Implement iFTS Index

    2.       If XML Column present you can add XML Index

    3.       If non index columns are in filter condition of where we can use covering Index

    4.       Make proper Join instead of IN OR NOT IN clause

    5.       If you don’t want to add more Index then you can use included columns which will give you more benefit in performance.

     

    Finally, I can suggest it need to be thoroughly analyzed to improve the performance of this Table and implement it very carefully.
     

    Thanks,

    Sandeep

    Wednesday, February 3, 2010 7:56 AM
  • Hmm.. this isn't a question about models. Its about how to setup the engine, isn't it?
    Wednesday, February 3, 2010 2:16 PM
    Answerer
  • Thanks Phil

    "Consider partitioning your tables, especially if you normally select for a given time period."

    What should be the criteria for the partioninig?
    Sunday, February 7, 2010 8:01 AM
  • Brian,

    I took your advice and posted it on the Engine forum as well - and indeed got additional relevant stuff
    But frankly - if somebody gave me ideas on how to tweak my disk controller - I wouldn't think it out of place either
    At the end of the day I want to solve the problem - and any idea that's usefull - is a good one no matter how out of place and out of the box it is...
    Sunday, February 7, 2010 8:07 AM