locked
Suggestion for giant table RRS feed

  • Question

  • Greetings,

    Earlier I posted a questions requesting suggestions in T-SQL forum, looks like this one is right forum to get correct suggestions. I wonder nobody directed or moved the post to this one.

    Anyways the link to the earlier post is here...http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/77ab0d80-ec61-4d8f-8b06-b23826972955

    Summary...

    I need to gather customer data for web application, the read vs right ratio is 90 : 10. Query on customer table could be on many columns and in any case we have to return almost all the columns.

    I need suggestions in regards to following.. (as these are the primary areas where performance get affected.)
        1.    Database Design
        2.    Distribution of Files and File Groups
        3.    Indexing ideas (Cluster & non cluster)
        4.    Table Partition (If this can help, how partition function should be?)
        5.    Full Text Search (Will it use the indexes efficiently)

    I hope we have people who have handelled this earlier and could provide some guides.

    -

    Thursday, July 29, 2010 7:12 AM

Answers

  • SA, Full text index is for large data type columns like text/varchar (max) data type.  If you have large data type columns then you can create “full text index” on the table.  It definitely improves query performance if your index planning is correct.

    Please check the link for more details on Full text search:

    http://www.simple-talk.com/sql/learn-sql-server/sql-server-full-text-search-language-features/

    For Full-Text Search Architecture check this:

    http://msdn.microsoft.com/en-us/library/ms142541.aspx

    Thanks,

    Sandeep

    Thursday, July 29, 2010 12:38 PM
  • So it seems like you have a pretty straightforward need in this basic information:

    CustomerID,

    FirstName,

    MiddleName,

    LastName,

    Gender,

    IDProofType,

    IDProofNumber,

    DateOfBirth,

    AddressLine1,

    AddressLine2,

    AddressStreet,

    AddressCity,

    AddressCounty,

    AddressState,

    AddressCountry,

    AddressZip,

    Phone

    In the other forum, they noted that you need to break out tables based on cardinality (number of related values) so additional tables for address, phone, and possibly even name (not likely in your case, but it could be.)

    So you claim "billions" of rows, but that seems pretty unlikely, unless you are doing a new twitter or something along these lines.  But even a billion is quite a number.  10 percent change of the database is quite a large amount when you are talking billions, so the difference between creating, deleting, and updating is very important.  Uri is 100% right about getting an expert.  If you are truly dealing with billions of rows, you should be able to hire someone for a few weeks to help out. But I know that isn't always the case. 90/10 read write is interesting, but how much growth you expect to have is another question.  How many rows read and written a day.

    >> 2.    Distribution of Files and File Groups<<

    Logs on their own drive, everything else on a nice RAID 10  (most likely) array and then start tuning your queries on your test sets. Once you have tuned things with a few users, do performance tuning. Use the DMV's to determine what tables and indexes are heavily used, and then add filegroups and drives until you get the performance you need based on expected utilization. 

    Glenn Berry (primarily) wrote a nice free ebook on DMVs that can help you figure things out about your disks.  You can pick it up here: http://www.red-gate.com/products/SQL_Response/offers/dmv.htm

    Look at these sections:

    DMV#15: Investigate Disk Bottlenecks via I/O Stalls
    DMV#16: Investigate Disk Bottlenecks via Pending I/O

    This will help you find out if your disk is being overused.  If so, you can use queries from this section:

     DMV#12: Interrogate Index Usage

    To find what tables/indexes are being heavily utilized/modified.


    Louis

    Friday, July 30, 2010 3:43 AM
  • SA

    It is really open-end question. People write books about. I think you need to hire an expert sitting with you at the site and getting all needed data to make a right decision


    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/
    Thursday, July 29, 2010 8:22 AM
  • 1) Partitioning is all about manageability not for performance, I mean   you can benefit from  having partition (performance) but it depends on your business requirements

    If users specify a range that needs one partion that performance will be good, however , I have seen env, where people implement partion but users serach for all ranges and there were not performance gain

    2) It depends on what  you are trying to achive

     


    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/
    Thursday, July 29, 2010 10:30 AM

All replies

  • SA

    It is really open-end question. People write books about. I think you need to hire an expert sitting with you at the site and getting all needed data to make a right decision


    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/
    Thursday, July 29, 2010 8:22 AM
  • SA

    It is really open-end question. People write books about. I think you need to hire an expert sitting with you at the site and getting all needed data to make a right decision


    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/
    Thursday, July 29, 2010 8:22 AM
  • Thanks Uri,

    Yes, I agree that this question is open ended and everyone with come up with its own design. But I am looking opinion from experts...Since we have lots of I thought they will put some lights that will make us go with right design.

    Atleast I am expecting the answerers to following questions..

    Will Partitioning help the performance? if yes what should be criteria for Partition Function?
    Will Full Text Search be helpful? if decided to use this one, will indexes required? etc..

    -

    Thursday, July 29, 2010 9:42 AM
  • 1) Partitioning is all about manageability not for performance, I mean   you can benefit from  having partition (performance) but it depends on your business requirements

    If users specify a range that needs one partion that performance will be good, however , I have seen env, where people implement partion but users serach for all ranges and there were not performance gain

    2) It depends on what  you are trying to achive

     


    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/
    Thursday, July 29, 2010 10:30 AM
  • Okay.. that means in my scenario, partition is not much beneficent. unless user are searching on specific region.

    Since all columns are searchable can full text search will take over creating more non clustered indexes...if used CONTAINS or FREETEXT function, should I need to create specif indexes?

    -

    Thursday, July 29, 2010 11:15 AM
  • Hi

    I have not used FTS for long time but AFAIK , FTS creates its onw indexes to optimize searching


    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/
    Thursday, July 29, 2010 12:36 PM
  • Hi

    I have not used FTS for long time but AFAIK , FTS creates its onw indexes to optimize searching


    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/
    Thursday, July 29, 2010 12:36 PM
  • SA, Full text index is for large data type columns like text/varchar (max) data type.  If you have large data type columns then you can create “full text index” on the table.  It definitely improves query performance if your index planning is correct.

    Please check the link for more details on Full text search:

    http://www.simple-talk.com/sql/learn-sql-server/sql-server-full-text-search-language-features/

    For Full-Text Search Architecture check this:

    http://msdn.microsoft.com/en-us/library/ms142541.aspx

    Thanks,

    Sandeep

    Thursday, July 29, 2010 12:38 PM
  • So it seems like you have a pretty straightforward need in this basic information:

    CustomerID,

    FirstName,

    MiddleName,

    LastName,

    Gender,

    IDProofType,

    IDProofNumber,

    DateOfBirth,

    AddressLine1,

    AddressLine2,

    AddressStreet,

    AddressCity,

    AddressCounty,

    AddressState,

    AddressCountry,

    AddressZip,

    Phone

    In the other forum, they noted that you need to break out tables based on cardinality (number of related values) so additional tables for address, phone, and possibly even name (not likely in your case, but it could be.)

    So you claim "billions" of rows, but that seems pretty unlikely, unless you are doing a new twitter or something along these lines.  But even a billion is quite a number.  10 percent change of the database is quite a large amount when you are talking billions, so the difference between creating, deleting, and updating is very important.  Uri is 100% right about getting an expert.  If you are truly dealing with billions of rows, you should be able to hire someone for a few weeks to help out. But I know that isn't always the case. 90/10 read write is interesting, but how much growth you expect to have is another question.  How many rows read and written a day.

    >> 2.    Distribution of Files and File Groups<<

    Logs on their own drive, everything else on a nice RAID 10  (most likely) array and then start tuning your queries on your test sets. Once you have tuned things with a few users, do performance tuning. Use the DMV's to determine what tables and indexes are heavily used, and then add filegroups and drives until you get the performance you need based on expected utilization. 

    Glenn Berry (primarily) wrote a nice free ebook on DMVs that can help you figure things out about your disks.  You can pick it up here: http://www.red-gate.com/products/SQL_Response/offers/dmv.htm

    Look at these sections:

    DMV#15: Investigate Disk Bottlenecks via I/O Stalls
    DMV#16: Investigate Disk Bottlenecks via Pending I/O

    This will help you find out if your disk is being overused.  If so, you can use queries from this section:

     DMV#12: Interrogate Index Usage

    To find what tables/indexes are being heavily utilized/modified.


    Louis

    Friday, July 30, 2010 3:43 AM