Best practice in table partitioning, seeking advice

  • Question

  • Hello Gurus,

I would like to ask for advice on how to partition a table of mine.

The table has a full-text index applied and the following fields:

    ColId  numeric (18,0)

    CriteriaName varchar(500)

    Birthday DateTime

    Description varchar(1000)

Now, I will search this table using the two fields CriteriaName and/or Birthday.

What would be the best approach to indexing and partitioning this very large table (N million rows) so that my CONTAINS queries are extremely fast?

    Thanks a million.

    Thursday, August 5, 2010 5:57 AM

Answers

When working with SQL Server against any database table, the index won't help at all if the query optimizer finds that you've selected more than 25% of the data pages. This might mean less than 5% of the rows in a table, as each page can hold less than 8 KB of row data.

This also means that when the query executes, the query optimizer (QO) builds a query plan based on the query input parameters, the statistics, and other factors. This same query plan will be used for ALL subsequent executions unless it's forced out by a recompile or by age.

If you've selected too many rows, SQL Server will scan the entire table and ignore the index, as it would take longer to walk through the index than to select the rows directly from the data table.
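As a rough sketch of the plan-reuse point above (table and column names are assumed from this thread, not stated in the post), one way to keep a cached plan built for unrepresentative parameters from being reused is to request a fresh plan per execution:

```sql
-- Illustrative only: force a new plan each time this runs, so the plan
-- reflects the current parameter value rather than a cached one.
DECLARE @name varchar(500) = 'Mike Allen Gusto';

SELECT CriteriaName, Birthday
FROM dbo.tbl
WHERE CriteriaName = @name
OPTION (RECOMPILE);
```

This trades extra compilation time for a plan tailored to each parameter value, which matters most when selectivity varies wildly between calls.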

    Yes, segmenting the data can help and perhaps archiving data that's not fetched as often into other table(s).

    I discuss these concepts (and a lot more) in my Progressive workshop. See my blog for details.


    __________________________________________________________________
    William Vaughn
    Mentor, Consultant, Trainer, MVP
    http://betav.com
    http://betav.com/blog/billva
    http://www.hitchhikerguides.net

    “Hitchhiker’s Guide to Visual Studio and SQL Server (7th Edition)”

    Please click the Mark as Answer button if a post solves your problem!

    Sunday, August 8, 2010 9:49 PM
  • Kira,

    Of course all we can do on a support forum is respond to the information you give us.

So, let me say a few general things.  First, today's servers are really indecently fast and powerful, and this means that a lot of jobs can be done with simple designs.  On the other hand, when an application reaches a certain size, you need someone who understands the technologies involved, and just asking questions on a forum may not be enough.  When applying for work in databases, one of the questions frequently asked is, "What is the biggest database you have worked on?"  When the size of the database starts to exceed the size of the RAM by maybe 2x or more, or when you start getting much over about 10m rows in your largest tables, you are getting to where you want some expertise on the project.  Oh, maybe 20m rows these days, as long as the application is fairly simple.

But whether you're a newbie or a guru, it comes down to "try it!".  Do a proof-of-concept system; it shouldn't take more than a few days to get hold of the full mass of data and try a few queries on it, to see.  If it runs in a minute or three, great; if not, come back to the forum, supply all the statistics you can think of that might apply, and see what advice you get.  If that still doesn't get you there, it's time to find a consultant you can get on the job.

    I find it interesting to find out just what you can do simply these days, and it really is quite a bit.

    Good luck!

    Josh

     

    Monday, August 9, 2010 8:35 PM

All replies

To speed up the query you need properly defined indexes, not partitioning.

How selective is CriteriaName?  Start with a nonclustered index (NCI) on (CriteriaName, Birthday).

Also, if you have other columns to be returned, INCLUDE them in the index:

CREATE NONCLUSTERED INDEX IX_indexname
ON dbo.tbl (CriteriaName, Birthday)
INCLUDE (Description);
GO
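A query shaped like the following (names assumed from the thread) can then be satisfied entirely from that index: the WHERE clause seeks on the (CriteriaName, Birthday) key, and Description is served from the INCLUDE, avoiding lookups into the base table:

```sql
-- Hypothetical covered query against the suggested index.
DECLARE @name varchar(500) = 'Mike Allen Gusto';

SELECT CriteriaName, Birthday, Description
FROM dbo.tbl
WHERE CriteriaName = @name;
```

Note that ColId is not in the index, so adding it to the SELECT list would force a key lookup per matching row unless it is also INCLUDEd (or is the clustered key, in which case it rides along automatically).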


    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/
    Thursday, August 5, 2010 6:44 AM
Thanks for the input. Will it still be fast if I have to search for around 50k records within 5M rows in one request?

The requirement is that, given a Name or Name/Birthday, the process should find matches using CONTAINS (full text), but it should list all results as if I had used LIKE, pattern matching, SOUNDEX, etc.

    Thank you

     

    Thursday, August 5, 2010 7:35 AM
I have no experience with FTS, and I do not know how it optimizes its indexes... But do you really need 50k of output? Isn't that too large?
    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/
    Thursday, August 5, 2010 7:52 AM
Actually it's more than that: it's a list in a text file that will be uploaded to the server, and then a list will be extracted with all the possible matches per row. That extract is expected within an hour or two at most. That's why I'm asking if there's a way I can partition the table to search, hoping that it will speed up the process, or, if that's not possible, an alternative approach to doing the searches.

Just imagine: for EACH row in that input file, I need to get its nearest possible duplicates.

Ex.  The input file's first row, as set below, will be used to search for its duplicates in the database:

              Column 1                            Column 2

              Mike Allen Gusto               Feb 29, 1930

The result list should find the following:

    1.      Mike Cruz ….

    2.      Mike villa …

    3.      Victor Allen …

    4.      Allenaida …

    5.      … Feb 10, 1930

    6.      … feb 29, 1930

    7.      Gustos Mickey …

    8.      Gusto, Allen Mike

    9.      Mike Gusto …

    10.  Gust Mike …

    11.  Mr. Gusto …

    12.    … February 29, 1930

    13.    … Feb. 1930

    14.    Mikel Gustos …
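A fuzzy-match pass like the list above might be sketched with SQL Server's FREETEXTTABLE, which returns a relevance RANK per row; all names here are assumed from the thread, and ColId is assumed to be the table's unique full-text key column:

```sql
-- Hedged sketch: rank approximate full-text matches for one input row.
-- FREETEXTTABLE returns [KEY] (the full-text key) and RANK columns.
SELECT TOP (100) t.ColId, t.CriteriaName, t.Birthday, ft.RANK
FROM FREETEXTTABLE(dbo.tbl, CriteriaName, 'Mike Allen Gusto') AS ft
JOIN dbo.tbl AS t
  ON t.ColId = ft.[KEY]
ORDER BY ft.RANK DESC;
```

For the batch scenario, the uploaded file would typically be bulk-loaded into a staging table and this query run (or cross-applied) once per staging row, with Birthday filters layered on separately, since full-text ranking only covers the text column.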

Thanks a million!

    Thursday, August 5, 2010 9:27 AM
Actually it's more than that: it's a list in a text file that will be uploaded to the server, and then a list will be extracted with all the possible matches per row. That extract is expected within an hour or two at most.

    An HOUR?

    Don't worry!  SQL Server can search a couple of million rows in the slowest way possible, find 50k matches, and finish in a minute or two, no special design required.

    Will there be lots of these queries going on at once?  If so - make sure your server has lots of RAM and a couple of cores.  You can still probably get to 20m rows without partitioning.  Though, sure, I suppose partitioning by month is a simple step you could take even if you have only a couple of million rows, and want results in under a minute.
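For reference, partitioning on a date column follows this general shape; the function and scheme names and the boundary values below are illustrative only, not something from this thread (true month-by-month boundaries would just mean more, finer-grained values in the same pattern):

```sql
-- Illustrative only: RANGE RIGHT partitioning on the Birthday column,
-- with every partition mapped to a single filegroup.
CREATE PARTITION FUNCTION pfBirthday (datetime)
AS RANGE RIGHT FOR VALUES ('1930-01-01', '1940-01-01', '1950-01-01');

CREATE PARTITION SCHEME psBirthday
AS PARTITION pfBirthday ALL TO ([PRIMARY]);
```

A table or index created ON psBirthday(Birthday) would then let the optimizer eliminate partitions when a query filters on Birthday.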

    (I presume FTS follows the same rules for partitioning as other queries, but like Uri, I don't really know.  Sounds like some kind of mailing list or marketing application with those 50k resultsets.)

    Hope that helps.

    Josh

    Sunday, August 8, 2010 6:27 PM
Thanks, gurus, for taking the time to read my posts. I find your replies very helpful: my issues can be resolved with just simple queries and table optimization. However, I'm still lost on how to apply both of your suggestions. Please do enlighten me on this.

JRStern, you mentioned that my query request is possible in under a few minutes. It would really help me if you could state your solution for how to directly address this. Rest assured about the server: it's 6 x quad core, 32 GB RAM, and several hundred GB of HDD (temporary only, as the latest, much more powerful server is still under procurement). This will be for a government Social Security/Insurance application website, so you can picture how many millions of rows/users will be generating traffic.

Please advise. Thank you.

    Monday, August 9, 2010 5:27 AM