Proposed Answer Table Partitioning

  • Friday, November 25, 2011 8:52 AM
     
     

    Hi Everyone,

    We started a project in SQL Server 2008 it is a migration project from SQl server 2000. In 2000 customer have some tables and they are creating a 12 tables for a single year, for ex. they have a table say Tab_2010 now they are creating 12 tables monthly for this single table i.e. Tab_2010_01, Tab_2010_02.......Tab_2010_12 and they have the same data table for 30 Years on which they do the reporting i.e. Tab_1980, Tab_1981......Tab_2010. So the total number of tables for the same data are 30*12=360. Here when they need to pull the data for say Tab_2005 to Tab_2010 then they are doing union all of tables Tab_2005_01,Tab_2005_02.....Tab_2010_12, So performance issue. 

    Now they want to use the table level partitioning concept in sql server 2008. So what can be the best way to achieve this in Sql server 2008 i mean to say how we can keep the data so that there will be ease in maintenance and performance. Customer is open to hear any of the approach from us.

    Can we keep 10 years data in a single table with total 10*12=120 partitions and another 20 years data in archieved tables.

    Please post your suggestions as i worked more on development side not on Architech side.

     

    Thanks in advance

    Amit

All Replies

  • Friday, November 25, 2011 2:14 PM
     
     Proposed Answer

    Hi Tomar -

    Unless you have a really good reason for splitting the history off, I would suggest putting all the data into one table.  Mainly for simplicity in terms of querying, ETL design and maintenance.  Here are some reasons for this:

    For querying, if you are referencing 1 object, historical queries won't have to be tweaked if the desired date range lengthens.  And you won't have to use a union view to cover both tables for long-range queries, which should simplify things for the optimizer. 

    For ETL, having all in one table means you won't have to worry about rolling the oldest year off the current 10 year table and onto to archive table (can be done this way, but doesn't seem neccessary).  If your concern is in using your storage resources strategically across the years of data, you can do this with one partitioned table, by assigning the different partitions to different filegroups on different disks.  So you can keep the older data on cheaper/slower disks and focus your more performant storage on the more recent years (assuming those are queried more often).

    For maintenance, indexing is simplified and more efficient. You probably already see this benefit with having multiple tables, but the same will obviously apply to having 1 partitioned table.  No need to rebuild indexes on data that hasn't changed. One big upside with a single partitioned table is that if you need to add an index or change an existing one, you are creating/modifying 1 object on 1 table instead of juggling dozens across dozens of tables.  Same goes for adding columns or removing columns to the table.  Much simpler maintenance.

    Honestly, getting the partition functions, schemes, tables, indexes and ETL all set up will take a bit of work up front.  But long term, the simplicity should pay off.  And there are several good white papers on MSDN to provide guidance.  Check out the SQL2008 technical article titled "Partitioned Table and Index Strategies Using SQL Server 2008".  Also, Project REAL has plenty of sample code to demonstrate how a proper implementation looks and operates. 

    Hope that helps.


    Brent Greenwood, MS, MCITP, CBIP
  • Saturday, November 26, 2011 1:33 PM
     
     

    Hi Tomar

    I have similar environments and I deal with them effectively trough partitioning. You can also take advantage of table compression on 2008, at partition level, and also you can set the partition you do not change on readonly files, approching back up with partial, which gives you fantastic performance.

    I will definetly pack in a single partitioned table the live years, and the years to come; for what it concerns the managements of the old ones both startegies could be effective(you can merge them as well or keep in partitioned tables aside, depends if you can separate them in different logical approachs, justifying emplementing both choices); it is relevant in the decision the fact that you could have growing partitions, in terms of space and rows, meaning that your very old partition is unlike to be as big as the current one, so an evaluation on this is important, perapsh you can use different boundaries from the old to the new, and try to keep the partitions more or less on the same size. It's also important the way you spread your partition files on the disks, as the core can access a disk per time, so all the disks potentially carring other partitions will wait, decreasing SAN troughtput; so put a partition on a file group, and the spread the file group on many files across your phisical disks as much as you can, for best performance. There are some strange behaviours in performance sometime with partitioning if partitions are widely different, or with bad obsolete statistics, but they work well in my experience, both in performance, management and scalability.

    You can adress indexes at partition level (no online though), you can create filterded indexes based on partition logic, you can replicate based on partition logic, with huge granularity and scalability. You have easy and performant tasks for maintenance, respecting Aligment and implementing Sliding Windows.

    There is plenty of material on this officially from MS and is hugely dealt with on forums, you will definetly find the strategy that suits you best.

    Regards,

         Marco


    Marco Carozzi
  • Sunday, November 27, 2011 8:06 PM
     
     

    By this case , we could beak  our project into half :

     

     

    The first half is relevant to schema partitioning which would be through 120 file groups band each file group has one file at least>>then schema partitioning function could be based on date

     

     

    The second half is relevant to archiving process which could be done better using  Bulk Merge commands of 2008 but take with consider :

     

     

    ·          Using Try /catch commands better to catch precisely any relevant errors and analysis them well.

     

    ·          Batching archiving process into smaller pieces.

     

     

    If more details about above point s , please let me  know


    Think more deeply of performance terms