none
Partitioning related tables using Year RRS feed

  • Question

  • We have 4 levels of related tables(which are below) and want to partition by year. Each level of table has primary key. Should i use primary key of each table as a partition key(by getting the range for each year) or should I use Foreign key as a partition key to partition the tables. 

    Which approach is the best one to do it.

    -------- Parent-----------------------------------------------
    CREATE TABLE [dbo].[Parent](
    [FileId] [int] NOT NULL,
    [UploadDate] [datetime] NOT NULL,
    [Name] [varchar](50) NULL,
    PRIMARY KEY CLUSTERED 
    (
    [FileId] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

    GO

    ---------Child-------------------------------------------------
    CREATE TABLE [dbo].[Child](
    [FileERId] [int] NOT NULL,
    [FileId] [int] NOT NULL,
    PRIMARY KEY CLUSTERED 
    (
    [FileERId] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    GO

    ALTER TABLE [dbo].[Child]  WITH CHECK ADD  CONSTRAINT [FK_Child_Parent] FOREIGN KEY([FileId])
    REFERENCES [dbo].[Parent] ([FileId])
    GO

    --ALTER TABLE [dbo].[Child] CHECK CONSTRAINT [FK_Child_Parent]
    --GO

    -----------------GrandChild---------------------------------------

    CREATE TABLE [dbo].[GrandChild](
    [FileRowId] [int] NOT NULL,
    [FileERId] [int] NOT NULL,
    PRIMARY KEY CLUSTERED 
    (
    [FileRowId] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[GrandChild]  WITH CHECK ADD  CONSTRAINT [FK_GrandChild_Child] FOREIGN KEY([FileERId])
    REFERENCES [dbo].[Child] ([FileERId])
    GO

    --ALTER TABLE [dbo].[GrandChild] CHECK CONSTRAINT [FK_GrandChild_Child]
    --GO

    -----------------GrandGrandChild---------------------------------------

    CREATE TABLE [dbo].[GrandGrandChild](
    [FileRowErrId] [int] NOT NULL,
    [FileRowId] [int] NOT NULL,
    PRIMARY KEY CLUSTERED 
    (
    [FileRowErrId] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[GrandGrandChild]  WITH CHECK ADD  CONSTRAINT [FK_GrandGrandChild_GrandChild] FOREIGN KEY([FileRowId])
    REFERENCES [dbo].[GrandChild] ([FileRowId])
    GO

    --ALTER TABLE [dbo].[GrandGrandChild] CHECK CONSTRAINT [FK_GrandGrandChild_GrandChild]
    --GO

    your help would be appreciated.

    Saturday, February 15, 2020 12:18 AM

All replies

  • If it was up to me, I would firstly resist the temptation to partition as long as possible, because partitioning adds complexity and can lead to poor performance. (But it can also help to relieve some paint points.)

    When you partition, you should partition by columns you actually use in your queries. If you run a query over the table where the patitioning column is absent, SQL Server needs to look in every partition. I should immediately qualify this: you can have non-clustered indexes which you do not partition, in which case this is not an issue. However, if you have these so-called non-aligned indexes you lose most of the benefits with partitioning.

    For the parent table, how you do you look up things? By Name? By Date? By Id? By all of them depending on the user?

    And the child tables? If you always access them from the parent by joins, it makes sense to partition on the foreign key, assuming that you can be sure that keys are growing monotonically (which is not unreasonable). To partition by year, you would have to add that column to each table, which is not very appealing.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Saturday, February 15, 2020 9:55 AM
  • Thanks Erland for your time. Highlighted are my comments. 

    If it was up to me, I would firstly resist the temptation to partition as long as possible, because partitioning adds complexity and can lead to poor performance. (But it can also help to relieve some paint points.)

    When you partition, you should partition by columns you actually use in your queries. If you run a query over the table where the patitioning column is absent, SQL Server needs to look in every partition. I should immediately qualify this: you can have non-clustered indexes which you do not partition, in which case this is not an issue. However, if you have these so-called non-aligned indexes you lose most of the benefits with partitioning.

    Correct i agree that non-aligned keys will not give good performance when foreign key is used for partitioning. 

    For the parent table, how you do you look up things? By Name? By Date? By Id? By all of them depending on the user? 

    Most of the time, users search on UploadDate with others which are not listed in the post, so we have non-clustered index on UploadDate on parent table and FileId is the Clustered index. 

    And the child tables? If you always access them from the parent by joins, it makes sense to partition on the foreign key, assuming that you can be sure that keys are growing monotonically (which is not unreasonable). To partition by year, you would have to add that column to each table, which is not very appealing.

    The reason for our partitioning is to improve the performance and save some disk space. The only table that has lot of data is GrandChild which has 16 million(for 5 years) and growing 3 to 4 million a year. All other tables, Parent, Child, GrandGrandChild are relatively small. I am not sure if that is good idea to partition only that table(Grandchild).

    The tables I posted here are 4 levels and each level has one table. Actually we have 3 tables for each level from Child level. so we would have to partition total of around 14 tables. 

    Using Primary key of each table as partition key will make the aligned index and will improve the performance , but need to get the range for each year and each table 

    Tuesday, February 18, 2020 7:30 PM
  • Your problem seems very complicated. It is so helpful.
    19 hours 51 minutes ago
  • The reason for our partitioning is to improve the performance and save some disk space. The only table that has lot of data is GrandChild which has 16 million(for 5 years) and growing 3 to 4 million a year. All other tables, Parent, Child, GrandGrandChild are relatively small. I am not sure if that is good idea to partition only that table(Grandchild).

    16 million rows? How big is it in megabytes?

    In what way do you expect to gain performance by partitioning? As I indicated, partitioning done wrong can cause performance issues. If you main incentive to partition is performance, you may be barking up the wrong tree.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    17 hours 39 minutes ago
  • Erland,

    The whole database is 60GB

    By keeping two years of data and archiving older than two years to another database, would give better performance instead of 5 years of data because the scan will be done only on 3 - 4M of data than 16 million.

    We are considering to keep each table as a separate table and use the primary key of each table as a partition key to partition the table. Remove all the foreign keys before switch out the data to the staging tables and then reapply the FKs once the data is archived. 

       

    2 hours 13 minutes ago