Table Partitioning and Query Performance

  • Question

  • I was querying a database with an MDF that was 21 GB in size (and growing) yesterday. Almost all of the data is in one table. A simple SELECT query with a WHERE clause on one of the fields in the primary key took almost 3 minutes.

    Would table partitioning improve query performance?

    Can an existing database be partitioned with the data in place, or would it have to be exported/imported?

    Thanks to anyone that can help.

    Richard Campbell

    Thursday, May 6, 2010 7:51 PM

Answers

  • I have made some interesting discoveries today. Using the smaller (130 million row) database on my notebook, I reversed the column order on the primary key (to SampleDateTime then TagID). This brought the query execution time down from 45 seconds to 15 seconds.

    Even more interesting was that when I ran the query after that, the execution time was zero seconds. This was even after stopping and starting the SQL Server Service to wipe any cached results.

    One subject that hasn't been discussed is Partition Elimination. Could this explain the behaviour that I have described? Does it only become effective after the first query has been executed?

    Monday, May 10, 2010 7:58 AM

All replies

  • Yes, it will definitely improve the performance.
    Please mark the post as answered to help others choose the best answer.
    chandra sekhar pathivada | www.calsql.com (SQL Server Community Website)
    Thursday, May 6, 2010 11:36 PM
  • Well, honestly, from the information given it could either help or hurt performance. The question as asked is really too broad to answer definitively. Partitioning can help in certain (fairly specific) cases, but it is rarely needed as a first step.

    When you say "one of the [columns] in the primary key", unless it is the first column of the primary key, the index is likely useless to your query (particularly in a very large table.)  Adding an index that matches the WHERE clause conditions MAY help, but again, still not enough information.

    Posting the table structure and the query/query plan from SSMS would make it easy (or at least easiER) to diagnose your problem, because it totally depends on how your WHERE clause is formulated and whether it can even use an index. Including row counts and the percentage of duplicate values in the particular column may also be interesting.
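
    As a rough illustration of how to capture those numbers (a sketch with placeholder names; substitute the real table and WHERE clause):

    ```sql
    -- Placeholder table/column names; adapt to the actual query.
    SET STATISTICS IO ON;    -- reports logical/physical reads per table
    SET STATISTICS TIME ON;  -- reports CPU and elapsed time

    SELECT Col1, Col2
    FROM dbo.BigTable
    WHERE KeyColumn > 12345;
    -- In SSMS, enable "Include Actual Execution Plan" (Ctrl+M) before running;
    -- the plan shows whether the query seeks an index or scans the table.
    ```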


    Louis

    Friday, May 7, 2010 4:20 AM
  • Posting the table structure and the query/query plan from SSMS would make it easy (or at least easiER) to diagnose your problem, because it totally depends on how your WHERE clause is formulated and whether it can even use an index. Including row counts and the percentage of duplicate values in the particular column may also be interesting.


    Louis

    Thanks for the reply. The table creation script and the query are at the end of this post.

    I should explain that this is an industrial data logging application inserting 10 to 20 records per second, 24/7. After just under a year's worth of operation, there are about 700 million rows.

    One problem that I can see is that SampleDateTime IS the second field in the index. I did try creating an index on SampleDateTime only, but not on the problem server. It didn't seem to make much difference, but there the query performance was OK to start with.

    There is another site that has been running for less time and has "only" 130 million rows in this table. The same query runs in 2 seconds, obviously dramatically less than 2 minutes 39 seconds.

    The differences that I can see are:

    8 CPUs on the faster server versus 4 CPUs on the slower one.

    The /3GB switch set on the faster server but not the other (both have 4 GB RAM).

    The faster server has 15,000 RPM SAS drives (RAID 6).

    The faster server has table partitioning on SampleDateTime (monthly partitions).

    The slower server is SQL Server SE and the faster is SQL Server EE (to enable partitioning).

    I set up the faster server but not the slower one. What I am trying to decide is whether the dramatic performance difference is due to table partitioning, hardware specs, or simply the lower number of rows in the faster database.

    TABLE

    CREATE TABLE [dbo].[NumericSamples](
    [TagID] [int] NOT NULL,
    [SampleDateTime] [bigint] NOT NULL,
    [SampleValue] [float] NULL,
    [QualityID] [smallint] NOT NULL,
    CONSTRAINT [NumericSamplesPS_TagIDSampleDateTime_PK] PRIMARY KEY CLUSTERED
    (
    [TagID] ASC,
    [SampleDateTime] ASC
    )
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
    )

    The bigint value in the query below is 15 minutes before the current date/time.

    QUERY

    SELECT TagID, SampleDateTime, SampleValue, QualityID
    FROM NumericSamples
    WHERE (NumericSamples.SampleDateTime > 634048659000000000)


    Friday, May 7, 2010 11:42 AM
  • From what I see, partitioning would not help a single CPU tick. You have the correct index for the query already, which is the clustered PK. I would say that you need throughput, or even more throughput. That is, a fast disk I/O subsystem, memory, and CPU.

    The reason why I don't believe that partitions would help: you already have the clustered index on the SampleDateTime column, which allows a fast seek. So finding the data is not the problem; it's the amount of data in the hit list, and this can only be helped with resources.

    The /3GB switch is one step towards more resources, as it allows SQL Server to allocate 3 GB of buffer cache (in fact something like 2.7 GB) rather than just 2 GB (1.7 GB). Also, 8 cores are more powerful than 4, etc. And of course, in your example, the size of the hit list plays another major role.
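
    A quick way to see how much buffer cache is actually being used (a sketch; sys.dm_os_buffer_descriptors exists in SQL Server 2005 and later, and each cached page is 8 KB):

    ```sql
    -- Approximate buffer pool usage per database, in MB.
    SELECT DB_NAME(database_id) AS database_name,
           COUNT(*) * 8 / 1024  AS cached_mb
    FROM sys.dm_os_buffer_descriptors
    GROUP BY database_id
    ORDER BY cached_mb DESC;
    ```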

     


    Bodo Michael Danitz - MCT, MCITP - free consultant - performance specialist - www.sql-server.de
    Saturday, May 8, 2010 10:59 AM
  • Thanks. I have the smaller of the databases (the one with the partitions) on my notebook, and the same query takes about 50 seconds to run (versus 2 seconds on the server). This shows that hardware can make a big difference.

    I am still surprised that the other server is so much slower (almost 3 minutes), even though the table is only 6 times larger.

    I am still wondering whether partitioning is part of the reason, even though I know that you don't think so. I have very little experience in this area, but my expectation was that the indexes would be partitioned along with the tables.

    Would it help if the index order was reversed to be SampleDateTime then TagID?

    There was also a suggestion that creating another index on just SampleDateTime might help.

    By the way, here are the table definition and query again in a more readable format.

    Table Definition
    
    CREATE TABLE [dbo].[NumericSamples](
    [TagID] [int] NOT NULL,
    [SampleDateTime] [bigint] NOT NULL,
    [SampleValue] [float] NULL,
    [QualityID] [smallint] NOT NULL,
    CONSTRAINT [NumericSamplesPS_TagIDSampleDateTime_PK] PRIMARY KEY CLUSTERED 
    (
    [TagID]
    ASC,
    [SampleDateTime]
    ASC
    )
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
    
    Query
    
    SELECT TagID, SampleDateTime, SampleValue, QualityID
    FROM NumericSamples
    WHERE SampleDateTime > 634048659000000000
    
    
    
    
    Sunday, May 9, 2010 2:38 AM
  • Did I get this right, you changed the clustered index to TagID? This would slow down the given query dramatically. The query would benefit from the clustered index being SampleDateTime!
    Bodo Michael Danitz - MCT, MCITP - free consultant - performance specialist - www.sql-server.de
    Sunday, May 9, 2010 7:45 AM
  • Did I get this right, you changed the clustered index to TagID? This would slow down the given query dramatically. The query would benefit from the clustered index being SampleDateTime!


    No, that's not the case. The index is based on TagID + SampleDateTime. I was wondering whether reversing it to SampleDateTime + TagID would help.

    The other thought was to create a new index based on SampleDateTime only.

    Sunday, May 9, 2010 8:19 AM
  • For this particular query, the clustered index on SampleDateTime would be the fastest, but you can also create the clustered index on (SampleDateTime, TagID). What is important is that SampleDateTime is the first column in the index. If you just created a nonclustered index on SampleDateTime, you would still get a key lookup, which is slower than having the clustered index on SampleDateTime.
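
    The two options might be sketched like this (the new constraint and index names are hypothetical, and rebuilding the clustered index rewrites the whole table, so it would need a maintenance window):

    ```sql
    -- Option 1: reverse the clustered key order (rewrites the entire table).
    ALTER TABLE dbo.NumericSamples
        DROP CONSTRAINT NumericSamplesPS_TagIDSampleDateTime_PK;
    ALTER TABLE dbo.NumericSamples
        ADD CONSTRAINT NumericSamples_SampleDateTimeTagID_PK
        PRIMARY KEY CLUSTERED (SampleDateTime ASC, TagID ASC);

    -- Option 2: keep the existing key and add a covering nonclustered index;
    -- INCLUDE carries the selected columns so the key lookup is avoided.
    CREATE NONCLUSTERED INDEX IX_NumericSamples_SampleDateTime
        ON dbo.NumericSamples (SampleDateTime)
        INCLUDE (SampleValue, QualityID);
    ```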


    Bodo Michael Danitz - MCT, MCITP - free consultant - performance specialist - www.sql-server.de
    Sunday, May 9, 2010 8:26 AM
  • For this particular query, the clustered index on SampleDateTime would be the fastest, but you can also create the clustered index on (SampleDateTime, TagID). What is important is that SampleDateTime is the first column in the index. If you just created a nonclustered index on SampleDateTime, you would still get a key lookup, which is slower than having the clustered index on SampleDateTime.

    Thanks
    Sunday, May 9, 2010 8:45 AM
  • I have made some interesting discoveries today. Using the smaller (130 million row) database on my notebook, I reversed the column order on the primary key (to SampleDateTime then TagID). This brought the query execution time down from 45 seconds to 15 seconds.

    Even more interesting was that when I ran the query after that, the execution time was zero seconds. This was even after stopping and starting the SQL Server Service to wipe any cached results.

    One subject that hasn't been discussed is Partition Elimination. Could this explain the behaviour that I have described? Does it only become effective after the first query has been executed?
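
    One way to check partition elimination directly (a sketch; the partition function name pfMonthly is hypothetical, and $PARTITION maps each row to its partition number):

    ```sql
    -- Rows per partition touched by the predicate; if elimination works,
    -- only the most recent monthly partitions should appear here.
    SELECT $PARTITION.pfMonthly(SampleDateTime) AS partition_number,
           COUNT(*) AS row_count
    FROM dbo.NumericSamples
    WHERE SampleDateTime > 634048659000000000
    GROUP BY $PARTITION.pfMonthly(SampleDateTime);
    ```

    The actual execution plan in SSMS also reports the number of partitions accessed on the seek/scan operator, which is a more direct check.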

    Monday, May 10, 2010 7:58 AM