High fragmentation and primary key

  • Question

  • Hi, 

    I am new to a project. I noticed that we have many tables with fragmentation above 90%, and many more above 40%. Looking at the tables, most of them have a clustered primary key based on "irrelevant" fields. No one seems to have checked whether the data arrives in some logical order (say, chronological) and designed the keys to follow that order. I am also trying to get the DBA to confirm whether they run any defrag process. There should be something.
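
    (For reference, a query along these lines against sys.dm_db_index_physical_stats shows that picture; the 40% and 1,000-page thresholds below are only illustrative.)

    -- Sketch: list indexes over 40% logical fragmentation, ignoring tiny tables
    -- where fragmentation numbers are meaningless. Thresholds are illustrative.
    SELECT s.name  AS schema_name,
           t.name  AS table_name,
           i.name  AS index_name,
           ips.index_type_desc,
           ips.avg_fragmentation_in_percent,
           ips.page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ips
        JOIN sys.tables  t ON t.object_id = ips.object_id
        JOIN sys.schemas s ON s.schema_id = t.schema_id
        JOIN sys.indexes i ON i.object_id = ips.object_id AND i.index_id = ips.index_id
    WHERE ips.avg_fragmentation_in_percent > 40
      AND ips.page_count > 1000
    ORDER BY ips.avg_fragmentation_in_percent DESC;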

    I was thinking that the best solution for the fragmentation is to make the primary keys non-clustered (as an en masse change), partly because of the number of tables involved: 150+ that I need to amend. I will try to check whether any indexes can be kept as clustered, assuming they are well designed.

    Do you think changing the indexes to non-clustered on the very highly fragmented tables is a good approach? We have database performance issues, as you would expect.

    Thank you for any answers.


    • Edited by panlondon Wednesday, July 3, 2019 4:16 PM
    Wednesday, July 3, 2019 12:57 PM

All replies

  • Looking at the tables most of them have a clustered primary key which is based on "random" fields.

    With the above statement I assume you are referring to a GUID column.

    Making a GUID the primary key is not the problem; making it the CLUSTERED index is the problem.
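
    A common way around it, sketched below with made-up table and column names, is to keep the GUID as a nonclustered primary key and cluster on an ever-increasing column instead:

    -- Sketch (made-up names): the GUID stays as the primary key, but nonclustered;
    -- the table is clustered on an ever-increasing identity column, so inserts
    -- land at the end of the clustered index instead of splitting pages all over it.
    CREATE TABLE dbo.Orders
    (
        OrderId   uniqueidentifier NOT NULL
            CONSTRAINT DEF_Orders_OrderId DEFAULT NEWID(),
        OrderSeq  int IDENTITY(1,1) NOT NULL,
        OrderDate datetime2(0) NOT NULL,
        Amount    decimal(19,4) NOT NULL,
        CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderId)
    );

    CREATE UNIQUE CLUSTERED INDEX IX_Orders_OrderSeq ON dbo.Orders(OrderSeq);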



    Wednesday, July 3, 2019 1:43 PM
  • Hi, "random" it means irrelevant fields to an extent. I will edit my post. Thanks for any other replies
    Wednesday, July 3, 2019 4:16 PM

  • I was thinking that the best solution for the fragmentation is to make the primary keys non-clustered (as an en masse change), partly because of the number of tables involved: 150+ that I need to amend. I will try to check whether any indexes can be kept as clustered, assuming they are well designed.

    How would changing a clustered index to non-clustered remove fragmentation? Also, if you remove a PK there may be repercussions. I would not do that unless I had a clear picture of the schema design.
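
    Just to illustrate one of the repercussions, a query along these lines (a sketch) shows which primary keys are referenced by foreign keys; those cannot simply be dropped and recreated in isolation:

    -- Sketch: primary keys referenced by foreign keys. Dropping such a PK (which
    -- converting it from clustered to nonclustered requires) fails until the
    -- referencing FKs are dropped first.
    SELECT OBJECT_SCHEMA_NAME(fk.referenced_object_id) AS referenced_schema,
           OBJECT_NAME(fk.referenced_object_id)        AS referenced_table,
           kc.name                                     AS primary_key_name,
           COUNT(*)                                    AS referencing_fk_count
    FROM sys.foreign_keys fk
        JOIN sys.key_constraints kc
            ON kc.parent_object_id = fk.referenced_object_id
           AND kc.unique_index_id = fk.key_index_id
           AND kc.type = 'PK'
    GROUP BY fk.referenced_object_id, kc.name
    ORDER BY referencing_fk_count DESC;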

    Cheers,

    Shashank


    My TechNet Wiki Articles

    MVP

    Wednesday, July 3, 2019 5:06 PM
    Moderator
  • If I have a clustered index keyed on start_date, name, surname, town, that's fine if my data arrives in chronological order. If instead my table has this clustered key: name, surname, town, start_date, and I just load my daily update, that will quickly cause a lot of fragmentation on the clustered index because the data does not arrive in the same order as the key fields.

    If it's non-clustered then SQL Server doesn't have to keep re-arranging the data physically on disk; it just assigns the pointers, so insertions should be much faster. So I think the simple solution is just to change the index from clustered to non-clustered. No reason to drop it as the primary key. Any opinions? Thanks.

    Wednesday, July 3, 2019 5:25 PM
  • As a general rule, I would avoid heap tables (tables without a clustered index) outside of ETLs. There are several issues associated with them, most notably extra IO due to forwarding pointers and suboptimal control of page free space. Tables with clustered indexes will usually outperform them, especially in OLTP. Obviously, there is nothing wrong with having a nonclustered PK and a separate clustered index.

    With all that being said, I rarely see fragmentation being the main source of performance problems, especially nowadays with flash-based storage and servers with large amounts of memory. Reducing fragmentation may slightly improve performance, but it would not help much if the issues lie in other areas. I would start with wait statistics - what are the top 5 waits in your system?
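
    As a starting point, something along these lines against sys.dm_os_wait_stats gives the picture (the exclusion list of benign waits is just illustrative, not exhaustive):

    -- Sketch: top waits since the last restart (or since the stats were cleared),
    -- filtering out a few common benign wait types.
    SELECT TOP (5)
           wait_type,
           wait_time_ms / 1000.0        AS wait_time_sec,
           signal_wait_time_ms / 1000.0 AS signal_wait_sec,
           waiting_tasks_count
    FROM sys.dm_os_wait_stats
    WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP', 'CHECKPOINT_QUEUE',
                            'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT',
                            'SQLTRACE_BUFFER_FLUSH', 'WAITFOR', 'DIRTY_PAGE_POLL',
                            'BROKER_TASK_STOP', 'HADR_FILESTREAM_IOMGR_IOCOMPLETION')
    ORDER BY wait_time_ms DESC;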


    Thank you!

    Dmitri V. Korotkevitch (MVP, MCM, MCPD)

    My blog: http://aboutsqlserver.com

    Wednesday, July 3, 2019 6:15 PM
  • If I have a clustered index keyed on start_date, name, surname, town, that's fine if my data arrives in chronological order. If instead my table has this clustered key: name, surname, town, start_date, and I just load my daily update, that will quickly cause a lot of fragmentation on the clustered index because the data does not arrive in the same order as the key fields.

    If it's non-clustered then SQL Server doesn't have to keep re-arranging the data physically on disk; it just assigns the pointers, so insertions should be much faster. So I think the simple solution is just to change the index from clustered to non-clustered. No reason to drop it as the primary key. Any opinions? Thanks.

    In a heap table, data will be inserted into a more or less random data page. All nonclustered indexes still need to be maintained - data is inserted into them according to the index key, with or without page splits.

    The issue with heaps is what happens during updates. If an updated row no longer fits on its page, SQL Server places it on another data page and replaces the original, pre-update row with a small record called a "forwarding pointer". Your nonclustered indexes are not updated; they keep referencing the forwarding pointer instead. As a result, you get extra IO operations during scans and RID lookups whenever a forwarding pointer is encountered.

    Here is a quick demo of it - check the IO stats. This is the reason why I do not recommend heap tables in general.

    /****************************************************************************/
    /*                       Pro SQL Server Internals                           */
    /*      APress. 2nd Edition. ISBN-13: 978-1484219638 ISBN-10:1484219635     */
    /*                                                                          */
    /*                  Written by Dmitri V. Korotkevitch                       */
    /*                      http://aboutsqlserver.com                           */
    /****************************************************************************/
    /*  Chapter 02. Tables and Indexes: Internal Structure and Access Methods   */
    /*                         Heap Tables and PFS                              */
    /****************************************************************************/

    use SQLServerInternals
    go

    if exists(
        select * from sys.tables t join sys.schemas s on t.schema_id = s.schema_id
        where s.name = 'dbo' and t.name = 'ForwardingPointers'
    )
        drop table dbo.ForwardingPointers;
    go

    create table dbo.ForwardingPointers
    (
        Placeholder char(100) not null
            constraint DEF_ForwardingPointers_Placeholder default 'Placeholder',
        IntVal int not null,
        Data varchar(255) null
    )
    go

    ;with CTE(IntVal)
    as
    (
        select 1
        union all
        select IntVal + 1 from CTE where IntVal < 150000
    )
    insert into dbo.ForwardingPointers(IntVal)
        select IntVal from CTE
    option (maxrecursion 0)
    go

    -- heap is still compact: no forwarding pointers yet
    select page_count, forwarded_record_count, *
    from sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.ForwardingPointers'), 0, null, 'DETAILED')
    go

    set statistics io, time on
    select count(*) from dbo.ForwardingPointers
    set statistics io, time off
    go

    -- grow every row so it no longer fits on its original page
    update dbo.ForwardingPointers set Data = REPLICATE('a',255)
    go

    -- forwarded_record_count and page_count jump; the scan below does far more reads
    select page_count, forwarded_record_count, *
    from sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.ForwardingPointers'), 0, null, 'DETAILED')
    go

    set statistics io, time on
    select count(*) from dbo.ForwardingPointers
    set statistics io, time off
    go


    Thank you!

    Dmitri V. Korotkevitch (MVP, MCM, MCPD)

    My blog: http://aboutsqlserver.com



    Wednesday, July 3, 2019 6:40 PM

  • Thank you for this, Dmitri. I am aware that heap tables are not an ideal situation. The number of tables is quite high in my case and I don't have the time right now to examine every table and amend the indexing. Which is the best of these 3 options:

    1. Current situation: don't change the indexes, leave the current clustered indexes on "irrelevant" columns, high fragmentation etc.

    2. Change most of the indexes from clustered to non-clustered.

    3. Change most of the indexes from clustered to non-clustered and add an identity column as the clustered key.

    I think option 3 (roughly what is sketched below) will not take much time to do. Do you think the database will be in a better state with option 3, performing better than option 1? Do you have any OTHER suggestions? I guess the best solution is to examine the 150 tables in more detail and design the indexing properly by checking data distribution, update frequency etc., but I am not sure I have the time right now to do it. Maybe at a later stage of the project.
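
    In other words, per table, roughly something like this (names are placeholders, and the PK drop needs any referencing foreign keys handled first):

    -- Sketch of option 3 on a made-up table: add an identity column, move the
    -- clustered index onto it, and recreate the existing PK as nonclustered.
    ALTER TABLE dbo.SomeTable ADD RowSeq int IDENTITY(1,1) NOT NULL;

    ALTER TABLE dbo.SomeTable DROP CONSTRAINT PK_SomeTable;   -- fails while FKs reference it

    CREATE UNIQUE CLUSTERED INDEX IX_SomeTable_RowSeq ON dbo.SomeTable(RowSeq);

    ALTER TABLE dbo.SomeTable
        ADD CONSTRAINT PK_SomeTable PRIMARY KEY NONCLUSTERED (Name, Surname, Town);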

    Thank you.




    • Edited by panlondon Thursday, July 4, 2019 8:30 AM
    Thursday, July 4, 2019 8:27 AM
  • Before you do any of those, I suggest you determine whether fragmentation hurts you in the first place (i.e., start with option 1). If it doesn't, you are done. Many DBAs defrag "just because". The perf difference between random and sequential I/O is pretty much gone nowadays with modern SANs etc.

    Perhaps it is the update of stats (a rebuild updates that index's statistics as a side effect) that causes you to see better perf when you defrag?
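
    One way to separate the two effects is to update statistics on their own, without rebuilding, and see whether the queries improve (a sketch; the table name is a placeholder):

    -- Sketch: update statistics without rebuilding any index.
    UPDATE STATISTICS dbo.SomeTable WITH FULLSCAN;

    -- or, for the whole database (uses default sampling, not fullscan):
    EXEC sp_updatestats;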

    If in the end you determine that it is fragmentation that hurts your perf, then you have something to go on...

    Btw, I first blogged that frag doesn't hurt you nowadays, and then did some tests where I found that it can indeed hurt you, but in a different way. 

    http://sqlblog.karaszi.com/does-index-fragmentation-matter/

    http://sqlblog.karaszi.com/index-fragmentation-part-2/


    Tibor Karaszi, SQL Server MVP (Web Blog)

    Thursday, July 4, 2019 1:06 PM
  • I completely agree with Tibor. Invest 10-20 hours into a system health check and identify the root cause of the issue. Fragmentation is likely not it - fragmentation may contribute, but the problem is probably elsewhere.

    To put things in perspective, consider internal fragmentation with avg_page_space_used_in_percent = 50. It means that 50% of the space is wasted and you need 2X the data pages to store the data. For example, if you have 8GB of data, you will use 2M data pages (at 8KB per page), and if a query is not optimized and performs a scan, it will do 2M reads. If you reduce the fragmentation to 20% - avg_page_space_used_in_percent = 80 - the query would do 1.25M reads instead. Yes, it is an improvement, but if you tune the query, you may end up with just a handful of reads regardless of the fragmentation.
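
    To see where you actually stand on that metric, a query along these lines works (SAMPLED or DETAILED mode is needed because avg_page_space_used_in_percent is NULL in LIMITED mode):

    -- Sketch: page density (internal fragmentation) per index, largest first.
    SELECT OBJECT_NAME(ips.object_id)         AS table_name,
           i.name                             AS index_name,
           ips.page_count,
           ips.avg_page_space_used_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') ips
        JOIN sys.indexes i
            ON i.object_id = ips.object_id AND i.index_id = ips.index_id
    ORDER BY ips.page_count DESC;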

    Doing schema changes may lead to an even worse situation - consider a query that uses a clustered index seek and will regress when the index becomes nonclustered and non-covering.


    Thank you!

    Dmitri V. Korotkevitch (MVP, MCM, MCPD)

    My blog: http://aboutsqlserver.com

    Thursday, July 4, 2019 1:35 PM
  • Thank you Tibor, Dmitri. However, something you don't mention much is whether fragmentation hurts data insertions/updates. You mentioned SELECTs, but what about mass insertions of data into a clustered index that's not designed very well, as in my database design?

    Or did I miss that section?

    I agree that first I need to defrag, as some of the tables are 96% fragmented, and then see how quickly they get fragmented again and how much difference it makes performance-wise for the end-to-end process. I will have to wait until next week for that, when we have some proper real data (assuming the team has fixed the defrag maintenance issue) :) I will post again here with my findings.
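
    For the defrag itself, per table, something along these lines (names are placeholders):

    -- Sketch: rebuild all indexes on a table (a rebuild also updates each index's
    -- statistics with fullscan). REORGANIZE is the lighter, always-online option
    -- for moderately fragmented indexes.
    ALTER INDEX ALL ON dbo.SomeTable REBUILD;
    -- ALTER INDEX IX_SomeTable_SomeKey ON dbo.SomeTable REORGANIZE;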

    Thank you,

    Panos.

    Thursday, July 4, 2019 1:55 PM
  • No, you didn't miss that section. And, yes, this is absolutely a factor. If you replace the clustered index with a nonclustered index, you still get the same type of fragmentation in the nonclustered index, but the nonclustered index only has the index keys in it - it isn't the actual data. The actual data (all the columns) can possibly remain "undisturbed" from a fragmentation standpoint when you are doing your load. There are so many moving parts here, so it's best to test your particular situation, just as you plan to do. And, yes, please let us know! :-)

    Tibor Karaszi, SQL Server MVP (Web Blog)

    Friday, July 5, 2019 8:30 AM