Questions about SQL Server Reindexing

Answers

  • Mike,

    Glad it helped you!

    1. Does this need to be run every day?

    I don't know. Are there scenarios where it does? Sure, but that's either because the data structure wasn't properly designed - or - that's how the business logic MUST work.

    For example:

    If a random GUID is used as the clustered index of a table, then the page on which the row needs to live will be all over the place. This will cause page splits which is a leading cause of fragmentation. It would also put more load on the transaction log IO to log all of the operations, range scans would be worse (read ahead), and more individual page writes would be needed for things like checkpoint.
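    To make that concrete, here is a minimal sketch contrasting a random GUID clustered key with a sequential one (the table and column names are made up for illustration):

```sql
-- Hypothetical tables; only the key defaults differ.
CREATE TABLE dbo.OrdersRandomKey
(
    OrderId UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(), -- random: each insert lands on an arbitrary page, causing splits
    Payload CHAR(200) NOT NULL,
    CONSTRAINT PK_OrdersRandomKey PRIMARY KEY CLUSTERED (OrderId)
);

CREATE TABLE dbo.OrdersSequentialKey
(
    OrderId UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID(), -- ever-increasing: inserts append to the last page
    Payload CHAR(200) NOT NULL,
    CONSTRAINT PK_OrdersSequentialKey PRIMARY KEY CLUSTERED (OrderId)
);
```

    With the sequential default, new rows go to the end of the index, so page splits (and the extra logging they cause) are largely avoided.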

    On a small index, this isn't an issue. Assume the index is 100 rows that take up a total of 50 pages. That's only 400 kilobytes, a far cry from terrible. At that size it'll probably do a table scan over a seek anyway (depending). Even if the average free space per page were 40%, the index would only be WASTING 160 kilobytes of both on-disk and in-memory storage - 20 extra pages that could be reclaimed as free space once the index was rebuilt. The catch is that after a rebuild the next insert would land in a random location, and since the pages would be 100% full (or very close), a page split would have to occur. In this case the GUID is probably a bad choice for the clustered index and should have been caught during the design phase.

    Assume the same as above, but the index is 131,072 pages (this would be 1 GB of space) and has 40% free space per page due to page splits. In this example roughly 52,429 pages are wasted, which equals around 429 MB!
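    As a sanity check on that arithmetic (a SQL Server page is 8 KB, so 1 GB is 131,072 pages), the same numbers can be computed directly:

```sql
-- Back-of-the-envelope: total size and wasted space for the example above.
DECLARE @pages        bigint        = 131072, -- pages in the index (1 GB at 8 KB/page)
        @free_percent decimal(5, 2) = 40;     -- average free space per page

SELECT  @pages * 8192                         AS total_bytes,  -- 1,073,741,824 (1 GB)
        @pages * 8192 * @free_percent / 100.0 AS wasted_bytes; -- ~429 million bytes
```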

    In this case would it make sense to run this every day? No. What would make sense is to live with the fact that the database was poorly designed. The "fix", IMHO, would be to build extra free space into the index to account for the external fragmentation (pages not physically in order on disk) through the fill factor option, and to leave a tiny bit of room in case a page split occurs at the intermediate levels through the pad index option. We need to observe what is actually happening and make a good decision based on that: use the options given to us, and only rebuild or re-organize every so often based on external fragmentation. Internal fragmentation - the free space on each page - would stay high, because we artificially made it so. This means more IO is needed to read the pages for queries, but that is the trade-off: more IO and more space in exchange for not having a heavily fragmented index.
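    A sketch of that "live with it" approach (the index and table names here are hypothetical, and 70 is just an example fill factor - the right value depends on the observed split rate):

```sql
-- Rebuild once with built-in free space so random inserts have room
-- and page splits become rarer.
ALTER INDEX IX_SomeIndex ON dbo.SomeTable
REBUILD WITH
(
    FILLFACTOR = 70,  -- leave 30% free space on leaf pages
    PAD_INDEX  = ON   -- apply the fill factor to intermediate levels too
);
```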

    EDIT:

    I had to add this in as I felt the above wasn't very fair of me to say. To be fair, if the application was mostly READING this table and it had a low amount of change (say 80% read, 20% write), it might make sense to rebuild or reorganize it more often. The reason is that the 429 MB of space would be wasted not only on disk but in memory too, which means more in-memory IO is needed to satisfy a "not so optimized" query. Since the majority of the operations would be READING the table, rebuilding would cut down on overall IO, return results faster, and leave more free memory for other objects to be loaded.

    The reverse of this is true as well. If the application was mostly WRITING, page splits would slow the inserts down. In that case we would want to optimize for inserts (writing), which is what I describe in the original block of text above the edit line. Since reads would be fewer, we'd be more worried about write performance. Having the extra space would cost memory and disk space, but the writes would happen faster as new pages wouldn't need to be allocated, logged, etc.

    EDIT End

    2. Also, the DBA of our customer is under the impression that all other job activity must be suspended while this DB maintenance job runs. Based on your response it sounds like this is unnecessary.

    Activity doesn't "need" to stop, but it will be BLOCKED depending on what is currently being done. This may or may not be acceptable - especially if it is an OFFLINE operation. Online operations have only very short periods where blocking could occur.

    So, for example, if a clustered index on a very large table takes 32 minutes to rebuild and the rebuild is done offline, nothing that touches that table will be able to process for 32 minutes. This could cause applications to behave unexpectedly, and most will actually time out on the query and return an error.
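    The two variants look like this (names are illustrative; ONLINE = ON requires Enterprise edition):

```sql
-- Offline (the default): the table is locked and requests are
-- blocked for the full duration of the rebuild.
ALTER INDEX PK_BigTable ON dbo.BigTable
REBUILD WITH (ONLINE = OFF);

-- Online: the table stays available, with only brief blocking
-- at the start and end of the operation.
ALTER INDEX PK_BigTable ON dbo.BigTable
REBUILD WITH (ONLINE = ON);
```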

    -Sean


    Sean Gallardy | Blog | Twitter


    Saturday, December 8, 2012 3:37 AM
    Answerer

All replies

  • The following article can be a good source in this regard. http://www.mssqltips.com/sqlservertip/1852/sql-server-2005-index-best-practices/

    Hima, To help the community better identify relevant threads, please don't forget to mark as answer or vote as helpful if an answer helps.

    Friday, December 7, 2012 7:57 PM
  • Hello Mike,

    Let me explain the items in a little more detail and see if it helps you!

    1. How often is it typically recommended to perform reindexing?

    It depends on the type of maintenance being performed and how frequently it needs to be done to keep the system healthy. Before choosing to do anything to an index, check its fragmentation level, the number of pages that make up the index (its size), and what type of index it is (heap, clustered, non-clustered, XML, spatial, etc). This, along with business requirements - such as "all maintenance must be done between X and Y hours" - will give you an idea of how often maintenance should be checked. If your indexes are becoming heavily fragmented over the course of a few hours, instead of just doing maintenance on them (again, it depends on the type!), first investigate why the fragmentation reaches that level in the first place and whether anything can be done about it. If the indexes aren't heavily fragmented after a day (random time choice), then maybe nothing needs to be done - this goes back to knowing your databases and users.
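    A common way to do that check is the physical stats DMV; this is a sketch, run against the current database, that lists indexes by fragmentation (the 'LIMITED' mode scans only the upper levels of each index, so it is relatively cheap):

```sql
-- Check fragmentation and size before deciding on any maintenance.
SELECT  OBJECT_NAME(ips.object_id)       AS table_name,
        i.name                           AS index_name,
        ips.index_type_desc,
        ips.page_count,                  -- size: small indexes may not be worth touching
        ips.avg_fragmentation_in_percent -- external (logical) fragmentation
FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN    sys.indexes AS i
        ON  i.object_id = ips.object_id
        AND i.index_id  = ips.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;
```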

    Index size plays a role in this as well. If an index is less than 1,000 pages (8 MB) then it may or may not be worth doing maintenance on it. The reason is that the index is so small that the access methods may decide to scan it, which can be faster than a seek in most instances. Again, this all depends - but generally this is the case. Indexes that are very small, say only a handful of pages, will always seem "fragmented"; this is a result of how SQL Server stores pages for small objects (in mixed extents, which makes them look fragmented).

    The type of index matters too! If using anything less than Enterprise edition, index operations can't be run "online". This means that if a clustered index needs to be rebuilt, users won't be able to access the table for the duration of the maintenance on that index. This could be extremely fast (a 1 MB table?) or extremely slow. Maintenance on the clustered index also has repercussions for the non-clustered indexes, which could put further pressure or strain on the system. Heaps - in order to fix fragmentation - must have a clustered index created on them and then dropped (to re-create the heap), which has the effect of rebuilding all of the non-clustered indexes as well, twice (once for the creation and once for the drop)! Rebuilds can be online or offline; you'll need Enterprise edition or above to do online rebuilds, and offline rebuilds will lock the resource so that requests are blocked until it is available again. Reorgs are always online.
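    The two maintenance options side by side (the index and table names are hypothetical):

```sql
-- Reorganize: always online, works through the leaf level in place,
-- and can be stopped mid-way without losing the work already done.
ALTER INDEX IX_Example ON dbo.Example REORGANIZE;

-- Rebuild: builds a fresh copy of the index. Offline by default;
-- ONLINE = ON requires Enterprise edition.
ALTER INDEX IX_Example ON dbo.Example REBUILD WITH (ONLINE = ON);
```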

    The last part is the business requirements. If you can only do maintenance from 2am until 4am every Tuesday - then there isn't much you can do outside of that window.

    2. Is it important to suspend other database access, particularly any updates (e.g., SQL insert statements) while reindexing is being performed?

    See the above where the type of index and size of index matter. If you're going to do an offline rebuild on a clustered index, then requests will be blocked while the index is rebuilt. Eventually user traffic will make it through, but only after the rebuild. This may or may not be acceptable to the business or the end users. In either case, other activities do not need to be suspended or quiesced.

    3. Other

    There are other implications as well. For one, the recovery model of the database will have a direct impact on the speed of the operation, as will any HA technologies such as database mirroring or AlwaysOn availability groups, given the way they work. Other items such as replication will be affected, so planning is needed if you use any of those technologies (and not just those). Depending on the size of the index and the recovery model of the database, your transaction log could get hit fairly hard; this might cause an issue with the IO subsystem, or you may run out of drive space. These are things to consider, especially if there is a need to rebuild often.
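    One way to keep an eye on the log side of this is to check log space before and after a large rebuild; this sketch assumes SQL Server 2012 or later, where the log-space DMV is available:

```sql
-- How big is the log, and how much of it is currently in use?
SELECT  total_log_size_in_bytes / 1048576.0 AS log_size_mb,
        used_log_space_in_bytes / 1048576.0 AS used_log_mb,
        used_log_space_in_percent
FROM    sys.dm_db_log_space_usage;
```

    On earlier versions, DBCC SQLPERF(LOGSPACE) gives similar information for all databases at once.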

    -Sean


    Sean Gallardy | Blog | Twitter

    • Proposed as answer by Prashant Kumar Monday, December 10, 2012 9:04 AM
    Friday, December 7, 2012 8:32 PM
    Answerer
  • Hi Sean,

    Thanks a lot for your very informative response! Our customer is running a daily job that checks the amount of fragmentation on every table of our database, and does an index defrag unless the table requires an index rebuild based on the amount of fragmentation.

    I'm a bit skeptical that this is necessary to do on a daily basis. I know you don't know anything about our application, but in your experience is it a common practice to run such a job on a daily basis?

    Also, the DBA of our customer is under the impression that all other job activity must be suspended while this DB maintenance job runs. Based on your response it sounds like this is unnecessary.

    Any further comments or insights you could provide would be appreciated.

    Thanks again,

    Mike

    Saturday, December 8, 2012 3:13 AM
  • Thanks Sean!
    Tuesday, December 11, 2012 5:46 PM