none
Index Framentation

    Question

  • Is it ok to have Non Clustered Index even if it is heavily framented than to not have Non Clustered ? Does the same goes with Clustered Index?
    Wednesday, November 20, 2013 2:52 PM

Answers

  • >Thanks David. How does NC index not being cached comes into play on heavily framented index issue ?

    In many storage configurations, random IO has lower throughput than sequential IO.  Fragmented objects increase random IO, and nonclusterd indexes are often accessed in random fashion through index seeks. 

    Whereas if you don't have non-clustered indexes your table is accessed using scans, which can produce more efficient sequential IOs.  In a sequential scan sequential 512kb chunks of data are read from disk.  In random IO, 64kb or less is read at a time.  So the sequential IOs read 8x more in each IO, and might incur less hard disk head movement.

    Also nonclustered indexes are duplicate data storage and increase the overall size of the table.  If you have 7 non-clustered indexes on a large table, then SQL has a hard time figuring out which of 8 copies of your data to keep in memory.

    >Event if it is not cached shouldn't the cause of fragmentation be same ?

    Yes, but the impact of the fragmentation may be different.

    >Assuming we have non clustered index heavily cached, data size is around 500GBin a single table, can you give example on how NC index will be better to have even if it is heavily fragmented ?

    Yes.  If the NC index is in memory then the random access pattern caused by the fragmentation is not much more expensive than if it were unfragmented.  Essentially fragmentation is pretty painless on cached objects.

    David


    David http://blogs.msdn.com/b/dbrowne/


    Wednesday, November 20, 2013 8:15 PM
  • >Is table scan better than heavily fragmented NC Index or vice versa if I am performing query on large chunk of data?

    It can be better to scan, especially if your fragmented non-clustered are not heavily cached.  It will depend on the data size, memory size, storage config, etc.  So you'll need to test.

    David


    David http://blogs.msdn.com/b/dbrowne/


    Wednesday, November 20, 2013 5:18 PM

All replies

  • Same answer as always for these types of questions: It depends. If using the index even when it is fragmented means that your query is cheaper than the alternative, then it is better to have this index. This would be the typical case. 

    But of course, the question is why you wouldn't want to defragment your index.


    Tibor Karaszi, SQL Server MVP | web | blog

    Wednesday, November 20, 2013 2:55 PM
  • Non clusterd indexes are good if you have huge modifications and table is havily used for data modifications / Write operations especially with characters value.

    Cluster indexes are best bit for numbers with scaning ( reading only ), if the fragmentations is heavy then you can defrag them weekly as a part of the maintanance to keep the fragmentation in control.

    Clustered

    <dir><dir>

    Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be sorted in only one order.

    The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. When a table has a clustered index, the table is called a clustered table. If a table has no clustered index, its data rows are stored in an unordered structure called a heap.

    </dir></dir>

    Nonclustered

    <dir><dir>

    Nonclustered indexes have a structure separate from the data rows. A nonclustered index contains the nonclustered index key values and each key value entry has a pointer to the data row that contains the key value.

    The pointer from an index row in a nonclustered index to a data row is called a row locator. The structure of the row locator depends on whether the data pages are stored in a heap or a clustered table. For a heap, a row locator is a pointer to the row. For a clustered table, the row locator is the clustered index key.

    You can add nonkey columns to the leaf level of the nonclustered index to by-pass existing index key limits, 900 bytes and 16 key columns, and execute fully covered, indexed, queries.

    http://technet.microsoft.com/en-us/library/ms190457.aspx

    </dir></dir>

    Regards
    Raju R

    Wednesday, November 20, 2013 2:56 PM
  • Hallo Zenithsql,

    whether an index is useful is depending on it's usage and the access pathes to the index values!

    If it is an index scan a fragmentation causes more I/O than required if not fragmented

    If it is an index seek the fragmentation (from the physical perspective) can be neglected

    BUT - every fragmentation caused a lower density on the data / index pages.
    Due to the fact that data will be send to the buffer pool before they can be read for the client request you may waste a lot of RAM because of low density of the the pages.

    The following scripts may give you an idea what problems you "may" have...

    1. Will your index be used or not

    SELECT	i.type_desc,
    	us.user_seeks,
    	us.user_scans,
    	us.user_lookups,
    	us.user_updates
    FROM	sys.indexes i INNER JOIN sys.dm_db_index_usage_stats us
    	ON (
    		i.object_id = us.object_Id AND
    		i.index_id = us.index_id
    	   )
    WHERE	i.object_id = OBJECT_ID('dbo.yourtable', 'U')
    ORDER BY
    	i.index_id;

    The above script will give the usage statistics about all indexes of a given table. Take a look to the usage of each one. If you have only updates and all other values are 0 it means NO usage and WASTED storage...

    Even more interesting is the usage of the indexes in the buffer pool. Because of high fragmentation the density of the pages is low. The average density of each level of an index (root / b-tree / leaf) can be evaluated with the following script:

    SELECT	i.type_desc,
    		us.index_id,
    		us.index_level,
    		us.avg_fragmentation_in_percent,
    		us.avg_page_space_used_in_percent,
    		us.record_count,
    		us.page_count
    FROM	sys.indexes i INNER JOIN sys.dm_db_index_physical_stats(db_id(), OBJECT_ID('dbo.yourtable', 'U'), NULL, NULL, 'DETAILED') us
    		ON (
    				i.object_id = us.object_id AND
    				i.index_id = us.index_id
    			)
    WHERE	i.object_id = OBJECT_ID('dbo.yourtable', 'U')
    ORDER BY
    		i.index_id;

    Take a look to the [avg_page_space_used_in_percent and you get a feeling of the wasted RAM if the data need to be read into the buffer pool. To see how many RAM is NOT been used run the following query:

    SELECT	t.name										AS	table_name,
    		i.index_id									AS	index_id,
    		i.name										AS	index_name,
    		COUNT_BIG(bd.page_id)						AS	Total_Pages,
    		SUM(CAST(bd.row_count AS bigint))			AS	Total_Rows,
    		SUM(CAST(bd.free_space_in_bytes AS bigint))	AS	Total_Unused_RAM
    FROM	sys.dm_os_buffer_descriptors bd INNER JOIN sys.allocation_units au
    		ON (bd.allocation_unit_id = au.allocation_unit_id) INNER JOIN sys.partitions p
    		ON (au.container_id = p.partition_id) INNER JOIN sys.indexes i
    		ON (p.object_id = i.object_id) INNER JOIN sys.tables t
    		ON (i.object_id = t.object_id)		
    WHERE	bd.database_id = db_id()
    GROUP BY
    		t.name,
    		i.index_id,
    		i.name;

    The above query will give you a good impression about the relation between indexes and buffer pool (RAM)


    MCM - SQL Server 2008
    MCSE - SQL Server 2012
    db Berater GmbH
    SQL Server Blog (german only)


    Wednesday, November 20, 2013 3:25 PM
  • Hallo Raju,

    that's the problem with "cut and paste" :)

    "Non clusterd indexes are good if you have huge modifications and table is havily used for data modifications / Write operations especially with characters value."

    This is nonsence - a non clustered index is a mess for "write" operations because in this case every index need to be updated, too. The mean goal for indexing is the performance benfit for queries against data! What positive effect will an additional index have for a table if it is not used by the query optimizer?

    Indexes on character attributes will quite often lead to "page splits" because the data - by default - are often "random" data (i.e. Lastname in a Phone Book). A basic "with character-Value" is nonsence, too. Best "benefits will be if the attribute is fixed length attribute. If the text value may be updated and will not move to a different location (i.e. "ricken" will be changed to "rickena") the record will not change it's location on the page because a fixed length attribute will keep the complete space of the definied length!

    "Cluster indexes are best bit for numbers with scaning ( reading only ), if the fragmentations is heavy then you can defrag them weekly as a part of the maintanance to keep the fragmentation in control."

    This is a really bad advice! When an index need to be maintained is not a matter of schedule but a matter of workloads which may affect the indexes. Therefore it is recommended to have a look to the index usage stats, index operational stats and the fragmentation.

    If the fragmentation is "heavy" you should consider a maintenance for the next maintenance slot. Depending on the edition of SQL Server this strategy may vary (only Enterprise Edition can rebuild indexes "online").

    After collecting these components you are able to advice what will be the best intervall for index maintenance jobs.

    BTW: Please stop copy -> paste and post the link to the article.
    This will be better formatted and the pics will be visible :)
    It is a better reading for interested forum users, too!


    MCM - SQL Server 2008
    MCSE - SQL Server 2012
    db Berater GmbH
    SQL Server Blog (german only)

    Wednesday, November 20, 2013 4:11 PM
  • Thanks alot for the reply. The table is huge with almost billion records. So, to defrag it is taking huge resource. This table is used mainly for analytic purposes where we query mostly on large set of data. We have created Partitions with Clustered and Non Clustered Indeces. Right now the indeces are heavily fragmented and we are having discussion on whether defraging those will impact performancewise for the type of queries we are using. This is for the Non CLustered Indeces.

    Is table scan better than heavily fragmented NC Index or vice versa if I am performing query on large chunk of data?

    • Edited by zenithsql Wednesday, November 20, 2013 5:14 PM
    Wednesday, November 20, 2013 5:10 PM
  • >Is table scan better than heavily fragmented NC Index or vice versa if I am performing query on large chunk of data?

    It can be better to scan, especially if your fragmented non-clustered are not heavily cached.  It will depend on the data size, memory size, storage config, etc.  So you'll need to test.

    David


    David http://blogs.msdn.com/b/dbrowne/


    Wednesday, November 20, 2013 5:18 PM
  • This really depends on:

    • the type of queries
    • the amount of internal/external fragmentation

    So for example if you are only touching a subset of the data:

    • what's the SARG on? - Can the query use the nonclustered index?
    • Is there maybe also an ORDER BY?
    • Does the index help seeking rows/scanning a subset of the index?

    If so, how much less data has to be read if the index is de-fragmented? - How heavily fragmented is it right now, how much space is wasted, how many out-of-order pages are encountered?

    Usually defragmenting makes more sense the more date the queries return.

    If you are on Enterprise Edition, you will also benefit from better Read-Ahead.


    Andreas Wolter | Microsoft Certified Master SQL Server

    Blog: www.insidesql.org/blogs/andreaswolter
    Web: www.andreas-wolter.com | www.SarpedonQualityLab.com

    Wednesday, November 20, 2013 5:22 PM
  • Thanks David. How does NC index not being cached comes into play on heavily framented index issue ? Event if it is not cached shouldn't the cause of fragmentation be same ?

    Assuming we have non clustered index heavily cached, data size is around 500GBin a single table, can you give example on how NC index will be better to have even if it is heavily fragmented ? I am just trying to learn here.

    Wednesday, November 20, 2013 8:05 PM
  • Thanks Andreas,

    • the type of queries : Analytic, which involves alot of aggregation
    • the amount of internal/external fragmentation > 60%
    • what's the SARG on? - Can the query use the nonclustered index? : Yes, it would definitely use as this is in where clause.
    • Is there maybe also an ORDER BY? The query wont use Order by
    • Does the index help seeking rows/scanning a subset of the index? Yes it would.

    Is Read-Ahead only on Enterprise version ?

    Wednesday, November 20, 2013 8:09 PM
  • >Thanks David. How does NC index not being cached comes into play on heavily framented index issue ?

    In many storage configurations, random IO has lower throughput than sequential IO.  Fragmented objects increase random IO, and nonclusterd indexes are often accessed in random fashion through index seeks. 

    Whereas if you don't have non-clustered indexes your table is accessed using scans, which can produce more efficient sequential IOs.  In a sequential scan sequential 512kb chunks of data are read from disk.  In random IO, 64kb or less is read at a time.  So the sequential IOs read 8x more in each IO, and might incur less hard disk head movement.

    Also nonclustered indexes are duplicate data storage and increase the overall size of the table.  If you have 7 non-clustered indexes on a large table, then SQL has a hard time figuring out which of 8 copies of your data to keep in memory.

    >Event if it is not cached shouldn't the cause of fragmentation be same ?

    Yes, but the impact of the fragmentation may be different.

    >Assuming we have non clustered index heavily cached, data size is around 500GBin a single table, can you give example on how NC index will be better to have even if it is heavily fragmented ?

    Yes.  If the NC index is in memory then the random access pattern caused by the fragmentation is not much more expensive than if it were unfragmented.  Essentially fragmentation is pretty painless on cached objects.

    David


    David http://blogs.msdn.com/b/dbrowne/


    Wednesday, November 20, 2013 8:15 PM
  • Thanks Andreas,

    • the type of queries : Analytic, which involves alot of aggregation
    • the amount of internal/external fragmentation > 60%
    • what's the SARG on? - Can the query use the nonclustered index? : Yes, it would definitely use as this is in where clause.
    • Is there maybe also an ORDER BY? The query wont use Order by
    • Does the index help seeking rows/scanning a subset of the index? Yes it would.

    Is Read-Ahead only on Enterprise version ?

    Ok, so the fragmentation is a number to take into account

    Considering just the index being on the SARG, the fragmentation might not matter too much though - depending on the "width" of the result-set/search-range. - That's the next "it depends" ;-)
    This is what my last question was about. So if it is used for "scanning" ranges of data, fragmentation again matters.

    But if then the whole index happened to be in memory, as David already said, fragmentation won't really matter. :-D

    It mainly matters when reading the index from disk to memory - and this is where read-ahead comes into play.

    It's there in Standard Edition as well, but in Enterprise Edition the read-ahead chunks are much larger: up to 1024k

    Hope we didn't confuse you more than enlighten and it finally all makes sense to you.. ;-)


    Andreas Wolter | Microsoft Certified Master SQL Server

    Blog: www.insidesql.org/blogs/andreaswolter
    Web: www.andreas-wolter.com | www.SarpedonQualityLab.com

    Wednesday, November 20, 2013 10:25 PM