Internals: In a non-clustered index with a non-compact fill factor, where is the insert made?

    Question

  • This is kind of a detailed question, but I am curious because it could affect the selection of fill factors.

    To restate the question, consider a child table with an identity primary key, a clustered index on the primary key, a tinyint column that is a foreign key referencing a lookup table, and an index on the foreign key, ascending. Suppose the lookup table has just 10 records, with PKs from 1 to 10. When an insert is made into the child table, the new record is inserted at the end of the clustered index. If the value of the FK is 5, where exactly is the insert made in the index? I'm visualizing the index as having a group of '1' records, followed by a group of '2' records, and so on. Does the DB engine do the insert into the index at the beginning of the '5' group, at the end, randomly in the middle, or what?

    It would be nice to think that the engine keeps the index records sorted within a group by the appended primary key value, in which case the insert would be at the end of the group. But I can see that a case can be made that it is more efficient to put it at the beginning of the first member of the group that it finds when scanning the index, in which case the index would be sorted ascending by the FK value (by choice) and descending by the PK value (by convention). I can also imagine a situation where the index is scanned in blocks, and the insert is made in the first empty row within the group's blocks, which would make the location of the insert pretty random within the group.

    As I said, kind of a detailed internals question, but I am curious to know how it works.



    • Edited by Jeff Roughgarden Monday, November 11, 2013 7:09 PM Clarification of question
    Thursday, November 07, 2013 5:19 PM

Answers

  • There won't be a synchronous sort - think of what that would do to the database. For a large index it would mean locking or degrading access for some time. Not what you would want as you couldn't control performance - also for single row updates it could mean many consecutive sorts.

    Also - why do you worry about the physical storage? The data is stored in extents anyway and you can't guarantee that these will be consecutive or contiguous - not after the database is used for a bit anyway.

    What you consider sorting is handled by chains - one page is chained to the next. Therefore when there is a page split the chain is amended to include the new page and that takes care of the sort order. When the index is scanned it can follow the page chain and read in the sort order.

    Think your last question misses that the leaf level of the index is a copy of the indexed row columns; higher levels hold start and end values of what you call groups, but they are organised as a b-tree (balanced, not binary) so the layout will depend on the distribution of values. Columns that are included rather than indexed are not part of the b-tree, which can have a remarkable effect on performance.

    If the PK is clustered then it is automatically included in the leaf level of every NC index - one reason for keeping clustered indexes narrow in large tables.

    You don't really have to worry too much about the physical storage unless the index becomes fragmented in which case you can rebuild (on-line with some versions). You can also leave space in the index for extra entries to be added (fill factor) if you are worried about page splitting - but that only affects the creation or rebuild.
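
    For illustration only (a sketch - the index and table names here are hypothetical), such a rebuild might look like this; ONLINE rebuilds are only available in certain editions:

    -- Rebuild leaving 20% free space on each leaf page (fill factor 80).
    -- ONLINE = ON requires an edition that supports online rebuilds (e.g. Enterprise).
    ALTER INDEX ix_example ON dbo.SomeTable
    REBUILD WITH (FILLFACTOR = 80, ONLINE = ON);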

    Friday, November 08, 2013 8:43 AM

All replies

  • This might be out of date or just wrong.

    I think you are just asking what happens to a non-clustered index when a new value is added to the table. Doesn't matter that it's a foreign key.

    The non-clustered index can be considered a copy of the indexed column (plus the clustered index value or a rowid). When a new row is added, the corresponding entry is added to the correct page if there is room. If the page is full then it is split into two, the pages are chained together, and the new entry is added to one of them. The b-tree is also updated, with possible page splits at each level.
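
    A rough way to watch this happen (a sketch - the object name is a placeholder) is to check the leaf page count and density of the index before and after a batch of inserts into the middle of the key range; a jump in page_count together with a drop in density indicates page splits:

    -- Leaf level of the indexes on a table: page count and density.
    SELECT	ps.index_id,
    		ps.page_count,
    		ps.avg_page_space_used_in_percent
    FROM	sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.SomeTable', 'U'), NULL, NULL, 'DETAILED') ps
    WHERE	ps.index_level = 0;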

    As I say - might be out of date now.

    Thursday, November 07, 2013 5:27 PM
  • Nigel, thanks for responding so quickly.

    OK, SQL will read the NC index page by page. Using my previous example, suppose it reads a page that contains the end of the '4' group of records and part of the '5' group. If there is any room in the 5 group, the new record will be put there, leading to somewhat random distribution (in terms of the included PK). If there is no room, does SQL do a page split right off, or does it read the next page in the '5' group to search for an empty row?

    In either event, it seems you are saying there will definitely NOT be any kind of secondary sort based on the included primary key, right?

    I suppose that if I did want that sort of behavior, I could always add the PK explicitly to the NC index. However, it seems that would eliminate the value of any fill factor less than 100 because inserts could only be done at the very end of each group of index records.
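
    (For what it's worth, making the PK explicit would look like the sketch below - the table and column names are invented. As far as I know, for a non-unique NC index the engine silently promotes the clustering key to an index key column anyway, so the explicit form mostly just makes the sort order visible.)

    -- Hypothetical: name the clustering key as an explicit trailing key column.
    CREATE NONCLUSTERED INDEX ix_Child_Fk_Pk ON dbo.Child (FkCol, PkCol);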

    A related question on how this all works: Suppose we have a NC index with two keys, say ProductID and ColorID, where the PK is OrderItemID. We have to make an insert for some value pair, say 10 and 5 for some new order item. SQL reads a page and determines that there is no empty space for the new record in the (10,5) group, but there is room above in the (10,4) group. Does SQL compact the (10,4) group to make enough room to insert the new record into the (10,5) group on the same page? If the entire (10,5) group is not on the current page, does it split the page or does it search for an empty slot on the next page?

    Thursday, November 07, 2013 6:13 PM
  • I believe that if an NC index needs to add another entry for some middle value, it has to put it on a page that holds those values, so it can cause a page split.

    Otherwise NC index values would be scattered all over any number of pages and would never have good locality, so it can't just take the next entry at the end of the index pages and link it way back to other entries.

    Probably more authoritative and detailed info on this in Kalen Delaney's books or something, but it pretty much has to be this way, doesn't it?

    Josh

    Thursday, November 07, 2013 11:55 PM
  • Agree about Kalen Delaney (SQL Server 6.0/6.5 Unleashed :) - but most of her other works are excellent too.
    Friday, November 08, 2013 8:55 AM
  • Hi Nigel, thanks again for your response. I'd like to mark your response as an answer but I don't really think all my questions about the details of inserts into NC indexes have been answered. So by keeping it open, I'm hoping for more discussion.

    You've convinced me that my naive hope for an implicit 'secondary' sort by the appended PK would be silly. It would be way too slow and not the way to design a DBMS.

    I am dealing with some very large tables with very large indexes. There are over 2.2 billion rows and some NC indexes occupy as many as 8 million 8 KB data pages. The reason I'm interested in these details is to understand the difference in how inserts would be accomplished with higher (e.g. 95%) vs lower (e.g. 75%) fill factors. The lead key on several of these indexes is AccountID, and there are 400,000 accounts. So the average number of index pages for each Account is 20.

    In this situation, the details of index inserts matter. Suppose we have to make an insert for AccountID = 5. The DBMS moves down the index until it gets to the first page that contains rows for account 5. If there are empty slots in the 5 region of this page, then it's easy: the first available one will be filled. Suppose there are no empty slots in the 5 region of the first page. There are 3 possibilities:

    1. The DBMS could do a page split on the spot. This would negate the value of having empty spaces via a lower fill factor on the subsequent pages, so I doubt this would occur.

    2. It is conceivable, but unlikely, that the DBMS would try to compact any space in the 4 region above on this first page. It would get the job done, but would be awkward and time-consuming. And if there were no space, we'd be back to either the first or third alternatives.

    3. The DBMS could go on to the next page (second) of Account 5's pages and look for empty space, finally splitting on the last page if there were no empty slots anywhere. This seems to be a solid algorithm that would guarantee that the value of low fill factors would be realized, irrespective of the size of the index. This has to be the way it's done. Do you or others agree?

    Incidentally, if this is true, it changes my thinking on fill factors. I had thought (incorrectly) that the inserts would only go at the end of each account's 20 pages. If so, then the I/O costs of non-compact storage over all pages would exceed the benefits of free space on the last page. So I was thinking that with large multi-page indexes, low fill factors would not be worthwhile. But as I can see now, that was wrong. The I/O costs vs page split cost trade-off inherent in fill factor analysis is independent of index size.

    Friday, November 08, 2013 5:16 PM
  • Hello Jeff,

    Nigel described the behaviour of the NCI perfectly. Let me try to dive deeper into the internals of indexing.

    Let's start by reconstructing your scenario with two tables:

    dbo.tbl_child: master table
    dbo.tbl_lookup: referenced lookup table

    -- reference table
    CREATE TABLE dbo.tbl_lookup
    (
    	Id	tinyint	NOT NULL	identity (1,1),
    	c1	char(20)	NOT NULL	DEFAULT ('just stuff'),
    
    	CONSTRAINT pk_lookup_Id PRIMARY KEY CLUSTERED (Id)
    );
    GO
    
    CREATE TABLE dbo.tbl_child
    (
    	Id	int		NOT NULL	IDENTITY (1, 1),
    	fk	tinyint	NOT NULL,
    	c1	char(200)	NOT NULL	DEFAULT ('happy stuff'),
    
    	-- Primary key clustered
    	CONSTRAINT pk_child_Id PRIMARY KEY CLUSTERED (Id),
    	CONSTRAINT fk_lookup_Id FOREIGN KEY (fk)
    	REFERENCES dbo.tbl_lookup (Id)
    );
    GO
    
    -- now create an additional non clustered index on the fk-attribute in tbl_child
    CREATE NONCLUSTERED INDEX ix_tbl_child_fk ON dbo.tbl_Child (fk);
    GO
    
    -- what indexes are available?
    SELECT	i.name,
    		i.type_desc
    FROM	sys.indexes i
    WHERE	i.object_id IN (OBJECT_ID('dbo.tbl_lookup', 'U'), OBJECT_ID('dbo.tbl_child', 'U'));
    GO

    The above script will - hopefully - match your scenario. The next script inserts 10 records into the lookup table and 10,000 records into the child table:

    INSERT INTO dbo.tbl_lookup DEFAULT VALUES;
    GO 10
    
    -- Insert 10,000 rows into dbo.tbl_child, cycling fk through 1..10
    DECLARE	@i int = 1;
    WHILE @i <= 10000
    BEGIN
    	INSERT INTO dbo.tbl_child (fk)
    	VALUES (@i % 10 + 1)
    
    	SET @i += 1
    END
    GO
    

    So - after we have 10,000 records in the child relation, the first thing to look at is fragmentation. The fragmentation of the clustered index is nearly 0 because we have contiguous values and - as you mentioned in your question - every insert goes to the very last page.

    The problem is the index that covers the foreign key, because values between 1 and 10 are inserted in no particular order. If you look at the page density results you will see that only ~50% of the page volume is in use!

    SELECT i.name,
    		i.type_desc,
    		ps.page_count,
    		ps.record_count,
    		ps.avg_page_space_used_in_percent
    FROM	sys.indexes i CROSS APPLY sys.dm_db_index_physical_stats(db_id(), i.object_id, i.index_id, NULL, 'DETAILED') ps
    WHERE	i.object_id  = OBJECT_ID('dbo.tbl_child', 'U') AND
    		ps.index_level = 0;
    

    I would insert a lot of pics into this post but Microsoft only allows two - so I have to save them for further explanations :)

    OK: the NCI is in bad shape because half of the page volume is unused. You can make the physical position of a record visible by using sys.fn_PhysLocFormatter. The following query runs a SELECT and returns the physical position of each record in the NCI (the page numbers may vary on your system):

    SELECT	sys.fn_PhysLocFormatter(%%physloc%%) AS Location,
    		*
    FROM	dbo.tbl_child WITH (INDEX = 2);

    The following pic shows a break where one index page switches to the next because it is full or a page split has occurred:

    Concerning the amount of data that can be stored in one index page (for the above example): 4 bytes for the row header, 4 bytes for the clustered key (it's an int), and 1 byte for the tinyint = 9 bytes per row. So there should be space for 8060 / 9 ≈ 895 records. As you can see, the used slots end at 735 - there must have been a page split!
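
    To cross-check that per-row estimate (a sketch - the record-size columns are only populated in 'DETAILED' or 'SAMPLED' mode):

    -- Cross-check the per-row size estimate for the NCI leaf level.
    SELECT	ps.index_level,
    		ps.min_record_size_in_bytes,
    		ps.max_record_size_in_bytes,
    		ps.avg_record_size_in_bytes
    FROM	sys.indexes i CROSS APPLY sys.dm_db_index_physical_stats(db_id(), i.object_id, i.index_id, NULL, 'DETAILED') ps
    WHERE	i.name = 'ix_tbl_child_fk' AND
    		ps.index_level = 0;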

    Rebuild the index and the result will be completely different, because now all pages have been rebuilt and ordered!

    ALTER INDEX ALL ON dbo.tbl_child REBUILD;

    All indexes should have a density of ~98% now.

    Now a new record should be inserted into your child table. Before the insert runs, we check the current situation for fk = 5 (I want to find the next value I can insert for 5 :) )

    The pic shows that the NCI covers the values for 5 up to the PK value 4154 in my example.

    So page 3389 is nearly completely full, as the following command shows:

    DBCC TRACEON (3604);
    DBCC PAGE ('demodb', 1, 3389, 1);

    The result of the header:

    m_pageId = (1:3389)
    <snip>
    m_prevPage = (1:3388)               m_nextPage = (1:3390)
    <snip>
    m_slotCnt = 736
    m_freeCnt = 0
    

    The excerpt shows the header elements I'm focusing on. The most important information is that there is NO free space on the page (m_freeCnt = 0). So, if a new record is inserted into the NCI at this position, a page split will occur, as the following code demonstrates.

    The first record will be inserted at the very end of both indexes (PK and NCI) and the consumed transaction log will be measured. The second INSERT will cause a page split, and again the consumed transaction log will be measured:

    -- no page split in either PK nor NCI
    BEGIN TRANSACTION
    	INSERT INTO dbo.tbl_child (fk) VALUES (10)
    
    	SELECT	database_transaction_log_bytes_used
    	FROM	sys.dm_tran_database_transactions
    	WHERE	database_id = db_id();
    COMMIT TRANSACTION
    
    
    -- page split in NCI
    BEGIN TRANSACTION
    	INSERT INTO dbo.tbl_child (fk) VALUES (5)
    
    	SELECT	database_transaction_log_bytes_used
    	FROM	sys.dm_tran_database_transactions
    	WHERE	database_id = db_id();
    COMMIT TRANSACTION

    If you run the above code, the first INSERT will consume approx. 600 bytes while the second one will consume about 11 times more, because a page split occurred on the NCI. A look at the transaction log shows the individual steps.

    SELECT * FROM sys.fn_dblog(NULL, NULL);

    Move to the very end and search for LOP_PAGE_SPLIT and you will see the transaction that consumes the log space :)
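
    Instead of scrolling, you can filter the log records (a sketch - the exact split-related operation names are undocumented and can vary between versions, hence the broad LIKE filter):

    -- Narrow fn_dblog output to split-related log records.
    SELECT	[Current LSN],
    		Operation,
    		Context,
    		AllocUnitName
    FROM	sys.fn_dblog(NULL, NULL)
    WHERE	Operation LIKE '%SPLIT%';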

    The reason is really simple: the page storing the key value 5 was 100% full. The first insert didn't hit the problem because it goes to the very end of the NCI. The second one is problematic because the combination of FK = 5 and the next PK value forces an ordered insertion into the page shown above. Because that page is full, records have to be moved to a newly created page - and a "bad" page split is born!

    I've done a presentation about FILLFACTOR and its pros and cons at SQLRally in Amsterdam. I hope the scripts and slides will be available soon.


    MCM - SQL Server 2008
    MCSE - SQL Server 2012
    db Berater GmbH
    SQL Server Blog (german only)

    Friday, November 08, 2013 6:44 PM
  • Wow, Uwe! That is quite the reply. I'd never even heard of sys.fn_PhysLocFormatter. Thank you.

    In your example, the tables seem to have a 100 percent fill factor. After being rebuilt, the index is very dense. Consequently, when the 5-valued insert is done a page split is required; however, when the 10-valued insert is done, there is no page split since the 10's are on the last page and there was room on the last page.

    In the case of a lower fill factor, do you think I have it right in my preceding reply to Nigel? That is, of the three alternatives I specified for when there is no free space in the first page of a group, it really only makes sense if the DBMS implements the third alternative as the algorithm, right?

    Thanks again.

    Friday, November 08, 2013 7:55 PM
  • Good morning Jeff,

    Yes - FILLFACTOR is a pretty good tool for "fine tuning". Since the PK of dbo.tbl_Child is contiguous there is no need for adjustments - the fill factor should be 100. For the NCI on the [fk] attribute you should (just my opinion) watch the behaviour by monitoring the physical and operational stats of that index.

    "Reset" the index by rebuilding it with Fillfactor = 100

    ALTER INDEX ix_tbl_child_fk ON dbo.tbl_Child REBUILD WITH (FILLFACTOR = 100);

    Now watch the fragmentation level under your workloads by using sys.dm_db_index_physical_stats.

    If you see high fragmentation you should have a look at sys.dm_db_index_operational_stats, as the demo for my scenario shows.

    -- Insert 20,000 rows with a random fk into the dbo.tbl_child table
    DECLARE	@i int = 1;
    WHILE @i <= 20000
    BEGIN
    	INSERT INTO dbo.tbl_child (fk)
    	VALUES (CAST(RAND() * 10 + 1 AS tinyint))	-- random value between 1 and 10
    
    	SET @i += 1
    END
    GO

    The above script guarantees random inserts across the FK attribute. Now I check the data density of the leaf-level pages of the affected index:

    SELECT	ps.avg_fragment_size_in_pages,
    	ps.avg_page_space_used_in_percent,
    	ps.record_count,
    	ps.page_count
    FROM	sys.indexes i CROSS APPLY sys.dm_db_index_physical_stats(db_id(), i.object_id, i.index_id, NULL, 'DETAILED') ps
    WHERE	i.name = 'ix_tbl_child_fk' AND
    	ps.index_level = 0

    For my example the density went down from 99.8% to ~70% (after only 20,000 inserts!). The next step is to learn about the workloads for that index; use sys.dm_db_index_operational_stats for that.

    SELECT	os.leaf_insert_count,
    		os.leaf_update_count,
    		os.leaf_delete_count,
    		os.leaf_ghost_count
    FROM	sys.indexes i CROSS APPLY sys.dm_db_index_operational_stats(db_id(), i.object_id, i.index_id, NULL) os
    WHERE	i.name = 'ix_tbl_child_fk';

    For my example I got a workload that only shows inserts, but if you use it in your production system you will get more business-as-usual values :). Now you have two measures that give you a good starting point for deciding whether to act or not. But...

    Beyond page density (on its own it is only an indicator of wasted buffer pool!), you should also consider how the index is used. If 99% of the operations are index seeks you won't boost query performance with a lower fill factor, while a larger share of index scans makes it worth thinking about balancing performance against buffer pool usage.

    SELECT	us.user_seeks,
    	us.user_scans,
    	us.user_lookups,
    	us.user_updates,
    	us.last_user_seek,
    	us.last_user_scan,
    	us.last_user_lookup
    FROM	sys.indexes i INNER JOIN sys.dm_db_index_usage_stats us
    	ON (
    		i.object_id = us.object_id AND
    		i.index_id = us.index_id
    	)
    WHERE	i.name = 'ix_tbl_child_fk';
    

    Now you have three measures as a decision baseline for the benefits of FILLFACTOR. If you also want a fill factor on the intermediate levels (PAD_INDEX = ON), you have to check the fragmentation of the b-tree as well - remove the "WHERE ps.index_level = 0" predicate to cover the intermediate levels.
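
    For example (a sketch, reusing the index from this thread):

    -- Apply the leaf fill factor to the intermediate b-tree levels as well.
    ALTER INDEX ix_tbl_child_fk ON dbo.tbl_Child
    REBUILD WITH (FILLFACTOR = 80, PAD_INDEX = ON);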


    MCM - SQL Server 2008
    MCSE - SQL Server 2012
    db Berater GmbH
    SQL Server Blog (german only)


    Saturday, November 09, 2013 5:36 AM
  • Jeff,

    aside from my "theoretical" excursion above, a FILLFACTOR will not help in your case, for the following reason:

    You have an index on FK in tbl_child. This index covers two attributes:

    FK is the index key
    Id is the clustered key

    Whatever FILLFACTOR you choose, ALL pages on the leaf level get that value. But WHERE will a new record be inserted?

    Because Id is a contiguously increasing value, a new record will only ever be inserted as the last value within its FK group in the index. The first pic shows the index with a FILLFACTOR of 80% covering the range for FK = 5:

    As you can see, all pages are filled to ~80%; the first column represents the FK and the second one the clustered key. Now think about your workload. If you insert a new record with fk = 5 - what is the next clustered key value? In my example it's 112, so there is NO chance of ever getting a clustered key value smaller than the current one, because the key is contiguously increasing. The next value will be inserted as follows:

    As you can see, all previous pages will never be touched again - their free space is wasted buffer pool! Only pages that contain two or more FK values may benefit from a FILLFACTOR - not pages that hold only one FK value!
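
    One way to see which pages that applies to (a sketch - sys.fn_PhysLocCracker is undocumented, so treat this as a diagnostic toy only):

    -- Leaf pages of the NCI that hold two or more distinct FK values;
    -- only these pages can absorb new rows into the fill-factor free space.
    SELECT	plc.page_id,
    		COUNT(DISTINCT c.fk) AS distinct_fk_values
    FROM	dbo.tbl_child c WITH (INDEX = 2)
    		CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%) plc
    GROUP BY plc.page_id
    HAVING	COUNT(DISTINCT c.fk) >= 2;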


    MCM - SQL Server 2008
    MCSE - SQL Server 2012
    db Berater GmbH
    SQL Server Blog (german only)


    Saturday, November 09, 2013 5:53 AM
  • Loved your reply, Uwe. I think you've got an article for the November Wiki. :)

    Please mark this reply as the answer or vote as helpful, as appropriate, to make it useful for other readers

    Saturday, November 09, 2013 6:48 AM
    Moderator
  • Good morning Shanky,

    Hahaha - you made my day. For sure, I'm currently writing a blog article about this problem. Afterwards I will turn it into a WIKI article :).

    For that very reason, a big "Thank you" to Jeff for pointing out this problem.


    MCM - SQL Server 2008
    MCSE - SQL Server 2012
    db Berater GmbH
    SQL Server Blog (german only)

    Saturday, November 09, 2013 7:16 AM
  • Hello Uwe,

    I would like you to translate your blog posts into English, if possible. Unfortunately I don't know German. At least please do it for the index-related posts.

    It would really help others. Indexing is such a deep topic and your posts are crystal clear.

    Hope to see your blog in English.


    Please mark this reply as the answer or vote as helpful, as appropriate, to make it useful for other readers

    Saturday, November 09, 2013 3:48 PM
    Moderator
  • Hi Uwe,

    This is exactly the original question I tried to ask. To reiterate, with some clarifications:

    I am dealing with some very large tables with very large indexes. There are over 2.2 billion rows and some NC indexes occupy as many as 8 million 8 KB data pages. The reason I'm interested in these details is to understand the difference in how inserts would be accomplished with higher (e.g. 95%) vs lower (e.g. 75%) fill factors. The lead key on several of these indexes is AccountID, and there are 400,000 accounts. So the average number of index pages for each Account is 20.

    In this situation, the details of how NC index inserts are done with fill factors matter. Suppose we have to make an insert for AccountID = 5. The DBMS moves down the index (or through the B-tree) until it gets 'near' the pages for account 5. The first question is whether it goes to the first page that contains 5 data, any page, or the last page. I am guessing that it goes to the first page, which likely has some 4 data as well as some 5 data. If there are empty slots in the 5 region of this page, then it's easy: the first available one will be filled. Suppose there are no empty slots in the 5 region of the first page. There are 3 possibilities:

    1. It is conceivable, but unlikely, that the DBMS would try to compact any space in the 4 region above on this first page. It would get the job done, but would be awkward and time-consuming. And if there were no space, we'd be back to either the second or third alternatives below.

    2. The DBMS could do a page split on the spot. This would negate the value of having empty spaces via a lower fill factor on the subsequent pages, so I doubt this would occur. It's just a bad way to design a DBMS.

    3. The DBMS could go on to the next page (second) of Account 5's pages and look for empty space, finally splitting on the last page if there were no empty slots anywhere. This seems to be a solid algorithm that would guarantee that the value of low fill factors would be realized, irrespective of the size of the index. If DBMS designers are smart and I think they must be, this has to be the way it's done.

    The preceding assumes the algorithm searches for the first page with 5 data. The same logic could apply if we found the last page with 5 data and searched backward for an empty slot, but it is less elegant.

    In your example, you assert that the insert has to go on the last page of the group, and you point out that, if so, the value of a lower fill factor on the other pages is zero - as I had stated earlier. If inserts really only go at the end of each account's 20 pages, then the I/O costs of non-compact storage across all pages would exceed the benefit of free space on the last page. So I was thinking that with large multi-page indexes, low fill factors would not be worthwhile.

    I'd like to think that DBMS engineers would make the I/O vs page split cost trade-off inherent in fill factor analysis independent of index size. I'd also like to get Kalen Delaney, Paul Randal, or Kimberly Tripp to consider this, but I don't know how. Incidentally, I could not find anything on this topic in Kalen's book on internals.

    Monday, November 11, 2013 1:25 AM