When is storage space allocated?

    Question

  • I'm using SQL Server 2012 with all the latest updates. For a DB I'm using, I altered an existing table with a lot of data in it to add an additional column, type INT, with a default value of -1. That made the storage requirement for the DB on the disk a bit larger, but not much. I then UPDATEd the values in the new column from -1 to other numbers ranging from 0 to over 500,000, and this added several tens of GB of storage required for the DB on the disk. I'm baffled by this--wouldn't the full space for an INT column be allocated on the disk when the column is added to the table and not when the value of the data is changed?
    Monday, July 01, 2013 9:54 PM

Answers

  • Hi Ed,

    Yes, the growth can come from page splits after the update of the table. If you add new data to a record and it no longer fits on the data page, a page split will occur. A page split moves approximately 50% of the data to a new 8k page.

    I would suggest checking the index fragmentation of the table.

    SELECT page_count, record_count, avg_fragmentation_in_percent, avg_page_space_used_in_percent
    FROM sys.dm_db_index_physical_stats(db_id(), OBJECT_ID('yourtable', 'U'), 1, DEFAULT, 'DETAILED')

    If you see high fragmentation you should rebuild the index:

    ALTER INDEX YourClusteredIndex ON yourtable REBUILD;

    Then check again and you will see that page_count has decreased :)
    After that you can (but I wouldn't do it) shrink your database!
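
    If the disk space really has to be released to the operating system, a minimal sketch (the database name is a placeholder; as said above, I wouldn't do it, because shrinking tends to re-fragment the index you just rebuilt):

    -- Releases unused space from all files of the database back to the OS.
    -- Caution: this usually fragments the indexes again.
    DBCC SHRINKDATABASE (N'YourDatabase');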


    MCM - SQL Server 2008
    MCSE - SQL Server 2012
    db Berater GmbH
    SQL Server Blog (german only)

    • Marked as answer by Ed_White Thursday, July 04, 2013 3:44 AM
    Tuesday, July 02, 2013 2:56 PM

All replies

  • Hello Ed,

    You've found an optimization that is done under the covers when adding columns. If a column is added that allows nulls, with a default value that is the same for each row, then only a metadata change is made and a size-of-data operation does not happen. Once you ran the update statement, all of the rows were touched and the size-of-data operation took place.

    Note that this is not the case when the column added does not allow nulls. Also for future reference, when a column is dropped the space won't be reclaimed until the clustered index is rebuilt.
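
    For the drop-column case, a minimal sketch of what to expect (table, column and index names are placeholders):

    -- Dropping a column only changes metadata; the bytes stay on the pages.
    ALTER TABLE dbo.yourtable DROP COLUMN obsolete_col;

    -- page_count is still unchanged at this point
    SELECT page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.yourtable', 'U'), 1, DEFAULT, 'DETAILED');

    -- The space is reclaimed only after the clustered index is rebuilt
    ALTER INDEX YourClusteredIndex ON dbo.yourtable REBUILD;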

    -Sean


    Sean Gallardy | Blog | Twitter

    Tuesday, July 02, 2013 3:01 AM
    Answerer
  • Hi Ed,

    Which file has increased?
    Has the data file increased or the log file?

    I assume (as additional info to Sean) that you mean the log file, which has grown.
    If I'm right with my assumption, then it is pretty clear what happened.

    The update is a fully logged transactional operation which - depending on the amount of data - can grow the log file considerably.
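
    A quick way to see which file grew (run it in the affected database; sizes are reported in MB):

    SELECT name, type_desc, size * 8 / 1024 AS size_mb
    FROM sys.database_files;

    -- log usage in percent per database
    DBCC SQLPERF(LOGSPACE);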


    MCM - SQL Server 2008
    MCSE - SQL Server 2012
    db Berater GmbH
    SQL Server Blog (german only)

    Tuesday, July 02, 2013 4:53 AM
  • Hallo Sean,

    that's a really interesting observation which I tested immediately :)! You are right with the assumption that the allocation will not take place when defining ANY default. Where you "may" be mistaken is the claim that Microsoft SQL Server skips the page allocation only if the column is defined as NULLable. I've run a few tests and found out (really interesting) that the behaviour is always the same (NULLable and NOT NULL). Maybe you can reproduce the behaviour as follows:

    -- Let's create a simple relation
    IF OBJECT_ID('dbo.foo', 'U') IS NOT NULL
    	DROP TABLE dbo.foo;
    	GO
    
    CREATE TABLE dbo.foo
    (
    	Id		int			not null	IDENTITY (1, 1),
    	col1	char(100)	not null	DEFAULT ('buuuh'),
    	col2	char(100)	not null	DEFAULT ('baaah'),
    
    	CONSTRAINT foo_id PRIMARY KEY CLUSTERED (id)
    );
    GO
    
    -- Now add 10,000 rows
    SET NOCOUNT ON
    GO
    
    INSERT INTO dbo.foo DEFAULT VALUES;
    GO 10000
    
    SET NOCOUNT OFF

    In the next step I add a new column to the table which is NULLable and has a default.

    -- Clear the log
    CHECKPOINT;
    
    ALTER TABLE dbo.foo
    ADD col3 char(200) NULL DEFAULT ('That is a test');

    Now I run two check queries - one against the clustered index pages and one against the log! The result is as expected, because a DEFAULT on a new column doesn't update the existing data!

    SELECT page_count, record_count
    FROM sys.dm_db_index_physical_stats(db_id(), OBJECT_ID('dbo.foo'), 1, DEFAULT, 'DETAILED')
    
    SELECT	AllocUnitName,
    		Operation,
    		Context,
    		[Page ID],
    		[Slot ID],
    		[Lock Information],
    		Description
    FROM	fn_dblog(NULL, NULL)
    ORDER BY [Previous LSN];

    Pretty cool - the first result shows the index pages for the 10,000 records and the second result demonstrates that only metadata has been changed. Now I add a second column which is not NULLable and has a default. The result is surprising :)

    CHECKPOINT;
    ALTER TABLE dbo.foo
    ADD col4 char(200) NOT NULL DEFAULT ('That is a test');

    If you check the indexes and the log after the creation of the new column, you'll find that it is exactly the same behaviour although NOW the values for col4 have been predefined AND it's not nullable...

    I don't post the picture because it is nearly the same result :)

    Now I run an update on col4 (which is not NULLable and has a default value)...

    UPDATE	dbo.foo
    SET		col4 = 'That is not a test'

    At that moment the number of pages "explodes" and the log demonstrates the "action behind the scenes".

    Cool - now we have a huge amount of new pages in the clustered index AND 10,000 fully logged row modifications.

    BTW: If you rebuild the clustered index the space will be allocated, too!
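
    A minimal sketch of that last point, reusing the demo table from above (the clustered index is the PRIMARY KEY constraint foo_id):

    -- Rebuilding the clustered index materializes the stored default into every row,
    -- so page_count grows even without an UPDATE.
    ALTER INDEX foo_id ON dbo.foo REBUILD;

    SELECT page_count, record_count
    FROM sys.dm_db_index_physical_stats(db_id(), OBJECT_ID('dbo.foo', 'U'), 1, DEFAULT, 'DETAILED');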

    Sean, thank you for pointing to this issue - it's really interesting and I assume that it is a default "behaviour" to speed up such operations! I will blog about it (but in German :D ) and will try to get deeper into it!


    MCM - SQL Server 2008
    MCSE - SQL Server 2012
    db Berater GmbH
    SQL Server Blog (german only)


    • Edited by Uwe Ricken MVP Tuesday, July 02, 2013 5:48 AM add. line
    Tuesday, July 02, 2013 5:37 AM
  • http://rusanu.com/2011/07/13/online-non-null-with-values-column-add-in-sql-server-11/

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    Hallo Uri,

    thank you for pointing to the detailed explanation. Now it's much clearer - unfortunately it's new behaviour in SQL 2012 only - cool!

    Unfortunately the allocation will occur when the clustered index is rebuilt. But that's OK, because it avoids page splits, ...


    MCM - SQL Server 2008
    MCSE - SQL Server 2012
    db Berater GmbH
    SQL Server Blog (german only)


    • Edited by Uwe Ricken MVP Tuesday, July 02, 2013 5:54 AM add. comment
    Tuesday, July 02, 2013 5:53 AM
  • That was one of my most wished-for features - why update all rows to add a new value to a new column when you can store it once and substitute that value if the column is not included in that row?
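
    For the curious, a hedged sketch of how to peek at that stored value, reusing Uwe's dbo.foo demo table and based on the undocumented view used in the article linked above (the has_default/default_value column names are an assumption and may differ between builds):

    -- The "remembered" default for rows that were never physically updated
    -- shows up in the undocumented sys.system_internals_partition_columns view.
    SELECT p.partition_number,
           pc.partition_column_id,
           pc.has_default,
           pc.default_value
    FROM sys.system_internals_partition_columns AS pc
    JOIN sys.partitions AS p ON pc.partition_id = p.partition_id
    WHERE p.object_id = OBJECT_ID('dbo.foo', 'U');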
    Tuesday, July 02, 2013 6:28 AM
    Answerer
  • To be clear, I added a non-nullable INT column with a default value of -1 called [Row], so when the column was created, it applied the -1 default to every row. I right-click on the table to get its Properties, and then click Storage. The table it was added to has 269,562,474 rows. Before adding the column, the table size ("Data space" in Properties) was 42.4GB. After adding the column with default -1, but making some other minor changes as well to the table, the table size was 41.9GB, so adding the column didn't add much to the space. The table has data for a few years of dates, and each date has a few hundred thousand rows, so all I did was write a script to sequence the order of the rows from 0 to however many rows there are for each date into the [Row] column. After doing this, the table size grew from 41.9GB to 79.4GB (+37.5GB). So, by changing the value for the column from -1 to a range of values from 0 to say 400,000, the size of the table grew by 37.5GB. According to the Transact-SQL literature, an INT column should take 4 bytes, and it seems to me that -1 or 400,000 should take the same four bytes.

    After doing this, I used SHRINK on the DB, and also tried backing up and restoring the DB. The table size remained the same, and according to the DB Properties, the log is taking only 2MB on the disk. So it’s not the log file that took the additional space—the table with the added column and data took the additional space.

    I’ve added columns to this table before and haven’t noticed such a huge increase in table size. I’m wondering if something else is going on.

    Tuesday, July 02, 2013 2:46 PM
  • Hi Ed,

    Yes, the growth can come from page splits after the update of the table. If you add new data to a record and it no longer fits on the data page, a page split will occur. A page split moves approximately 50% of the data to a new 8k page.

    I would suggest checking the index fragmentation of the table.

    SELECT page_count, record_count, avg_fragmentation_in_percent, avg_page_space_used_in_percent
    FROM sys.dm_db_index_physical_stats(db_id(), OBJECT_ID('yourtable', 'U'), 1, DEFAULT, 'DETAILED')

    If you see high fragmentation you should rebuild the index:

    ALTER INDEX YourClusteredIndex ON yourtable REBUILD;

    Then check again and you will see that page_count has decreased :)
    After that you can (but I wouldn't do it) shrink your database!


    MCM - SQL Server 2008
    MCSE - SQL Server 2012
    db Berater GmbH
    SQL Server Blog (german only)

    • Marked as answer by Ed_White Thursday, July 04, 2013 3:44 AM
    Tuesday, July 02, 2013 2:56 PM
  • Ed,

    Adding the column shouldn't (by using your numbers) ever decrease the space usage. Doing the math for the added column of 4 bytes for the int and ignoring the null bitmap gives me 4 bytes * 269,562,474 ~ 1 GB of data. You've already stated that your numbers show a significant increase in space usage, which brings me to data density on the page. My *guess* is that the index was 100% full, or close enough to full, that when the column was updated, page splits occurred, which is why the table is almost 50% bigger. Could you run the following and report back?

    EDIT: Uwe beat me to it. Also, I don't know why I added the row offset into the calculation... too many things going on at once I guess :)
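
    For reference, a quick sanity check of the arithmetic above:

    -- 4 bytes per INT times the row count, expressed in GB
    SELECT CAST(269562474 AS bigint) * 4 / 1024.0 / 1024 / 1024 AS expected_growth_gb;  -- roughly 1 GB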

    -Sean


    Sean Gallardy | Blog | Twitter


    Tuesday, July 02, 2013 2:59 PM
    Answerer
  • Hi Ed,

    I've played around with my above example and that's a pretty good demonstration of the behaviour:

    Let's create the demo table...

    IF OBJECT_ID('dbo.foo', 'U') IS NOT NULL
     DROP TABLE dbo.foo;
     GO
    
    CREATE TABLE dbo.foo
    (
     Id int  not null IDENTITY (1, 1),
     col1 char(100) not null DEFAULT ('buuuh'),
     col2 char(100) not null DEFAULT ('baaah'),
    
     CONSTRAINT foo_id PRIMARY KEY CLUSTERED (id)
    );
    GO
    
    -- Now add 10,000 rows
    SET NOCOUNT ON
    GO
    
    INSERT INTO dbo.foo DEFAULT VALUES;
    GO 10000
    
    SET NOCOUNT OFF
    
    -- check the index fragmentation
    SELECT	index_type_desc,
    	index_depth,
    	avg_fragmentation_in_percent,
    	page_count,
    	record_count
    FROM sys.dm_db_index_physical_stats(db_id(), object_id('dbo.foo', 'U'), 1, DEFAULT, 'DETAILED');

    You can see the really low fragmentation level of ~1.5% - pretty fine.

    Now let's add a new column, check the index fragmentation, and then update the column:

    ALTER TABLE dbo.foo
    ADD col3 char(200) NULL DEFAULT ('That is a test');
    GO
    
    SELECT	index_type_desc,
    		index_depth,
    		avg_fragmentation_in_percent,
    		page_count,
    		record_count
    FROM	sys.dm_db_index_physical_stats(db_id(), object_id('dbo.foo', 'U'), 1, DEFAULT, 'DETAILED');
    GO
    
    -- Clear the log!!!
    CHECKPOINT;
    GO
    
    UPDATE	dbo.foo
    SET		col3 = 'That is not a test'
    GO
    
    -- How many page splits did we have
    SELECT Operation,
           COUNT_BIG(*)
    FROM   sys.fn_dblog(NULL, NULL)
    WHERE  Operation = 'LOP_DELETE_SPLIT'
    GROUP BY
           Operation;
    
    SELECT	index_type_desc,
    		index_depth,
    		avg_fragmentation_in_percent,
    		page_count,
    		record_count
    FROM	sys.dm_db_index_physical_stats(db_id(), object_id('dbo.foo', 'U'), 1, DEFAULT, 'DETAILED');
    GO

    You can see the complete process and its result. Look at the page splits that occurred!
    Due to the updates the data no longer fit on the 8k pages and 286 page splits occurred - the allocated space roughly doubled although the amount of data isn't that large.


    MCM - SQL Server 2008
    MCSE - SQL Server 2012
    db Berater GmbH
    SQL Server Blog (german only)

    Tuesday, July 02, 2013 3:26 PM
  • EDIT: Uwe beat me to it. Also, I don't know why I added the row offset into the calculation... too many things going on at once I guess :)
    Hehehehe - but I think you are correct to add 2 bytes for the slot, too - for the newly created pages coming from the page splits :)

    MCM - SQL Server 2008
    MCSE - SQL Server 2012
    db Berater GmbH
    SQL Server Blog (german only)

    Tuesday, July 02, 2013 4:44 PM
  • I ran the SELECT script above, then rebuilt the index per the above, and then ran SHRINK on the database, and it considerably reduced the size of the DB. The size of the "Ticks Archive" table shrank from 79.4GB to 50.1GB. Here are the results:

    SELECT page_count, record_count, avg_fragmentation_in_percent, avg_page_space_used_in_percent
    FROM sys.dm_db_index_physical_stats(db_id('TradesDB edpap321 7-1-2013 1010 AM'), OBJECT_ID('Ticks Archive', 'U'), 1, DEFAULT, 'DETAILED')

    BEFORE REBUILD:
    page_count   record_count   avg_fragmentation_in_percent   avg_page_space_used_in_percent
    10158147     269562474      93.47345535                    61.16513467
    70813        10158147       97.38889752                    67.47066963
    553          70813          100                            60.25450951
    4            553            50                             65.08524833
    1            4              0                              1.779095626

    AFTER REBUILD:
    page_count   record_count   avg_fragmentation_in_percent   avg_page_space_used_in_percent
    6412188      269562474      99.9999844046993               98.698109710897
    30287        6412188        100                            99.5772918211021
    144          30287          100                            98.973696565357
    1            144            0                              67.6921176179886

    Tuesday, July 02, 2013 6:44 PM
  • Ed,

    There is your root cause.

    Notice that average page space used is only ~60%. That tells me that a bunch of page splits happened and you have internal fragmentation. The fix is, as you've done, to do maintenance on the index. If this is something that would happen frequently, setting the fill factor to something other than 0 (which means 100% full) - around 70 or 80% - would be a good idea. If this was just a one-time occurrence then just keep it in mind in case it needs to be done again.
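
    A minimal sketch of that suggestion (index and table names are placeholders):

    -- Rebuild and leave ~20% free space per page so future in-row growth
    -- doesn't immediately cause page splits.
    ALTER INDEX YourClusteredIndex ON yourtable
    REBUILD WITH (FILLFACTOR = 80);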

    -Sean


    Sean Gallardy | Blog | Twitter

    Tuesday, July 02, 2013 6:47 PM
    Answerer
  • I see the OP is using SQL Server 2012.

    BTW, congratulations on now being an MVP.


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Blog: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance

    Wednesday, July 03, 2013 6:44 AM
    Answerer
  • Hi Erland,

    thank you for the good wishes concerning the MVP award :)
    I'm glad to be part of a great community and I hope to meet you in Stockholm at SQLRally (we'll see)

    Back to topic:
    Thank you for the fantastic link concerning the behaviour of DEFAULTs in SQL 2012. I've made a few more tests and one important point needs to be considered:

    The described behaviour does not apply if the default value changes for each row (see the sketch below), for example:

    - newid()
    - newsequentialid()

    That's for information only :)
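
    A small sketch of that case, reusing the dbo.foo demo table from above (col5 is a hypothetical column name): because the default differs for every row, SQL Server cannot remember a single value in metadata, so the ALTER becomes a size-of-data operation again.

    ALTER TABLE dbo.foo
    ADD col5 uniqueidentifier NOT NULL DEFAULT (NEWID());
    -- every existing row is touched, so page_count grows immediately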


    MCM - SQL Server 2008
    MCSE - SQL Server 2012
    db Berater GmbH
    SQL Server Blog (german only)

    Wednesday, July 03, 2013 7:14 PM