locked
Changing column in NOT NULL. Why is there so much transaction log activity on compressed tables? RRS feed

  • Question

  • Hi guys,

    the setup is as following: SQL Server 2008R2, recovery model FULL in order to better track what's going on in the transaction log.

    CREATE TABLE dbo.justnumbers(id INT IDENTITY, number FLOAT NULL);
    CREATE UNIQUE CLUSTERED INDEX CI_number ON dbo.justnumbers(id);
    --Fill the table with 500K unique numbers (no nulls).

    Now I want to check how much traffic in the transaction log will produce changing the column 'number' in not null.

    CHECKPOINT;
    ALTER TABLE justnumbers ALTER COLUMN number FLOAT NOT NULL; --1 sec
    SELECT * FROM fn_dblog(NULL, NULL);
    --31 records in the transaction log, touching olny the metadata such as sys.sysrscols.clst

    Nothing needs to be updated in the table, null bitmap is anyway always present for all columns (see "three null bitmap myths" by Paul Randal http://www.sqlskills.com/blogs/paul/a-sql-server-dba-myth-a-day-630-three-null-bitmap-myths/). Everything is fine.

    Now I want to try the same with a page compressed table.

    DROP TABLE dbo.justnumbers;
    CREATE TABLE dbo.justnumbers(id INT IDENTITY, number FLOAT NULL);
    CREATE UNIQUE CLUSTERED INDEX CI_number ON dbo.justnumbers(id);
    --Fill the table with 500K unique numbers (no nulls).
    
    ALTER TABLE dbo.justnumbers REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE);
    CHECKPOINT;
    ALTER TABLE justnumbers ALTER COLUMN number FLOAT NOT NULL; --4 sec
    SELECT * FROM fn_dblog(NULL, NULL);
    --530K records in the transaction log
    What the heck is going on there? Thanks for help
    • Edited by Andrej Kuklin Friday, August 16, 2013 9:41 AM -
    • Changed type Sofiya Li Tuesday, August 27, 2013 12:52 AM comments
    • Changed type Andrej Kuklin Tuesday, August 27, 2013 5:29 PM
    Friday, August 16, 2013 9:40 AM

Answers

  • The ALTER COLUMN to NOT NULL is not just a meta-data operation when compression is enabled.  NULL values are stored differently when compression is enabled and SQL server needs to accommodate those (or not).  Internally, SQL Server creates a new column for the NOT NULL column as demonstrated by the script below, requiring each row/page to be modified.

    The change is a meta-data only operation without data compression; the column is physically stored as fixed length along with the NULL bitmap so no physical row changes are needed.  SQL Server just checks to ensure no NULL values exist in that case.

    USE tempdb;
    SET NOCOUNT ON;
    DBCC TRACEON(3604) WITH NO_INFOMSGS; --return DBCC results to client
    GO
    
    IF OBJECT_ID(N'dbo.ExampleTable', 'U') IS NOT NULL
    BEGIN
          DROP TABLE dbo.ExampleTable;
    END;
    CREATE TABLE dbo.ExampleTable(
          ExampleTable int NOT NULL IDENTITY
                CONSTRAINT PK_ExampleTable PRIMARY KEY CLUSTERED
    	,FloatColumn float NULL 
          ) WITH(DATA_COMPRESSION=PAGE); 
    INSERT INTO dbo.ExampleTable (FloatColumn) VALUES (1.0); 
    GO
    
    --dump page after before alter
    DECLARE @dbcc_results TABLE(
          PageFID     int
          ,PagePID int
          ,IAMFID int
          ,IAMPID int
          ,ObjectID int
          ,IndexID int
          ,PartitionNumber int
          ,PartitionID bigint
          ,iam_chain_type varchar(20)
          ,PageType int
          ,IndexLevel int
          ,NextPageFID int
          ,NextPagePID int
          ,PrevPageFID int
          ,PrevPagePID int
          );
    
    DECLARE 
          @DBID int
          ,@ObjectID int
          ,@PagePID int
          ,@PageFID int;
          
    --get database and object_id needed for DBCC IND
    SELECT
          @DBID = DB_ID()
          ,@ObjectID = OBJECT_ID(N'dbo.ExampleTable');
          
    INSERT INTO @dbcc_results
          EXEC sp_executesql N'DBCC IND(@DBID, @ObjectID, 1, 1) WITH NO_INFOMSGS;'
                ,N'@DBID int, @ObjectID int'
                ,@DBID
                ,@ObjectID = @ObjectID;
                
    --get file and page id of first (and only) page of this table         
    SELECT
          @PageFID = PageFID
          ,@PagePID = PagePID
    FROM @dbcc_results
    WHERE PageType = 1; --data page
    
    --get database and object_id needed for DBCC IND
    SELECT
          @DBID = DB_ID()
          ,@ObjectID = OBJECT_ID(N'dbo.ExampleTable');
          
    INSERT INTO @dbcc_results
          EXEC sp_executesql N'DBCC IND(@DBID, @ObjectID, 1, 1) WITH NO_INFOMSGS;'
                ,N'@DBID int, @ObjectID int'
                ,@DBID
                ,@ObjectID = @ObjectID;
                
    --get file and page id of first (and only) page of this table         
    SELECT
          @PageFID = PageFID
          ,@PagePID = PagePID
    FROM @dbcc_results
    WHERE PageType = 1; --data page
    
    DBCC PAGE(@DBID, @PageFID, @PagePID, 1) WITH NO_INFOMSGS; 
    GO
    
    ALTER TABLE dbo.ExampleTable
    	ALTER COLUMN FloatColumn float NOT NULL;
    GO
    
    --dump page after alter
    DECLARE @dbcc_results TABLE(
          PageFID     int
          ,PagePID int
          ,IAMFID int
          ,IAMPID int
          ,ObjectID int
          ,IndexID int
          ,PartitionNumber int
          ,PartitionID bigint
          ,iam_chain_type varchar(20)
          ,PageType int
          ,IndexLevel int
          ,NextPageFID int
          ,NextPagePID int
          ,PrevPageFID int
          ,PrevPagePID int
          );
    
    DECLARE 
          @DBID int
          ,@ObjectID int
          ,@PagePID int
          ,@PageFID int;
          
    --get database and object_id needed for DBCC IND
    SELECT
          @DBID = DB_ID()
          ,@ObjectID = OBJECT_ID(N'dbo.ExampleTable');
          
    INSERT INTO @dbcc_results
          EXEC sp_executesql N'DBCC IND(@DBID, @ObjectID, 1, 1) WITH NO_INFOMSGS;'
                ,N'@DBID int, @ObjectID int'
                ,@DBID
                ,@ObjectID = @ObjectID;
                
    --get file and page id of first (and only) page of this table         
    SELECT
          @PageFID = PageFID
          ,@PagePID = PagePID
    FROM @dbcc_results
    WHERE PageType = 1; --data page
    
    --get database and object_id needed for DBCC IND
    SELECT
          @DBID = DB_ID()
          ,@ObjectID = OBJECT_ID(N'dbo.ExampleTable');
          
    INSERT INTO @dbcc_results
          EXEC sp_executesql N'DBCC IND(@DBID, @ObjectID, 1, 1) WITH NO_INFOMSGS;'
                ,N'@DBID int, @ObjectID int'
                ,@DBID
                ,@ObjectID = @ObjectID;
                
    --get file and page id of first (and only) page of this table         
    SELECT
          @PageFID = PageFID
          ,@PagePID = PagePID
    FROM @dbcc_results
    WHERE PageType = 1; --data page
    
    DBCC PAGE(@DBID, @PageFID, @PagePID, 1) WITH NO_INFOMSGS; 
    GO
    
    IF OBJECT_ID(N'dbo.ExampleTable', 'U') IS NOT NULL
    BEGIN
          DROP TABLE dbo.ExampleTable;
    END;
    GO


    Dan Guzman, SQL Server MVP, http://www.dbdelta.com

    • Proposed as answer by Uwe RickenMVP Friday, August 23, 2013 10:58 AM
    • Marked as answer by Andrej Kuklin Tuesday, August 27, 2013 5:29 PM
    Friday, August 23, 2013 10:31 AM
  • @Dan
    DBCC Page helps indeed to shed some light on what is hapenning and you're absolutely right that the ALTER statement is executed as addition of a new column. Thanks for the script.
    I've tried 2 other things: 1) rebuild index after "alter column not null", 2) alternatively declare the column as not null from the very beginning.
    In both cases I check the "Record Memory Dump" section and compare it to the case "compressed column declared as null". I'm doing this in order to check whether rewriting the page (with all associated operations like logging etc.) was actually necessary.
    The results are as following (based on your script):

    0) Compressed column declared as null
    Record Memory Dump

    0000000016D2A060:   01023281 f03f0000 20†††††††††††††††††..2.ð?..

    1) Compressed column after change null->not null and rebuilding the index

    Record Memory Dump

    0000000016D2A060:   01023281 f03f0000 2f†††††††††††††††††..2.ð?.. 

    2) Compressed column declared as not null

    Record Memory Dump

    0000000016D2A060:   01023281 f03f0000 20†††††††††††††††††..2.ð?..

    (2) is identical to (0).

    Conclusions:
    a. The values persisted in compressed case are actually identical independently whether the column is declared as nullable or not
    b. Changing the column from nullable to not nullable and rebuilding the index changes the persisted value representation
    c. Because ALTER COLUMN for compressed case is implemented in SQL Server as adding of a new column AND also the persisted internal representation changes after the ALTER command, it's totally understandable that all data pages need to be touched and rewritten, but based on (a) I think SQL Server does some unnecessary work and rewriting is actually not neccessary even in compressed case.

    @Uwe
    Thanks for the analysis und Grüße zurück :)

    • Marked as answer by Andrej Kuklin Tuesday, August 27, 2013 5:29 PM
    Monday, August 26, 2013 1:50 PM

All replies

  • Hi Andrej,

    As part of the transaction log internals series, the function of fn_dblog is used to return active (or un-truncated) part of transaction log file. It allows us to see the hierarchy and definitive order of transactions during performing  operations with SQL Server , and to aid in working out how operations work under the covers.

    After performing an compression option to define the desired compression type of tables, it will create compression logs for table or index compression. The number of records will increased in transaction log.

    There is a similar issue about measuring the impact of data compression on the transaction log, you can review the following article.

    http://www.sqlskills.com/blogs/joe/a-small-scale-test-measuring-the-impact-of-data-compression-on-the-transaction-log/

    Thanks,
    Sofiya Li

    If you have any feedback on our support, please click  here.


    Sofiya Li
    TechNet Community Support

    Monday, August 19, 2013 6:54 AM
  • Hi Sofiya,

    that was unfortunately not the point of my question. I do the following:

    1. Create table and fill it with data

    2. Apply compression

    3. Clear out the active part of transaction log with CHECKPOINT (you can try the fn_dblog after CHECKPOINT, it will return 2 rows)

    4. Change one fixed length column (float) to not null => actually nothing should be changed in data pages, and the behavior in uncompressed case confirms it, but if the table is compressed, the ALTER COLUMN statement rewrites all data pages which not only produces huge IO activity, but also creates enormous transaction log traffic.

    I'd like to know whether there is something that really needs to be updates in the compressed case or this is a bug.

    Monday, August 19, 2013 7:53 AM
  • Andrej

    Read this article ...perhaps it will answer some of your questions

    http://rusanu.com/2011/07/13/online-non-null-with-values-column-add-in-sql-server-11/


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Blog: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance

    Monday, August 19, 2013 9:49 AM
    Answerer
  • Hello Uri,

    I'm aware of this new feature in SQL Server 2012. Unfortunately, it doesn't have much to do with my question. But thanks anyway :)

    Monday, August 19, 2013 10:47 AM
  • Uppp
    Wednesday, August 21, 2013 9:44 AM
  • Hallo Andrej,

    do you know the presentation about compression (row and data) from Bradley Ball. You'll find the slide decks here:

    https://docs.google.com/file/d/0Bx5vP5OK-DMqVjUxdE9tbDN0SXM/edit?pli=1

    I know there is a corresponding Video from his presentation on PASS Summit 2012 but I do not find it :(

    BTW: Best regards to Christopher K. ;)


    MCM - SQL Server 2008
    MCSE - SQL Server 2012
    db Berater GmbH
    SQL Server Blog (german only)


    • Edited by Uwe RickenMVP Wednesday, August 21, 2013 7:23 PM add info
    Wednesday, August 21, 2013 7:20 PM
  • Hallo Andrej,

    I did run a view tests with your example data and - unfortunately fn_dblog() is unsupported - i can only guess what happens "behind the scene".

    NOT NULL is a constraint.
    NULL is by definition:

    - not empty
    - unknown data type

    So my suggestion is that Microsoft SQL Server will run a CHECK on each row and therefore Microsoft SQL Server has to decompress the content of the attribute. It's the same situation with ROW / PAGE compression.

    When I check the entries in RowLog Contents 0 to RowLog Contents 3 (4 and 5 will not be used) AND [Log Record] i get the following results (just an excerpt from one record) 

    Operation:	LOP_MODIFY_COLUMNS
    Contents 0:	0x026281
    Contents 1:	0x036216811DD5C7D441
    Contents 2:	0x19011281
    Contents 3:	0x0201000C0000072CFD1B0000	
    LogRecord:	0x00003E00C1040000710A000006002200B944080000000402A6000000010000005F000000C1040000480A00001B00000100001C000000000101000300000006000300090004000C000000000002628100036216811DD5C7D44173178F190112810201000C0000072CFD1B0000

    The above entry is from fn_dblog() AFTER adding the constraint "NOT NULL". Information from contents 3 will be used to determine in position in [logrecord] AFTER which changes are logged. If you compare contents 3 with logrecord you'll see that contents 3 is at the very end :( That means NO changes have been made - or in other words: The old value is the same as the new one!

    the next code shows the excerpt from the data page itself with a significant information concerning the hexdump:

    000000002343A2F1:   01036216 811dd5c7 d4411dd5 c7d441             ..b...ÕÇÔA.ÕÇÔA

    The above code shows the HexDump of the record 1 in Slot 0 and .. it can't be compared directly to the information from the fn_dblog(). So - but it's just a suggestion - Microsoft SQL Server compares bytes instead of values. So I assume that Microsoft SQL Server write the "decrypted" value again into the table - where it will be encrypted again!

    But as I said - just a suggestion; because when I run the following statements (within a compressed page)

    BEGIN TRANSACTION UpdateRecord
    	UPDATE	dbo.justnumbers
    	SET	numbers = 1394562164
    	WHERE	Id = 1;
    	UPDATE	dbo.justnumbers
    	SET	numbers = numbers
    	WHERE	Id = 2;
    COMMIT TRANSACTION UpdateRecords

    The first statement will update record 1 to its old value by using a hard coded value while the second one will do the same by using the column name. If you filter for the transaction with

    SELECT	[Transaction ID],
    	[Current LSN],
    	Context,
    	Operation,
    	AllocUnitName,
    	[Page ID],
    	[Slot ID],
    	[Lock Information],
    	[RowLog Contents 0],
    	[RowLog Contents 1],
    	[RowLog Contents 2],
    	[RowLog Contents 3],
    	[Log Record]
    FROM	sys.fn_dblog(NULL, NULL)
    WHERE	[Transaction ID] IN (SELECT [Transaction ID] FROM sys.fn_dblog(NULL, NULL) WHERE [Transaction Name] = 'UpdateRecords')
    ORDER BY
    	[Transaction ID],
    	[Current LSN] ASC;
    Hmm - quite curious but unless fn_dblog() is fully documented we have to guess what it really means :)

    MCM - SQL Server 2008
    MCSE - SQL Server 2012
    db Berater GmbH
    SQL Server Blog (german only)

    Friday, August 23, 2013 9:24 AM
  • The ALTER COLUMN to NOT NULL is not just a meta-data operation when compression is enabled.  NULL values are stored differently when compression is enabled and SQL server needs to accommodate those (or not).  Internally, SQL Server creates a new column for the NOT NULL column as demonstrated by the script below, requiring each row/page to be modified.

    The change is a meta-data only operation without data compression; the column is physically stored as fixed length along with the NULL bitmap so no physical row changes are needed.  SQL Server just checks to ensure no NULL values exist in that case.

    USE tempdb;
    SET NOCOUNT ON;
    DBCC TRACEON(3604) WITH NO_INFOMSGS; --return DBCC results to client
    GO
    
    IF OBJECT_ID(N'dbo.ExampleTable', 'U') IS NOT NULL
    BEGIN
          DROP TABLE dbo.ExampleTable;
    END;
    CREATE TABLE dbo.ExampleTable(
          ExampleTable int NOT NULL IDENTITY
                CONSTRAINT PK_ExampleTable PRIMARY KEY CLUSTERED
    	,FloatColumn float NULL 
          ) WITH(DATA_COMPRESSION=PAGE); 
    INSERT INTO dbo.ExampleTable (FloatColumn) VALUES (1.0); 
    GO
    
    --dump page after before alter
    DECLARE @dbcc_results TABLE(
          PageFID     int
          ,PagePID int
          ,IAMFID int
          ,IAMPID int
          ,ObjectID int
          ,IndexID int
          ,PartitionNumber int
          ,PartitionID bigint
          ,iam_chain_type varchar(20)
          ,PageType int
          ,IndexLevel int
          ,NextPageFID int
          ,NextPagePID int
          ,PrevPageFID int
          ,PrevPagePID int
          );
    
    DECLARE 
          @DBID int
          ,@ObjectID int
          ,@PagePID int
          ,@PageFID int;
          
    --get database and object_id needed for DBCC IND
    SELECT
          @DBID = DB_ID()
          ,@ObjectID = OBJECT_ID(N'dbo.ExampleTable');
          
    INSERT INTO @dbcc_results
          EXEC sp_executesql N'DBCC IND(@DBID, @ObjectID, 1, 1) WITH NO_INFOMSGS;'
                ,N'@DBID int, @ObjectID int'
                ,@DBID
                ,@ObjectID = @ObjectID;
                
    --get file and page id of first (and only) page of this table         
    SELECT
          @PageFID = PageFID
          ,@PagePID = PagePID
    FROM @dbcc_results
    WHERE PageType = 1; --data page
    
    --get database and object_id needed for DBCC IND
    SELECT
          @DBID = DB_ID()
          ,@ObjectID = OBJECT_ID(N'dbo.ExampleTable');
          
    INSERT INTO @dbcc_results
          EXEC sp_executesql N'DBCC IND(@DBID, @ObjectID, 1, 1) WITH NO_INFOMSGS;'
                ,N'@DBID int, @ObjectID int'
                ,@DBID
                ,@ObjectID = @ObjectID;
                
    --get file and page id of first (and only) page of this table         
    SELECT
          @PageFID = PageFID
          ,@PagePID = PagePID
    FROM @dbcc_results
    WHERE PageType = 1; --data page
    
    DBCC PAGE(@DBID, @PageFID, @PagePID, 1) WITH NO_INFOMSGS; 
    GO
    
    ALTER TABLE dbo.ExampleTable
    	ALTER COLUMN FloatColumn float NOT NULL;
    GO
    
    --dump page after alter
    DECLARE @dbcc_results TABLE(
          PageFID     int
          ,PagePID int
          ,IAMFID int
          ,IAMPID int
          ,ObjectID int
          ,IndexID int
          ,PartitionNumber int
          ,PartitionID bigint
          ,iam_chain_type varchar(20)
          ,PageType int
          ,IndexLevel int
          ,NextPageFID int
          ,NextPagePID int
          ,PrevPageFID int
          ,PrevPagePID int
          );
    
    DECLARE 
          @DBID int
          ,@ObjectID int
          ,@PagePID int
          ,@PageFID int;
          
    --get database and object_id needed for DBCC IND
    SELECT
          @DBID = DB_ID()
          ,@ObjectID = OBJECT_ID(N'dbo.ExampleTable');
          
    INSERT INTO @dbcc_results
          EXEC sp_executesql N'DBCC IND(@DBID, @ObjectID, 1, 1) WITH NO_INFOMSGS;'
                ,N'@DBID int, @ObjectID int'
                ,@DBID
                ,@ObjectID = @ObjectID;
                
    --get file and page id of first (and only) page of this table         
    SELECT
          @PageFID = PageFID
          ,@PagePID = PagePID
    FROM @dbcc_results
    WHERE PageType = 1; --data page
    
    --get database and object_id needed for DBCC IND
    SELECT
          @DBID = DB_ID()
          ,@ObjectID = OBJECT_ID(N'dbo.ExampleTable');
          
    INSERT INTO @dbcc_results
          EXEC sp_executesql N'DBCC IND(@DBID, @ObjectID, 1, 1) WITH NO_INFOMSGS;'
                ,N'@DBID int, @ObjectID int'
                ,@DBID
                ,@ObjectID = @ObjectID;
                
    --get file and page id of first (and only) page of this table         
    SELECT
          @PageFID = PageFID
          ,@PagePID = PagePID
    FROM @dbcc_results
    WHERE PageType = 1; --data page
    
    DBCC PAGE(@DBID, @PageFID, @PagePID, 1) WITH NO_INFOMSGS; 
    GO
    
    IF OBJECT_ID(N'dbo.ExampleTable', 'U') IS NOT NULL
    BEGIN
          DROP TABLE dbo.ExampleTable;
    END;
    GO


    Dan Guzman, SQL Server MVP, http://www.dbdelta.com

    • Proposed as answer by Uwe RickenMVP Friday, August 23, 2013 10:58 AM
    • Marked as answer by Andrej Kuklin Tuesday, August 27, 2013 5:29 PM
    Friday, August 23, 2013 10:31 AM
  • @Dan
    DBCC Page helps indeed to shed some light on what is hapenning and you're absolutely right that the ALTER statement is executed as addition of a new column. Thanks for the script.
    I've tried 2 other things: 1) rebuild index after "alter column not null", 2) alternatively declare the column as not null from the very beginning.
    In both cases I check the "Record Memory Dump" section and compare it to the case "compressed column declared as null". I'm doing this in order to check whether rewriting the page (with all associated operations like logging etc.) was actually necessary.
    The results are as following (based on your script):

    0) Compressed column declared as null
    Record Memory Dump

    0000000016D2A060:   01023281 f03f0000 20†††††††††††††††††..2.ð?..

    1) Compressed column after change null->not null and rebuilding the index

    Record Memory Dump

    0000000016D2A060:   01023281 f03f0000 2f†††††††††††††††††..2.ð?.. 

    2) Compressed column declared as not null

    Record Memory Dump

    0000000016D2A060:   01023281 f03f0000 20†††††††††††††††††..2.ð?..

    (2) is identical to (0).

    Conclusions:
    a. The values persisted in compressed case are actually identical independently whether the column is declared as nullable or not
    b. Changing the column from nullable to not nullable and rebuilding the index changes the persisted value representation
    c. Because ALTER COLUMN for compressed case is implemented in SQL Server as adding of a new column AND also the persisted internal representation changes after the ALTER command, it's totally understandable that all data pages need to be touched and rewritten, but based on (a) I think SQL Server does some unnecessary work and rewriting is actually not neccessary even in compressed case.

    @Uwe
    Thanks for the analysis und Grüße zurück :)

    • Marked as answer by Andrej Kuklin Tuesday, August 27, 2013 5:29 PM
    Monday, August 26, 2013 1:50 PM