Recording column value changes

  • Question

  • I've got a fairly straightforward user requirement to implement:

    Treat some columns as 'maintain update history'. IOW, I need to create a record each time a value is changed, recording the user, the old value and the time the value was changed.

    There are a few challenges here:

    1. I need a reference to the table, column and row
    2. The primary key could be a compound key or a key consisting solely of an IDENTITY column
    3. The value types vary widely
    4. Maintaining referential integrity (if the row disappears, we no longer want to know about the history of column value changes for that row)

    I have some ideas, but none are particularly efficient and all require really complex triggers to implement point 4.

    If anybody out there has pointers to some clever database design ideas that cater for such a requirement, I'd be really keen to hear them.

    Many thanks in advance,

    Pete

    Wednesday, October 8, 2014 7:39 PM

Answers

  • Are you talking about one table reference? If so, you can use a simple trigger for update/insert, but since SQL Server 2005 we have the OUTPUT clause (pointed out by Kalman) to capture changes.

    -- OUTPUT captures the identity values the INSERT generates, without a trigger.
    create table itest ( i int identity not null primary key, j int not null unique );
    create table #new ( i int not null, j int not null );

    insert into itest (j)
    output inserted.i, inserted.j into #new
    select o.object_id from sys.objects as o;

    select * from #new;
    drop table #new, itest;
    go

     
    This example shows how the OUTPUT clause integrates with existing DML syntax.
     
    Another common scenario is auditing of data in a table using triggers. In this case, the trigger uses information from the inserted and deleted tables to add rows to the audit tables. The example below shows code that uses the OUTPUT clause in UPDATE and DELETE statements to insert rows into an audit table.
     
    create table t ( i int not null );
    create table t_audit ( old_i int not null, new_i int null );
    insert into t (i) values (1);
    insert into t (i) values (2);

    -- deleted carries the pre-update value, inserted the post-update value.
    update t
       set i = i + 1
    output deleted.i, inserted.i into t_audit
     where i = 1;

    -- a DELETE has no new value, so log NULL for it.
    delete from t
    output deleted.i, NULL into t_audit
     where i = 2;

    select * from t;
    select * from t_audit;

    drop table t, t_audit;
    go

    Hope you now have an understanding of the OUTPUT clause in SQL Server 2005. It is a powerful feature that enables you to eliminate triggers in some cases, or to efficiently send results to the client as part of the data modification operation.
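
    The OUTPUT clause by itself doesn't record _who_ made the change, which is part of Pete's requirement. A minimal sketch of covering that with a trigger (my example; the table and column names are hypothetical):

    create table dbo.product ( productId int not null primary key, price money not null );
    create table dbo.product_price_audit (
        productId int          not null,
        oldPrice  money        not null,
        newPrice  money        not null,
        changedBy sysname      not null,
        changedAt datetime2(3) not null
    );
    go
    create trigger dbo.trg_product_price_audit
    on dbo.product
    after update
    as
    begin
        set nocount on;
        -- Log only rows where the audited column actually changed.
        insert into dbo.product_price_audit (productId, oldPrice, newPrice, changedBy, changedAt)
        select d.productId, d.price, i.price, suser_sname(), sysutcdatetime()
        from deleted as d
        join inserted as i on i.productId = d.productId
        where i.price <> d.price;
    end;
    go

    SUSER_SNAME() captures the login; ORIGINAL_LOGIN() is an alternative when impersonation is in play.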


    Best Regards, Uri Dimant, SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/


    Thursday, October 9, 2014 6:34 AM

All replies

  • Take a look at Change Data Capture.

    Andy Tauber
    Data Architect
    The Vancouver Clinic
    Website | LinkedIn


    Wednesday, October 8, 2014 8:13 PM
  • Hi Pete,

    Rather than trying to build triggers, have you considered using Change Data Capture (CDC)? It is a built-in feature of SQL Server.

    I believe it will meet your requirements.

    Overview (MSDN Books Online):

    http://technet.microsoft.com/en-us/library/bb522489%28v=sql.105%29.aspx

    An Example:

    http://blogs.technet.com/b/sql_server_isv/archive/2010/12/02/change-data-capture-what-is-it-and-how-do-i-use-it.aspx

    Best Practices

    http://technet.microsoft.com/en-us/library/dd266396%28v=sql.100%29.aspx
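
    For reference, enabling CDC is just a couple of system procedure calls. A minimal sketch (the table name is hypothetical; the capture and cleanup jobs also require SQL Server Agent to be running):

    -- run in the database you want to track
    exec sys.sp_cdc_enable_db;
    go
    exec sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name   = N'fred',   -- hypothetical table name
        @role_name     = null;      -- null = do not gate access behind a database role
    go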

    Friday, October 10, 2014 7:33 PM
  • But Change Data Capture doesn't actually capture the user who made the change. It is great tech for recording changes, but only insofar as you want to see the data change history, not log who made each change.

    >>1. I need a reference to the table, column and row
    2. The primary key could be a compound key or a key consisting solely of an IDENTITY column
    3. The value types vary widely
    4. Maintaining referential integrity (if the row disappears, we no longer want to know about the history of column value changes for that row)

    I have some ideas, but none are particularly efficient and all require really complex triggers to implement point 4<<

    I would suggest you consider making one audit table per table you want to audit, located in a different schema. Then you don't have any of the problems you are concerned about (really, the only problem is the disk space used!).

    So:

    -- assumes the names and names_audit schemas already exist
    create table names.fred (fredId int, value varchar(10));
    create table names_audit.fred (fredId int, changeSequence int, changeTime datetime2(3), changeOperation char(1), value varchar(10));

    Now you just need a trigger that inserts the old values into names_audit.fred on update, and again on delete. Then the row history is there (or you can have your trigger delete the history upon delete, if that meets your requirement better). That is a pretty simple trigger to write, the cost to performance is pretty light, and a sketch follows below.
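
    A minimal sketch of that trigger (my code, not Louis's; it derives changeSequence from the row's existing history, where a production version might use an IDENTITY or SEQUENCE instead):

    create trigger names.trg_fred_audit
    on names.fred
    after update, delete
    as
    begin
        set nocount on;
        -- deleted holds the pre-change values for both UPDATE and DELETE.
        insert into names_audit.fred (fredId, changeSequence, changeTime, changeOperation, value)
        select d.fredId,
               coalesce((select max(a.changeSequence)
                         from names_audit.fred as a
                         where a.fredId = d.fredId), 0) + 1,
               sysutcdatetime(),
               case when exists (select 1 from inserted) then 'U' else 'D' end,
               d.value
        from deleted as d;
    end;
    go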

    Creating the log tables and triggers should be a fairly trivial task using the metadata in the system catalog, and the number of values in the key is really not a big deal.
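
    For what it's worth, a sketch of that metadata-driven generation (my code; it deliberately simplifies type handling, ignoring nvarchar length halving, varchar(max), precision/scale, and nullability):

    declare @ddl nvarchar(max);
    select @ddl =
        N'create table names_audit.' + quotename(t.name) + N' ('
        + stuff((select N', ' + quotename(c.name) + N' ' + ty.name
                        + case when ty.name in (N'char', N'varchar')
                               then N'(' + cast(c.max_length as nvarchar(10)) + N')'
                               else N'' end
                 from sys.columns as c
                 join sys.types as ty on ty.user_type_id = c.user_type_id
                 where c.object_id = t.object_id
                 order by c.column_id
                 for xml path(''), type).value('.', 'nvarchar(max)'), 1, 2, N'')
        + N', changeSequence int, changeTime datetime2(3), changeOperation char(1));'
    from sys.tables as t
    where schema_name(t.schema_id) = N'names' and t.name = N'fred';

    print @ddl;  -- review the generated DDL, then run it with: exec sp_executesql @ddl;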


    Louis

    Without good requirements, my advice is only guesses. Please don't hold it against me if my answer answers my interpretation of your questions.


    • Edited by Louis Davidson (MVP) Friday, October 10, 2014 8:54 PM (follow-up poster noted how my suggestion was wrong)
    Friday, October 10, 2014 8:20 PM
  • >as long as the insert into the audit tables is still on fast disks, much like the tran log needs to be.)

    Probably just the transaction log file and TempDB need to be on fast disks. The trigger will cause reads and writes to TempDB (that's where the inserted/deleted virtual tables are read from), and writes to the log file.

    When you commit a transaction in SQL Server, the log file is written, but the database changes are made only in memory. Background processes will eventually flush the changes to the data files, and unless a checkpoint drives so much IO that it makes your other transactions wait, this happens completely in the background.

    In fact, history/audit tables like this are good candidates to store on a filegroup on even slower disks than the rest of your tables, as they get large and are rarely read.
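
    A sketch of what that placement looks like in DDL (database name, file path, filegroup and table names are all hypothetical):

    alter database MyDb add filegroup SlowAudit;
    alter database MyDb add file
        ( name = N'slow_audit_1', filename = N'S:\slowdisk\slow_audit_1.ndf' )
        to filegroup SlowAudit;
    go
    -- Any audit table created with ON SlowAudit lives on the slower disks.
    create table names_audit.fred_history ( fredId int not null, value varchar(10) null )
        on SlowAudit;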

    The only time you would get IO on the audit tables is when you have to fetch pages before you write to them.

    e.g. with:

    create table names.fred (fredId int, value varchar(10));
    create table names_audit.fred (fredId int, changeSequence int, changeTime datetime2(3), changeOperation char(1), value varchar(10));

    If names_audit.fred has a clustered PK on (fredId, changeSequence), then you need to fetch the page containing the latest changes for that fredId before you can write the next change.

    If names_audit.fred has a clustered PK on (changeSequence, fredId), then the correct page will usually already be cached, since it received the last change for _any_ name row. Of course, you pay a price when querying the change history of a particular name. So it's a tradeoff.
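
    To make the two layouts concrete (a sketch; the NOT NULL constraints are additions to Louis's definition so the columns can form the key):

    create table names_audit.fred (
        fredId          int          not null,
        changeSequence  int          not null,
        changeTime      datetime2(3) not null,
        changeOperation char(1)      not null,
        value           varchar(10)  null,
        -- Option 1: one row's history is contiguous; inserts scatter across pages.
        constraint pk_fred_audit primary key clustered (fredId, changeSequence)
        -- Option 2 (swap the order): primary key clustered (changeSequence, fredId)
        -- keeps every insert on the hot last page, but scatters one row's history.
    );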

    David


    David http://blogs.msdn.com/b/dbrowne/

    Friday, October 10, 2014 8:25 PM
  • Yeah, I am batting 1000 today on not thinking enough about the technical details. Good catch, hanging my head a little lower now.

    Louis

    Without good requirements, my advice is only guesses. Please don't hold it against me if my answer answers my interpretation of your questions.

    Friday, October 10, 2014 8:55 PM