Sunday, May 06, 2012 2:45 PM
We have a data mart in which we need to track changes to one table that has no audit date (a.k.a. change date or update date) column.
The problem that has been bugging me, after skimming some books (Kimball), is that we update the entire table once a day, but the update statement is non-selective; in other words, every row is updated.
Yet there may be changes in only 2,000-6,000 records out of 250,000. There are about 75 columns.
Before I launch into this, I need to know: will change tracking register a change for rows that had no delta but were included in the scope of the update statement?
FYI we just need to know which rows changed, not the actual data that changed.
Monday, May 07, 2012 5:14 PM
I believe CT is a good option in your situation. Change tracking returns only the primary key of the rows that changed (insert, update, or delete). You can then join on the primary key back to the source table and retrieve the values of some or all columns for the rows that changed at the source since the last synchronization version you recorded. Please note that the source table must have a primary key in order to use change tracking.
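A minimal sketch of what this looks like, assuming a database named MyMart, a source table dbo.Product with primary key ProductID, and a @last_sync_version saved from the previous run (all hypothetical names):

```sql
-- Enable change tracking at the database and table level (one-time setup).
ALTER DATABASE MyMart
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.Product
ENABLE CHANGE_TRACKING;

-- On each run, read every row changed since the version you stored last time.
DECLARE @last_sync_version bigint = 0;  -- load this from wherever you persisted it

SELECT ct.ProductID, ct.SYS_CHANGE_OPERATION
FROM CHANGETABLE(CHANGES dbo.Product, @last_sync_version) AS ct;

-- Persist the current version for the next run.
SELECT CHANGE_TRACKING_CURRENT_VERSION();
```

You would store the value returned by CHANGE_TRACKING_CURRENT_VERSION() and feed it back in as @last_sync_version on the next load.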
You can see an example of using Change Tracking to update Type I dimensions here: http://blog.extremebi.net/2010/09/super-fast-data-warehouse-dimensions.html
Hope that this helps,
Luis Figueroa @luisefigueroa
- Proposed As Answer by Leonardo Martínez P Monday, May 07, 2012 6:46 PM
Tuesday, May 08, 2012 7:49 AM
The dilemma is that the entire table is updated by one big update statement.
Even column values that didn't change are updated from a source table via SQL Server Integration Services (SSIS).
So again, I pose the nagging question: will change tracking register the change for rows that have no delta, but were included in the scope of the update statement?
Does it behave the way a checksum comparison would?
Assume all rows and all columns of the destination are being updated from the source.
- Edited by wmtell Tuesday, May 08, 2012 7:56 AM
Tuesday, May 08, 2012 3:31 PM
I see: every row in the source table gets updated, but you want to log in the audit table only the rows and fields that actually changed.
If you have SQL Server Enterprise Edition, you could use Change Data Capture (CDC) to do this. You would use the cdc.fn_cdc_get_all_changes_<capture_instance> function with the N'all update old' row filter option to obtain one row with the previous values and one row with the current values for every row that changed. Rows that did not actually change are not reported as updated.
To enable CDC at the database level:
exec sys.sp_cdc_enable_db;
To enable CDC for the table in question (default options):
exec sys.sp_cdc_enable_table @source_schema = N'tableSchema', @source_name = N'tableName', @role_name = NULL;
To query the changes (the LSN range bounds the window you want to read):
declare @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_Product');
declare @to_lsn binary(10) = sys.fn_cdc_get_max_lsn();
select * from cdc.fn_cdc_get_all_changes_dbo_Product(@from_lsn, @to_lsn, N'all update old');
As always, research the potential performance impact and other considerations first. For more info on CDC go to: http://msdn.microsoft.com/en-us/library/bb522489(v=sql.105).aspx
- Marked As Answer by wmtell Wednesday, May 16, 2012 11:52 AM
Wednesday, May 09, 2012 4:42 AM
(actually I don't care which fields are being updated, just the rows)
I read your blog, and it made good sense and made some things clearer.
When you say you're passing records through a slowly changing dimension, and you reference a type 1 dimension as the dimension being updated, am I correct that you are basically staging the updates and computing the audit date, then, having prepared a table with just a targeted data set of updates, you then update the destination table?
Maybe I should use the Checksum transformation and build a table to compare checksums for source and destination, then bulk-load those records into the final destination.
Though Kimball seems to recommend doing this only as a last resort.
So far, I've created an intermediate table with just the columns that I care about changing.
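If I go that route, the comparison might look something like this sketch, with hypothetical tables stg.Product and dbo.Product keyed on ProductID (note that CHECKSUM can produce collisions, so HASHBYTES over the concatenated columns is a safer but more expensive alternative):

```sql
-- Find the keys of rows whose tracked columns differ between staging and destination.
-- Column names are placeholders for the ~75 columns being compared.
SELECT s.ProductID
FROM stg.Product AS s
JOIN dbo.Product AS d
  ON d.ProductID = s.ProductID
WHERE CHECKSUM(s.Col1, s.Col2 /* ...remaining tracked columns... */)
   <> CHECKSUM(d.Col1, d.Col2 /* ...remaining tracked columns... */);
```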
Does CDC work that well though with 100,000's of rows ?
- Edited by wmtell Wednesday, May 09, 2012 4:46 AM
Wednesday, May 09, 2012 2:33 PM
"When you say you're passing records through a slowly changing dimension, and you reference a type 1 dimension as the dimension being updated, am I correct that you are basically staging the updates and computing the audit date, then, having prepared a table with just a targeted data set of updates, you then update the destination table?"
The Slowly Changing Dimension is an SSIS component that compares an input set of rows to a destination table based on a business key and establishes which rows already exist in the destination table. It then inserts or updates rows based on whether you want to keep historical changes or not (Type 2 or Type 1, respectively).
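Outside of SSIS, the Type 1 (overwrite) behavior can be sketched in T-SQL with a MERGE. Table and column names here are placeholders, and the inequality checks would need extra handling (e.g. ISNULL wrappers) if the compared columns are nullable:

```sql
-- Type 1 dimension load sketch: overwrite changed values, insert new keys,
-- and stamp an audit date only on rows that actually changed.
MERGE dbo.DimProduct AS d
USING stg.Product AS s
   ON d.BusinessKey = s.BusinessKey
WHEN MATCHED AND (d.Col1 <> s.Col1 OR d.Col2 <> s.Col2) THEN
    UPDATE SET d.Col1 = s.Col1,
               d.Col2 = s.Col2,
               d.AuditDate = GETDATE()
WHEN NOT MATCHED BY TARGET THEN
    INSERT (BusinessKey, Col1, Col2, AuditDate)
    VALUES (s.BusinessKey, s.Col1, s.Col2, GETDATE());
```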
"Does CDC work that well though with 100,000's of rows ?"
CDC works well with any number of rows. It does the work in the background as transactions are written to the log file, harvesting the changes into a change table where they are readily available for querying at any time through a table-valued function. The changes are not identified at query time but as the transactions are applied; because of this, the number of records in the source table is not as relevant as the number of transactions per second. As always, you should test the impact of enabling this feature prior to deploying to production.
With CDC you don't have to create a staging table with checksums in order to compare. You are already obtaining only the changes, and those are ready to be applied to the destination table (the audit table in this case). The next decision you have to make is whether you want to log every instance of a change to the source table or only the fact that it changed. You can do either with the data obtained using CDC.
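For example, to log only the fact that a row changed, you could take the distinct keys from the CDC function for the window you are processing (the capture instance dbo_Product and audit table audit.ChangedRows below are hypothetical names):

```sql
-- Record just the keys that changed in this window, not the changed values.
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_Product');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

INSERT INTO audit.ChangedRows (ProductID, AuditDate)
SELECT DISTINCT ProductID, GETDATE()
FROM cdc.fn_cdc_get_all_changes_dbo_Product(@from_lsn, @to_lsn, N'all');
```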
Change tracking will actually report changes on every row that was in the scope of the update statement even if the row values did not change.
Hope this helps.
- Edited by luisefigueroa Wednesday, May 09, 2012 2:35 PM