SQL Server Developer Center >
SQL Server Forums
>
SQL Server Data Warehousing
>
Using Change Tracking For Datawarehousing
Using Change Tracking For Datawarehousing
- Hi all,
We would like to use Change Tracking (Standard Edition available only !) to isolate records that have changed during day time to do nightly loads in our DWH (for example, we would like to get all customer changes done today (e.g. all insert/update/delete done on the customer table on the 1st of october 2009 from 8h00 AM till 18h00 PM).
Since there is no time information recorded with Change Tracking, how can we filter changes done each day on a given table ?
Can someone provide a clear example how to recover those changes ?
Can someone also explained how the columns SYS_CHANGE_VERSION and SYS_CHANGE_CREATION_VERSION are intended to work together ?
Indeed, I remarked some weird cases while testing Change Tracking with transaction:
1) Change Tracking recorded only 1 change for an insert, update and then delete of a record done in the same transaction: DELETE was recorded, INSERT and UPDATE weren't.
2) Change Tracking recorded only 1 change for an insert, update of a record done in the same transaction: INSERT was recorded, UPDATE wasn't.
With case 1) we would end with an error during our DWH load because we will try to delete a nonexistent record (since Change Tracking did not kept the INSERT information, we don't know the record has been created before being deleted).
Thanks.
Answers
- Check this out:
http://www.microsoft.com/uk/msdn/nuggets/nugget/299/SQL-Server-2008-Change-Tracking.aspx
It only checks net changes, if an item goes from A to B to C, you cannot see B as it is an intermediate change, only C. You must turn Change Tracking on at both the database and table level. You can use the change function to return the primary key of all rows that have been updated, inserted, or deleted. You could use SQL to add an EndDate to all fields that have been Deleted depending on your needs.
I actually still tend to create slowly changing dimensions using dates (depending on the type of SCD it is...more information on an alternative for fast changing dimensions here)
And just to be clear (perhaps it is due to my inexperience with Change Tracking), but I view Change Tracking as a supplement to help manage your slowly changing dimensions, not as an alternative. You can use Change Tracking to help find the changes, but then your dimensions need to be updated accordingly (just as they were had you used some other logic to identify the changes).
Check out my blog!- Marked As Answer byJin ChenMSFT, ModeratorTuesday, November 03, 2009 8:24 AM
All Replies
- Would storing a rowversion/timestamp (not the datetime timestamp) column in the customer table, for instance, help? A rowversion column automatically increments itself with each insert/update. All you'd have to do is log the rowversion since the last extract and grab any new records that have a rowversion > than the last rowversion. I've used this approach with great success.
As for processing deletes, the thing I've always done was to expire all active customer records in the DWH first, then when the inserts/updates come along, active customers would get their flags reversed back to active and of course new customers would get inserted.
Phil Brammer | http://www.ssistalk.com - Thanks for the answer.
We thought about a timestamp/binary_checksum column but we discarded this solution since we cannot keep track of DELETE action.
Indeed, this solution works well for dimension (whereas slowly changing dimension need to be handled a bit differently) but not for fact tables.
If a fact is deleted in the production system (don't ask me why... :-D) we have to know it to handle it in the DWH.
We already have a system of tracking changes based on triggers but it adds a lot of overhead on production system.
We thought of Change Tracking in order to track changes more efficiently.
Have you any example of Change Track use in the context of Datawarehousing ?
Thanks. - Check this out:
http://www.microsoft.com/uk/msdn/nuggets/nugget/299/SQL-Server-2008-Change-Tracking.aspx
It only checks net changes, if an item goes from A to B to C, you cannot see B as it is an intermediate change, only C. You must turn Change Tracking on at both the database and table level. You can use the change function to return the primary key of all rows that have been updated, inserted, or deleted. You could use SQL to add an EndDate to all fields that have been Deleted depending on your needs.
I actually still tend to create slowly changing dimensions using dates (depending on the type of SCD it is...more information on an alternative for fast changing dimensions here)
And just to be clear (perhaps it is due to my inexperience with Change Tracking), but I view Change Tracking as a supplement to help manage your slowly changing dimensions, not as an alternative. You can use Change Tracking to help find the changes, but then your dimensions need to be updated accordingly (just as they were had you used some other logic to identify the changes).
Check out my blog!- Marked As Answer byJin ChenMSFT, ModeratorTuesday, November 03, 2009 8:24 AM


