Database snapshot as ETL source to prevent locking on main production database
-
Tuesday, April 03, 2012 12:51 PMAny idea on this? Does the snapshots work well?
Hi,
I'm planning a system that is estimated to grow to a main production database of 200 Go. There is an ETL process that is going to run at night to load a datawarehouse database on the same server. Resource Governor is enabled and configured so CPU and memory will be attributed to each process. I want to prevent the ETL from having a negative effect on batch and online (if any at night) performance. The system is on a VM with reserved CPU and memory (2 CPU, 26Go memory), the disks are SSD with optical fiber, on a RAID10 array, data and log are on separate arrays on a SAN.
My idea is to use a database snapshot created right before the ETL process and dropped right after. I feel that this solution saves I/O compared with a backup/restore solution, which does as much I/O on the files, and more than twice on the disks (backup + restore + ETL compared to only ETL with the snapshot solution + I/O corresponding to the modified data pages in the source database). According to MSDN, it should also prevent locking on the source database tables.
Did someone try the snapshots in a context similar? Do you think it could have any effect on performance? The I/O subsystem seems to me fast enough. My main concern is locking, but MSDN says that no locks are hold on the source database tables.
Thanks,
Jim
- Edited by Jean-Michel Bouchard Tuesday, April 03, 2012 1:02 PM
All Replies
-
Tuesday, April 03, 2012 2:53 PM
The snapshot does work well. Not only that, it provides a consistent view to a point in time of the source database which can eliminate some complexities from an ETL standpoint.
That been said, there are some limitations and performance considerations to be aware of as indicated here: http://msdn.microsoft.com/en-us/library/ms175158.aspx#LimitationsRequirements
Luis @luisefigueroa
- Marked As Answer by Jean-Michel Bouchard Thursday, April 05, 2012 1:49 PM
-
Thursday, April 05, 2012 1:50 PM
Thanks Luis,
We will start with the snapshot solution. If performance degradation is too bad for the batch cycle will shift to a backup/restore solution. I'm very confident that the snapshot solution will be good enough.
Jean-Michel
Jean-Michel
-
Monday, April 09, 2012 2:21 PM
Hi,
...I want to prevent the ETL from having a negative effect on batch and online (if any at night) performance...
..My idea is to use a database snapshot created right before the ETL process and dropped right after.
Hi,
the thing is that your way, in performance perspective will hurt server performance even more than you just run queries over live data because SQL Server because in this case, SQL Server have to:
1) Support recently created snapshot
2) Run ETL workload
instead of (snapshot free approach):
1) Run ETL workload
Reason: SQL Server snapshots, unfortunately, doesn't add hardware capacity or isolate workload. But, yes, it prevents locking of source tables.
If you really want to isolate ETL workload, consider to use separate box with mirroring (async?), logshipping or mere backup/restore.
So I agree with Luise in this discussion, unfortunately, snapshot is not a panacea
- Edited by Alex Volok Monday, April 09, 2012 2:22 PM
-
Tuesday, April 10, 2012 8:12 PMThanks Alex. You're right about the extra load, but I thought it would be less extra load compared to maintaining a replication publication.
Jean-Michel
-
Wednesday, April 11, 2012 6:52 PM
Thanks Alex. You're right about the extra load, but I thought it would be less extra load compared to maintaining a replication publication.
Jean-Michel, your point is absolutely clear, since you are looking the way to make minimal impact by ETL to production server.
Jean-Michel
And if you deal with with VLDB (100GB+) such routine operations as BACKUP/RESTORE can add significant resource usage, so snapshot can provide desired lock-free solution with no significant impact at non-bussiness time.
But even in such cases such approaches of incorporating standby servers as:
1) Combination of full (weekly ) and diff (daily) backups;
2) Logshipping;
3) Mirroring;
can also reduce "replication workload" heaviness.
So, at least, you have options.

