I'm just looking to have a bit of a chat about backups on VLDB systems. For instance, at present, the largest database I work with is about 150GB. It has one filegroup, "PRIMARY", and a single MDF file.
We back this up twice a day and reorganise indexes each night. Each backup takes around 90 minutes, with the index and statistics maintenance taking an additional 45 minutes.
So, my question is this: in a much larger environment, let's say databases of about a terabyte in size, what would be the best ways to configure your database, and then to back it up and restore it? Let's assume it's a 24/7 organisation, just to keep it interesting.
I'm afraid I've got no real idea where to even start here. Would this be looking at filegroup backups, or splitting the data out into more files and filegroups, etc.? Or would you just start a full backup at 5pm on a Sunday night, let it complete, and then run differential/transaction log backups throughout the week? Or even a full backup once a month, and then differentials/transaction logs? (I'm going to assume that storage isn't a problem at the moment.)
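For illustration, I mean something along these lines (the database name, paths, and schedule are just placeholders, not anything we actually run):

-- Weekly full backup, e.g. Sunday evening
BACKUP DATABASE [BigDB]
TO DISK = N'X:\Backups\BigDB_FULL.bak'
WITH COMPRESSION, CHECKSUM, STATS = 5;

-- Nightly differential on the other days
BACKUP DATABASE [BigDB]
TO DISK = N'X:\Backups\BigDB_DIFF.bak'
WITH DIFFERENTIAL, COMPRESSION, CHECKSUM, STATS = 5;

-- Transaction log backups every 15 minutes or so (FULL recovery model)
BACKUP LOG [BigDB]
TO DISK = N'X:\Backups\BigDB_LOG.trn'
WITH COMPRESSION, CHECKSUM;

-- Or, if the database were split up, filegroup backups on a rotating schedule
BACKUP DATABASE [BigDB]
FILEGROUP = N'FG_Archive'
TO DISK = N'X:\Backups\BigDB_FG_Archive.bak'
WITH COMPRESSION, CHECKSUM;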
I'd love to hear what everyone thinks about this, and also any experience you have with it in the real world.
It really depends on what your RPO/RTO (recovery point and recovery time objectives) is for that specific database. For example, you may have a very large ~2TB database that isn't high on the priority list at your company. It also depends on your long-term storage or backup solution: do you have dedicated tape, direct to disk, disk pool to tape, etc.?
There are general best practices, such as the ones you've touched on (keeping only system objects in the primary filegroup, etc.), but it really depends on the structure of the database. Personally, I work mostly with 3rd-party solutions and have no control over the structure of their databases or programming. There are a few things I can do, such as keeping certain functional parts together, migrating files and filegroups, and so on, but there are times when certain best-practice solutions just won't work.
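To give a rough idea of the "migrate files and filegroups" part, something like this is usually possible even on a vendor database (all names, paths, and sizes here are made up, and ONLINE rebuilds need Enterprise Edition):

-- Add a new filegroup and a data file for it
ALTER DATABASE [BigDB] ADD FILEGROUP [FG_LargeData];

ALTER DATABASE [BigDB]
ADD FILE (
    NAME = N'BigDB_LargeData1',
    FILENAME = N'Y:\Data\BigDB_LargeData1.ndf',
    SIZE = 50GB, FILEGROWTH = 10GB
) TO FILEGROUP [FG_LargeData];

-- Rebuild a large clustered index onto the new filegroup, which moves the table's data
CREATE UNIQUE CLUSTERED INDEX [PK_BigTable]
ON dbo.BigTable (BigTableID)
WITH (DROP_EXISTING = ON, ONLINE = ON)
ON [FG_LargeData];

Whether that's worth doing depends on the database; the point is you can sometimes rearrange the physical layout without touching the vendor's schema or code.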
When it comes to WHAT should be done, it's best to look at what your objective is and then use a combination of solutions. If the objective is to have 100% of the data back inside of 60 minutes and your backup alone took 90 minutes, that's not going to meet your SLA. Also take into account the longer time for the restore operation as it goes through crash recovery. In this case, what could be done to speed up the restore? Striped backups could potentially increase throughput if disk isn't the bottleneck; if you're saving to a network share (yikes), maybe look into dedicating a NIC to just backup traffic, and so on. It's just like tuning a query or a database: take a top-down look and see what the bottlenecks are, find solutions, test test test test, go through change control, and check the improvement (or lack thereof, sometimes!).
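As a rough sketch of the striped backup idea (the database name, paths, and tuning numbers are placeholders, and BUFFERCOUNT/MAXTRANSFERSIZE should be tested rather than copied):

-- Stripe the backup across several files, ideally on separate spindles/LUNs
BACKUP DATABASE [BigDB]
TO  DISK = N'X:\Backups\BigDB_1.bak',
    DISK = N'Y:\Backups\BigDB_2.bak',
    DISK = N'Z:\Backups\BigDB_3.bak',
    DISK = N'W:\Backups\BigDB_4.bak'
WITH COMPRESSION, CHECKSUM, STATS = 5,
     BUFFERCOUNT = 50,
     MAXTRANSFERSIZE = 4194304;  -- 4 MB, the maximum

-- The restore needs all of the stripes present
RESTORE DATABASE [BigDB]
FROM DISK = N'X:\Backups\BigDB_1.bak',
     DISK = N'Y:\Backups\BigDB_2.bak',
     DISK = N'Z:\Backups\BigDB_3.bak',
     DISK = N'W:\Backups\BigDB_4.bak'
WITH REPLACE, STATS = 5;

Then measure the before and after, just like any other tuning exercise.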