locked
Large Scale High Availability Testing RRS feed

  • Question

  • At a high level, we are trying to set up a testing environment that features multiple SQL servers in a High Availability group. The issue in testing is that we want to be able to ensure the integrity of the data, which can scale up to TB’s of data. I would like to be able to revert to a known state, but performing backups and restores are very expensive tasks that can take hours to accomplish. My question is if you have any recommendations for quickly cloning/restoring SQL server in a high availability group.
    Thursday, March 19, 2020 4:56 PM

All replies

  • You might want to look at Delphix. RedGate has SQL Clone as well. These allow you to revert to a previous state quickly, or have multiple versions of the database provisioned simultaneously.

    These tools do not work in an AG.

    Thursday, March 19, 2020 6:14 PM
    Answerer
  • Hello radiospens,

    For quickly backup/restore SQL Server,you can using multiple backup devices allows backups to be written to all devices in parallel and you can custom parameter of restore such as MaxTransferSize and Buffercount, more detail, you can refer to Optimizing SQL Server Backup and Restore & Optimize SQL Server Database Restore Performance

    If your DB is very large and it still can not up to your demands after doing some optimization, then i think you can consider third-party software as Hilary mentioned.

    For restore database in AG, you can restore the database on both primary and secondary replicas first, then add the database in AG which will be faster than syncing database to secondary replicas. 

    Hope it will help. Any further question, please tell us. Thanks.


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.


    Friday, March 20, 2020 3:05 AM
  • Thanks for your answer. Having an Availability Group is paramount in our case as many of our customers use them. We need quick solutions to spinning up an HA group to a known state of data.
    Friday, March 20, 2020 3:29 PM
  • Hello friend,

    Then, i think you can consider Redgate or other sql cloning third part software. Hope you can solve it quickly and effective.

    If the reply is helpful, please mark the useful reply as answer. This can be beneficial to other community members finding and reading the thread easily. 
    In addition, if you have another questions, please feel free to ask.
    Thanks for your contribution.

    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.


    Monday, March 23, 2020 7:01 AM
  • I think you misunderstood me. If RedGate or other sql cloning options have too many caveats in an environment with availability groups then I’m not interested. For as much as I looked into RedGate and WinDocks, etc., I’m concerned that the hit in performance and lack of support for availability groups will hinder our testing. If you can recommend cloning options for our scenario I’d be interested.

    That said, I looked into those links you provided above and they seemed to help. They reduced the execution time by roughly 33%, which is awesome! However, I would like it to be faster. The DB is ~450GB. I backed it up into 6 different backup files and stored each of them on their own drive. The backup took about a minute. The restore, however, took a little over an hour. I simply used a single server to backup and restore the DB. Can I further split up the work to do? I tried increasing the backup files, but that made it slightly worse. 6 backup files seems to be the sweet spot. Not sure why.
    Monday, April 6, 2020 2:50 PM
  • If you do not have the ability to both backup and restore quickly, then the requirement of high availability is not really met, with the understanding that a restore of the production environment should be a rare occurrence.

    In modern systems (of the last several years, 2014+) it is possible to configure an (on-prem) system capable of backup and restore at 10GB/sec+ rates (20GB/s if you are a fanatic).

    The RedGate tool is good to hide backup times, but there is no substitute for brute force capability in the restore. Caveat here: at 10GB/sec, we can backup 1TB in 100 sec, + some administrative ops. Usually, the backup process can utilize the full IO bandwidth capability of the storage system.

    The restore is more complicated, the file space needs to be allocated, the data transfer portion is more difficult to align perfectly, and hence may occur at a moderately lower rate than the backup.

    The trick is in the log replay. I suggest a check point prior to the backup (this might happen anyways). Avoid any index maintenance ops during the backup. If the (true full) backup is really fast, then the backup file will have little transaction log content that needs to be replayed in the restore, and the restore will also be fast.

    All of the above can be achieved in a custom-configure on-prem system without great expense - if you do not connect to a SAN. For cloud, it will be horribly expensive.

    Note: the configuration involves several PCI-NVMe SSDs, one/two or more evenly sized files on each, backups are also to multiple files for parallel IO. Backup with compression can help depending on your core-count to IO bandwidth capability


    jchang

    Monday, April 6, 2020 3:42 PM