Multiple Log Shipping Jobs

  • Question

  • I'm using log shipping to provide a DR copy of approximately 150 databases (the actual databases change every quarter).  While mind-numbing, I can go through the manual process of configuring log shipping for each.  (I know I can use the option to script this, but I haven't succeeded in getting that to work.)  What I want is to have the 150 backup jobs execute in sequence (not in parallel) every three hours; the 150 copy jobs run in sequence every three hours (starting 10 minutes after the backup sequence starts); and the 150 restore jobs run in sequence as well.

    I've tried two methods.  The first was to create one big job for each operation (backup/copy/restore) and copy the OS command from each individual job into the big job.  This was especially painful because the log shipping commands use a GUID, so it's not apparent which database you're working with.  I also found that deleting the jobs created by the log shipping configuration made SSMS upset when I went back in to disable log shipping for that database.  This looked as though it worked, but it made tracking things very difficult.

    The second method was similar (one big job for each operation), but I simply used sp_start_job to execute the job that SSMS had configured.  During the setup of the log shipping, I enabled the job but disabled the schedule so that the two schedules wouldn't conflict.  After looking at this, I realized that sp_start_job runs asynchronously, so it really isn't spreading the processing burden over time, which is my goal (see the sketch below).

    Hopefully, this long-winded question makes sense and someone can help.  I'm sure this is a fairly common issue.
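
    A minimal sketch of the sp_start_job behavior mentioned above (the job names are hypothetical): each call returns as soon as SQL Server Agent queues the job, so calling it for one job after another does not serialize the work.

        -- Hedged sketch: sp_start_job is asynchronous; it returns once Agent has
        -- queued the job, so these calls effectively start all the backup jobs at
        -- about the same time rather than one after another.
        EXEC msdb.dbo.sp_start_job @job_name = N'LSBackup_Database01';  -- hypothetical names
        EXEC msdb.dbo.sp_start_job @job_name = N'LSBackup_Database02';
        EXEC msdb.dbo.sp_start_job @job_name = N'LSBackup_Database03';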

    Friday, August 20, 2010 9:17 PM

Answers

  • SSMS does a RESTORE DATABASE WITH MOVE ..., MOVE ..., STANDBY = N'...TUF', NOUNLOAD, REPLACE.  Everything about this makes sense EXCEPT for the filename used with the STANDBY option.  I ran a trace on both the primary and secondary systems.  I can see the full backup and this restore, but I don't see where the transaction file (which includes a GUID) is generated.  Does the system generate this automagically by the fact that it is included in the STANDBY clause?
    • Marked as answer by Dan Brusco Wednesday, August 25, 2010 10:36 PM
    Wednesday, August 25, 2010 9:22 PM

All replies

  • Here is the point though: can you actually failover 150 databases in case of disaster?

    Not suggesting that you try it, but at least you should have a credible run script for such an undertaking.

    What if you upgrade to cluster and just do nightly full backups?

    Where is the standby server in relation to the production server?


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Friday, August 20, 2010 11:59 PM
  • Yes, you can failover 150 databases.  Why not?  And I would absolutely suggest trying it.  If you don't, how do you know it works?  The last thing you want is to find out when you actually need it. 

    I've attempted the same task you're attempting, Dan.  It comes down to timing the schedules, since the variables involved change so often.  This is even more of a problem when it comes to maintenance like index/statistics operations and large imports causing one backup to run long.  Overlapping backups are something to try to avoid, but they are going to happen with a database server housing that many databases and enlisting all of them in log shipping.

    Your best bet, I believe, may be to move some of the databases to other database servers, if that is a possibility and the DR plans themselves are causing a problem. 


    Ted Krueger Blog on lessthandot.com @onpnt on twitter
    Saturday, August 21, 2010 3:41 AM
  • Yes, you can failover 150 databases.  Why not?  And I would absolutely suggest trying it. 

    Too many possibilities for failure. I would prepare a well-thought-out, detailed plan before launching such an enterprise.
    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Saturday, August 21, 2010 6:18 AM
  • Yes, you can failover 150 databases.  Why not?  And I would absolutely suggest trying it. 

    Too many possibilities for failure. I would prepare a well-thought-out, detailed plan before launching such an enterprise.
    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM

     

    I guess I'm a little confused about how you came to that conclusion when the OP never said things were not planned out well or that DR was not tested.  The only question here is how to manage 150 log backup jobs that are part of Log Shipping.  This is a valid concern and part of a typical DR strategy that uses Log Shipping.  Your comment is way off topic and basically pokes at the concept, which is a sound, widely used, and successful strategy.

    For the actual question: if there are large backup waits, or IO or network issues due to overlapping backups, offloading the DBs to other servers is a sound solution.  Scheduling can be a pain otherwise, but it can be done.  Without knowing the hardware, the backup threading is hard to estimate, so it's difficult to say whether it would even be a problem if you ran as many as half of those at once.  I have one production server that houses around half that number of DBs in Log Shipping and the backups have never been an issue with several running at overlapping times.  This is typically not a problem because a thread is created for each backup device and they run in parallel.  That is also another tuning area you can look at. 

    Take a look at http://msdn.microsoft.com/en-us/library/ms190954.aspx for more on managing backups by device to optimize the parallel execution.
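
    As a general illustration of that per-device threading (a sketch only, not the command the log shipping backup job itself issues; the database name and paths are placeholders), a log backup can be striped across multiple devices and SQL Server assigns a writer thread to each device:

        -- Hedged sketch: striping one log backup across two devices.  Each device
        -- gets its own thread, so the writes proceed in parallel.  Note that a
        -- striped backup must later be restored from all of its devices.
        BACKUP LOG [YourDatabase]
            TO DISK = N'G:\LogShip\YourDatabase_1.trn',
               DISK = N'H:\LogShip\YourDatabase_2.trn'
            WITH INIT;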

    Check for large BACKUPIO waits.  Even then, the problem may be network latency.  If you are seeing large IO issues, you are better off helping the situation by tuning your disks. 
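
    One way to check those waits (a sketch; BACKUPIO, BACKUPBUFFER, and BACKUPTHREAD are the backup-related wait types in sys.dm_os_wait_stats, accumulated since the last restart):

        -- Hedged sketch: cumulative backup-related waits on the instance.
        SELECT wait_type,
               waiting_tasks_count,
               wait_time_ms
        FROM   sys.dm_os_wait_stats
        WHERE  wait_type IN (N'BACKUPIO', N'BACKUPBUFFER', N'BACKUPTHREAD')
        ORDER BY wait_time_ms DESC;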

     


    Ted Krueger Blog on lessthandot.com @onpnt on twitter
    Saturday, August 21, 2010 2:49 PM
  •  I have one production server that houses around half that number of DBs in Log Shipping and the backups have never been an issue with several running at overlapping times. 

    Very impressive, Ted! As a DBA I prefer a single database, but that is not always possible.

    Have you done a disaster failover test? How long did it take for all the DBs and apps to come online?  Thanks.


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Saturday, August 21, 2010 5:11 PM
  • Thanks, Kalman.  Yes, I test HA once a month and DR bi-monthly, given the availability of downtime on some systems.  All databases fail over fine.  Some DBs are on sync mirroring because they require next-to-real-time failover.  The rest are log shipped, and I use SSIS to initiate my DR site, so things like Agent startup, bringing DBs out of standby, security (orphaned-user checks), some nasty ODBC registry checks, etc. are all accomplished in one job execution. 

    Everything is replicated nightly to the DR site with SSIS: jobs, security, DBA-level changes, and any file system needs.  I also have a service I wrote that grabs nightly differentials and weekly fulls and copies them to the DR site for a backup recovery plan.  This way, even if I lose everything as far as the LS standby databases go, I can restore to the last 5, 15, or whatever interval the business decision allowed for data loss on particular databases.

    We also test in stages: take down servers in stages, and take down entire facilities.  Once everything is online on the other side, the Log Shipping roles reverse, so failing back is simplified.  The large databases take time to push back for restores, but in the event of a true disaster that time is acceptable.  In tests, we have secondary servers that act as testing areas. 

    No DR solution is stable unless you pull the plug on the switch in your primary sites and make sure it works :-)

    I always (always!) tell everyone to test both HA and DR.  It's useless if you do not, because you are just guessing that it works.  I follow these guidelines as far as what the disaster requires. 

    http://blogs.lessthandot.com/index.php/DataMgmt/DBAdmin/beginning-stages-of-a-dr-plan-for-sql-se

     


    Ted Krueger Blog on lessthandot.com @onpnt on twitter
    • Proposed as answer by Kalman Toth Saturday, August 21, 2010 6:47 PM
    Saturday, August 21, 2010 6:26 PM
  • Meant async (high performance) on that mirroring one, not safety on.  Performance would be horrid :-)

    That entire SSIS initiation varies depending on the packages I tell it to run for which DBs to bring online, but I can go full DR in around 15-20 minutes.


    Ted Krueger Blog on lessthandot.com @onpnt on twitter

    Saturday, August 21, 2010 6:27 PM
  • Each database is relatively small (5-10 MB).  I have the two servers in separate co-location facilities about 50 miles apart.  It would have been better to locate them in different states, but this is what ended up happening.

    My concern is having 150 backups hit at the same time, followed by 150 copies, then 150 restores.  These are important databases, but not business critical.  I want to get to a 3-hour cycle.  I manually set this up last year using replication and scheduled each step 3 minutes apart.  It worked, but was a royal PIA to get set up.  Setting up the log shipping currently isn't any easier, but I'm hoping to eventually write a program to take care of it all for me once I figure out the steps that need to be handled.

    Yes -- I'm well aware that any strategy like this isn't valid until a plug has been pulled.  We're not there yet (should have been there a year ago), but we're working on it.  Having the data in two separate locations is a start I'm trying to get us standardized on.

    BTW, I would not mind being told that I'm overthinking this.  I have been guilty of that in the past.  I suspect that simply setting everything to run at once (backup at 12:05 every 3 hours, copy at 12:20 every 3 hours, restore at 12:45 every 3 hours) would not be the best plan.

    Monday, August 23, 2010 3:33 PM
  •  I suspect that simply setting everything to run at once (backup at 12:05 every 3 hours, copy at 12:20 every 3 hours, restore at 12:45 every 3 hours) would not be the best plan.


    That could be fatal to the well-being of the server. Imagine the disk read/write heads going berserk... Spread them out 1 minute apart. Let us know if it works.
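
    A sketch of one way to do that spacing (assumptions: the backup jobs follow the default LSBackup_ naming, each job has a single attached schedule, and sp_update_schedule leaves unspecified schedule properties unchanged):

        -- Hedged sketch: stagger each LSBackup_* job's schedule start time by one
        -- minute.  active_start_time is an integer encoded as HHMMSS.
        DECLARE @schedule_id INT;
        DECLARE @minute      INT;
        DECLARE @start_time  INT;
        SET @minute = 5;                              -- first schedule starts at 00:05:00

        DECLARE sched_cursor CURSOR FOR
            SELECT js.schedule_id
            FROM   msdb.dbo.sysjobs AS j
            JOIN   msdb.dbo.sysjobschedules AS js ON js.job_id = j.job_id
            WHERE  j.name LIKE N'LSBackup[_]%'        -- assumed default job naming
            ORDER BY j.name;

        OPEN sched_cursor;
        FETCH NEXT FROM sched_cursor INTO @schedule_id;
        WHILE @@FETCH_STATUS = 0
        BEGIN
            SET @start_time = ((@minute / 60) * 10000) + ((@minute % 60) * 100);  -- HHMMSS
            EXEC msdb.dbo.sp_update_schedule
                 @schedule_id       = @schedule_id,
                 @active_start_time = @start_time;
            SET @minute = @minute + 1;                -- next job starts one minute later
            FETCH NEXT FROM sched_cursor INTO @schedule_id;
        END
        CLOSE sched_cursor;
        DEALLOCATE sched_cursor;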

     


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Monday, August 23, 2010 3:58 PM
  • I have a "brute force" method in place for a portion of the databases and it seems to be working.  I manually enabled log shipping for each but disabled the schedule for each of the jobs.  I then updated each job to add a second step (the first being created by the transaction log setup) which used sp_start_Job to kick off the next job in the sequence. This appears to accomplish my goal of spreading the workload over time.

    The next step is to automate everything.  How can I get the job_id value for each job in the SQL Server Agent queue?  It looks like I can dynamically generate the SQL that will add the needed steps and modify the existing steps (the success/failure paths have to be changed); see the sketch at the end of this post. 

    There has to be a flag somewhere I can test to determine whether log shipping is enabled for a database.  I'd like to use this to identify any databases which should be log shipped but are not currently; enable log shipping through T-SQL; backup/restore the database to the secondary server; and make these changes.  Any chance anyone has scripted this already?
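
    A hedged sketch of those pieces (the job names are illustrative, and it assumes the default LSBackup_ naming): the backup job's job_id is recorded in msdb.dbo.log_shipping_primary_databases, sp_add_jobstep can append the sp_start_job step, and sp_update_jobstep can change the original step's success path so the new step actually runs.

        -- Hedged sketch.
        -- 1. Map each log shipped primary database to its Agent backup job.
        SELECT lspd.primary_database,
               lspd.backup_job_id,
               j.name AS backup_job_name
        FROM   msdb.dbo.log_shipping_primary_databases AS lspd
        JOIN   msdb.dbo.sysjobs AS j ON j.job_id = lspd.backup_job_id
        ORDER BY lspd.primary_database;

        -- 2. Append a step to one backup job that starts the next job in the chain
        --    (job names are hypothetical).
        EXEC msdb.dbo.sp_add_jobstep
             @job_name          = N'LSBackup_Database01',
             @step_name         = N'Start next log shipping backup',
             @subsystem         = N'TSQL',
             @command           = N'EXEC msdb.dbo.sp_start_job @job_name = N''LSBackup_Database02'';',
             @on_success_action = 1,   -- quit reporting success
             @on_fail_action    = 2;   -- quit reporting failure

        -- 3. Change the original backup step so that, on success, it continues to
        --    the new step instead of quitting.
        EXEC msdb.dbo.sp_update_jobstep
             @job_name          = N'LSBackup_Database01',
             @step_id           = 1,
             @on_success_action = 3;   -- go to the next step

    The last job in the chain would simply skip step 2, so the sequence ends there.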

    Monday, August 23, 2010 7:39 PM
  • The next step is to automate everything.  How can I get the job_id value for each job in the SQL Server Agent queue?  It looks like I can dynamically generate the SQL that will add the needed steps and modify the existing steps (the success/failure paths have to be changed). 


    Check out the following page on programmatic job control. Let us know if it is helpful.

    http://www.sqlusa.com/bestpractices2005/startjobfromstoredproc/


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Monday, August 23, 2010 8:30 PM
  • The page referenced shows me how to manipulate a job if I already know the unique job_id.  What I would like to get to is a script (or program) that scans the list of databases matching a certain naming pattern and uses that information to test whether each is a primary in a log shipping relationship.  If not, the script would then configure log shipping, including the initial backup/restore, and insert the backup/copy/restore jobs into the appropriate chain of jobs.
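
    A sketch of the detection half (the naming pattern is a placeholder): list databases matching the pattern that are not yet registered in msdb.dbo.log_shipping_primary_databases.

        -- Hedged sketch: candidate databases that are not yet log shipping primaries.
        SELECT d.name
        FROM   sys.databases AS d
        LEFT JOIN msdb.dbo.log_shipping_primary_databases AS lspd
               ON lspd.primary_database = d.name
        WHERE  d.name LIKE N'App[_]%'                -- hypothetical naming pattern
          AND  d.recovery_model_desc <> N'SIMPLE'    -- simple recovery can't be log shipped
          AND  lspd.primary_database IS NULL
        ORDER BY d.name;

    Configuring log shipping itself can then be scripted with the sp_add_log_shipping_primary_database / sp_add_log_shipping_secondary_database family of procedures, which is essentially what the SSMS wizard's scripting option generates.
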
    Tuesday, August 24, 2010 12:28 PM
  • Are you looking for basic job information? That is msdb.dbo.sysjobs.

     


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Tuesday, August 24, 2010 1:35 PM
  • I'm almost there.  I have a program set up which configures everything correctly, with the exception of the restored database on the secondary server.  If I use RESTORE DATABASE without specifying NORECOVERY, the database is restored fine, but the log shipping restore job fails.  If I use RESTORE DATABASE WITH NORECOVERY, nothing seems to be set up except a database placeholder (no files).  What I have been doing when configuring this by hand is choosing Standby mode.  If anyone knows the RESTORE parameters that SSMS uses behind the covers, I'll have this portion of my battle solved.
    Wednesday, August 25, 2010 4:40 PM
  •   If anyone knows the RESTORE parameters that SSMS uses behind the covers, I'll have this portion of my battle solved.

    You can "spy" on SSMS with SQL Server Profiler. Let us know what you find.
    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Wednesday, August 25, 2010 4:58 PM
  • SSMS does a RESTORE DATABASE WITH MOVE ..., MOVE ..., STANDBY = N'...TUF', NOUNLOAD, REPLACE.  Everything about this makes sense EXCEPT for the filename used with the STANDBY option.  I ran a trace on both the primary and secondary systems.  I can see the full backup and this restore, but I don't see where the transaction file (which includes a GUID) is generated.  Does the system generate this automagically by the fact that it is included in the STANDBY clause?
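
    Written out as T-SQL, the statement described above looks roughly like the following (database name, logical file names, and paths are placeholders; SSMS generates the GUID-based name for the standby file). The standby/undo file named in the STANDBY clause is created by the restore itself, which is why no separate step for it shows up in the trace.

        -- Hedged sketch of the restore SSMS issues for the secondary copy;
        -- all names and paths here are placeholders.
        RESTORE DATABASE [YourDatabase]
            FROM DISK = N'\\secondary\LogShipping\YourDatabase.bak'
            WITH MOVE N'YourDatabase_Data' TO N'D:\Data\YourDatabase.mdf',
                 MOVE N'YourDatabase_Log'  TO N'L:\Log\YourDatabase_log.ldf',
                 -- the undo (.tuf) file below is created by the restore engine itself
                 STANDBY = N'D:\Data\YourDatabase_GUID.tuf',
                 NOUNLOAD,
                 REPLACE;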
    • Marked as answer by Dan Brusco Wednesday, August 25, 2010 10:36 PM
    Wednesday, August 25, 2010 9:22 PM