vendredi 17 août 2012 10:08
I am experiencing a problem at present whereby database restore is failing, quite often only a few minutes into the restore. No failure reason is given, and the query that calls the restore just completes.In the object explorer, the database shows as "restoring" but sys.dm_exec_requests shows no active restore.
the topology of the process is as follows;-
The backup file resides on Server A, a virtual server which runs Attachmate.
Server B triggers the database restore via a SSIS package, which runs an SQL query to start the restore. The extract is being restored to linked Server C.
The database back up is approxmately 15Gb and this restores to an unpacked 70GB on Server C.
I have noted that there seems to be a data transfer limitation on Server A of about 3Mb/Sec, and I have asked our tech team to look into this.
Can anyone suggest what might be happening and how can trace why the extract restore fails but gives no error? My conjecture at the moment is that the data connection is dropping between Server A and Server C, and the restore thinks it is finished, but cannot complete as key information is missing.
Any help gratefully appreciated.
Toutes les réponses
vendredi 17 août 2012 11:15How do you know it fails if there is no error. does you package hang up? Are you running this package as a job? if yes then you can look up the job history and see if there is something there. A little more description about the behavior the package when it misbehaves would be appreciated.
Please mark the post as answered if it answers your question
vendredi 17 août 2012 17:35
As you mention that package completes without any issue, I am suspacting that you are trying to restore it with norecovery option. Try to run below query for fix.
If this not work, Check SQL server error log for any error message relate to databse you are trying to restore.
Please click the Mark as Answer or Vote As Helpful if a post solves your problem or is helpful!
vendredi 17 août 2012 18:10Modérateur
There could be an error on the other server. Inspect the logs, you must find an error there, I guess in the package you gave no logging enabled.
And also this: would the command succeed outside the package?
Arthur My Blog
lundi 20 août 2012 08:43
Well, technically the package hasn't failed. It just never completes. The database is left in restoring mode. This means that the query that then changes permissions back to multi and sets up user security cannot execute.
However if I check the exec requests, there is no longer a restore going on.
If i look at the server logs on the server where the restore is taking place, the only entry is "setting database option SINGLE_USER to ON for Database", and there are no further entries for that database.
On the server runing the package there are no entries relating to the package.
I have enabled logging in the package, but only error logging, so perhaps the next step is full logging of the restore part of the package.
The other theroy that we have been working on, is that there is a secondary leviathan sized package (runs daily for 14 hours) which occasionally overlaps into the period that the restore is taking place. Can a package trying to query a database which is in restore mode cause that restore to hang?
lundi 20 août 2012 14:21ModérateurMy thinking the SINGLE USER mode does not succeed. Running trans perhaps is the reason. The package times out then, so yes, could be that other package, but I still would enable the full logging and try running at a different (calm) time.
Arthur My Blog
mardi 28 août 2012 12:19
So far I have adjusted the SQL to run the restore withrecovery, and also moved the SSMS package around so that the restores are now sequential rather than parallel. So far there has not been another hang.
I think the SINGLE USER mode does succeed as the restore will often drop 10 or 20 minutes in, even when the SSMS is being run manually.
Anyone have any opinion as to whether network speed could cause a premature restore end, without triggering an error?
mardi 28 août 2012 15:19Modérateur
Whilst the network speed may not be the direct cause for failures, it's stability/reliability is now a question.
I suspect the prolonged latency caused timeouts in either layer yet come by undetected, I thus suggest to deviate from implementing large restored over such networks.
Arthur My Blog
- Marqué comme réponse idcowden mercredi 29 août 2012 14:18