none
Windows Server Backup fails when SQL VSS Writer service is running RRS feed

  • Question

  • Hello. On one of our SQL servers, we have SQL Server 2008 Express installed. Whenever is the "SQL Server VSS Driver" service running, backing up the computer with Windows Server Backup will always fail. If I stop SQL VSS writer service, then it the backup will finish successfully, although apparently backup of the SQL DBs is not consistent.

    Details:

    - backup is executed with "wbadmin start backup -allcritical -backuptarget:d: -vssfull -quiet"

    - root of the SQL Server instance (and all DBs, trans. logs etc) are located on a volume E:

    - once the backup command is executed, it's stuck on the creating volume shadow copies operation for several minutes, failing afterwards

    - SQL VSS Writer has status 8 (failed), last error is "non-retryable"

    - eventlog contains myriad of messages since the backup is started (SQL server has hundreds of DBs on it), with variations of these 3 repeating:

      a) SQLVDI: Loc=TriggerAbort. Desc=invoked. ErrorCode=(0). Process=1304. Thread=15788. Server. Instance=SQLEXPRESS. VD=Global\{A4C1E365-459A-4D41-B3D0-1A76DB8E56F6}517_SQLVDIMemoryName_0. Source: SQLVDI, EventID: 1. This is repeated numerous times, where thread id varies and number (517 in this case) in the VD identifier

      b) BACKUP failed to complete the command BACKUP DATABASE <dbname>. Check the backup application log for detailed messages. (I removed the actual DB name, substed with <dbname>). This is repeated numerous times (I guess for every DB on the server)

      c) event with missing description, here's the error content:

    ICommandText::Execute

    0x80040e14

    SQLSTATE: 42000, Native Error: 3013

    Error state: 1, Severity: 16

    Source: Microsoft SQL Server Native Client 10.0

    Error message: BACKUP DATABASE is terminating abnormally.

    SQLSTATE: 42000, Native Error: 3224

    Error state: 1, Severity: 16

    Source: Microsoft SQL Server Native Client 10.0

    Error message: Cannot create worker thread.

    It's event with source SQLWRITER, eventid: 24583. This event is also repeated numerous times.

     

    Note: the system already has default 100MB System reserved volume (active, system, primary) and the system volume (C:) (boot, page file, system state, primary), so hints as to mark the C: drive volume active are irrelevant.

    System details: virtual machine - Windows 2008 R2 Standard, SQL Server 2008 Express with Advanced services SP1, updated

    Already tried updating SQL to CU6 and also CU7. Also tried upgrading to SQL 2008 R2 Express. None of this helped.

    Thanks in advance for any help.

    Monday, July 19, 2010 2:34 PM

All replies

  • Please check the answers in this post, since it seems to be identical to your problem.

    http://social.technet.microsoft.com/Forums/en-US/winserverfiles/thread/9cf42e8a-2a33-47c5-a797-269330e9ba1a

    There is also quite a bit of follow-up on issues that others found.

    RLF

    Monday, July 19, 2010 8:57 PM
  • Thanks Russell, but I've already gone through that (and other threads) before posting this.

    I'm actually trying to solve this problem for several months, so it's not like I've encountered the problem yesterday and went to write about it on technet forums.

    I hope I've included detailed description of the problem, let me know if more details are required.

    Monday, July 19, 2010 10:55 PM
  • MarkosP,  A week has passed, but I was reading the  Microsoft SQL Server Tips & Tricks blog and ran across this:

    http://blogs.msdn.com/b/sqlserverfaq/archive/2010/05/28/backup-software-fails-to-take-system-state-backup-if-sql-server-vss-writer-service-is-running.aspx

    If you have not already read this, then check it out and see if it helps resolve your problem.

    RLF

    Tuesday, July 27, 2010 5:18 PM
  • Thanks again Russell. Interesting article, unfortunately it didn't help me. In the registry I have two valid instances registered - one is the main SQLEXPRESS instance, the other is the Windows internal DB used by the WSRM service (MICROSOFT##SSEE). I tried removing the entry for the MICROSOFT##SSEE instance, but that didn't help and backup failed still.

    I noticed that first there are ton events of a) and b) from my original post, then there's spam of c) messages, then it's all mixed up, few hundred messages later one more (different) error pops up numerous times from SQLWRITER - the error message is "BackupVirtualDeviceSet::AllocateBuffer: failure on backup device '{68B3B056-7C06-4E00-941E-2DC0DEB8382A}361'. Operating system error 995", but I suspect that's only the result of failing to create the worker thread in the first place.

    Just had an idea if this couldn't be cause by the fact that the instance in installed in non-default location, its root is E:\MSSQL10.SQLEXPRESS\MSSQL. Maybe the SQL writer has some hard-coded locations and doesn't like this? But I would find that hard to believe, it's just I'm out of other ideas.

    Just so you have an idea - while the backup tries to run, it produces almost 2800 errors events in the event log. I guess this is because of the large number of DBs that are hosted on the SQL server (almost 600 now).

    Wednesday, July 28, 2010 3:30 PM
  • In the meantime, are you stopping VSS for the duration of the backups, then restarting it?  (I hope so, so that you are not without backups.)  That was the 'workaround' several people used.

    Have you considered raising a support incident with Microsoft?  Not necessarily free, unless they fine a bug that needs fixing.  If you have MSDN you may have a couple of support incidents available.

    RLF

    Wednesday, July 28, 2010 5:30 PM
  • For the meantime, we stopped backing up SQL using WSB, instead we deployed Express Maintenance utility from Jasper Smith, which works decently.

    PSS with MS would be nice, but it's too expensive here and there's no budget for it, so I'm on my own.

    Monday, August 2, 2010 8:29 AM
  • Any news about failing WSB when SQL VSS Writer service is running???

    In my case the backup fails on W2k8 R2 Hyper-V host when creating a bare metal VSS full backup including all volumes, specifically on running virtual Forefront TMG 2010 (SQL Server 2008 Express): Failed to create the backup of virtual machine 'TMG'.
    What I really don't understand is that a VSS full backup of ONLY the VHD's of running TMG virtual guest succeeds?!?!?!

    BTW thanks Russell for your little suggestion in question, which I didn't catch somehow in any of the posts I've been through. Stopping SQL VSS Writer service on above mentioned virtual TMG has helped me to complete backup successfully on the host. I just can't imagine all of the impacts of stopping this service??

    Tuesday, November 23, 2010 11:27 PM
  • I am really not very VSS savvy, but I figure that a full database backup that actually works is worth quite a few shadows that I cannot use.  I personally do not use software that makes use of VSS for backing up my SQL Server databases. 

    However, I did spend a little more time cruising the Microsoft blogs after this post popped back up. Another blog post that I did not find earlier is:

    http://blogs.technet.com/b/asksbs/archive/2008/04/23/troubleshooting-vss-and-backup.aspx

    Also, during any other research that you do, please beware the following suggestion since it can lead to an unbootable computer:  "If the Active volume is different from the volume in which you install the operating system, try changing the Active volume to the volume containing the operating system."

    RLF

    Wednesday, November 24, 2010 2:44 PM
  • Two things that I noticed in the posts above:

    1. Number of databases - high (600 and probably more or less now)

    2. Initial error - "Cannot create worker thread"

    The above error means that there are not enough available worker threads available to execute the commands that VSS wants to obtain the list of databases on the SQL instance. This can happen because of two reasons:

    a. Because the large number of databases, you are running out of available worker threads. You can try and increase the number of available worker threads by using sp_configure by 128 and check if that allows the backup to be completed. This will affect your BPool memory area because the thread stack memory for the additional 128 threads will be borrowed from your BPool.

    b. You are running short of non-BPool memory and due to this a new worker creation is failing. Do you have any memory related errors in the SQL Errorlogs when the backup is being taken?


    This posting is provided "AS IS" with no warranties, and confers no rights.
    My Blog: http://troubleshootingsql.com
    Twitter: @banerjeeamit
    SQL Server FAQ Blog on MSDN: http://blogs.msdn.com/sqlserverfaq
    Thursday, November 25, 2010 8:42 PM
  • Thanks for informative post Amit. I'll try increasing number of worker threads by 128 and see what will happen.

    Don't have any memory related errors as far as I can tell.

    Will have to do a little research on what BPool memory is :)

    Friday, November 26, 2010 8:17 AM
  • Hm, increasing maxworkerthreads didn't have any positive effect. I've increased the value from the default 0 to 512 and later to 1024, neither helped. Didn't go further as it was already beyond what's recommended for servers with many more CPU than out server has (it's VM with 4 vCPUs).
    Friday, November 26, 2010 11:26 PM
  • Do all your SQL database files reside on a single drive? If yes, then you will not be able to perform a VSS Snapshot backup because of the number of worker threads required to perform the backups and the additional memory requirements for the backup buffers.

    Reference: http://blogs.msdn.com/b/psssql/archive/2009/11/13/how-it-works-how-many-databases-can-be-backed-up-simultaneously.aspx

    Reference for SQL Buffer Pool: http://troubleshootingsql.com/2009/12/30/sql-server-memory-basics/

    HTH


    This posting is provided "AS IS" with no warranties, and confers no rights.
    My Blog: http://troubleshootingsql.com
    Twitter: @banerjeeamit
    SQL Server FAQ Blog on MSDN: http://blogs.msdn.com/sqlserverfaq
    Sunday, November 28, 2010 1:16 PM
  • Thanks for the information Amit, although it's not too encouraging.

    If I understand the figure, I would need to set the maxworkerthreads to roughly databases/2 for the backup to work, which might be impractical with hundreds of databases.

    And yes, all those DBs are on a single volume.

    Tuesday, November 30, 2010 10:27 PM
  • Can you create additional instances and distribute the DB-s?

    Kalman Toth SQL SERVER 2012 & BI TRAINING
    New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012


    • Edited by Kalman Toth Wednesday, September 26, 2012 11:22 PM
    Wednesday, December 1, 2010 12:48 AM
  • No, not really. I realize that SQL Express is not really suitable for that many databases, but apparently the issue wouldn't go away even with regular versions of SQL server. Not sure if I've mentioned it above, but there are currently about 900 databases (each DB is very small - about 30MB or less) and the count can go much higher still. It's used by and accounting program and there's no way to limit the number of databases created.

    The issue is that WSB (and MS DPM 2010 too) fails to do a Bare-metal backup because of this issue, since it considers the volume with SQL instance as one of its "critical" volumes, which are backed up during the BMR backup.

    So far, the only workaround is to disable the SQL VSS Writer service and let the BRM backup proceed and be aware that the SQL databases cannot be recovered from this backup (at least the consistency cannot be assured) so we have to use that separate scripted backup of the SQL databases, which is somewhat inconvenient. Also when SQL VSS writer is disabled, we can't backup the SQL server from MS DPM 2010 and have to resort to custom scripts.

    Wednesday, December 1, 2010 8:06 AM
  • The only option would be to have the database files residing on multiple volumes. Increasing worker threads is not the solution here.


    If you take a backup with SQL Server VSS Writer being turned off, you lose the option of restoring the SQL database from that backup. IMO for Bare Metal
    backups, you dont need the SQL databases as Bare Metal is only meant for Core Sytem recovery and not for application recovery.


    Or you could switch to native SQL backups for the SQL databases using a maintenance plan or a scheduled SQL Agent job.


    This posting is provided "AS IS" with no warranties, and confers no rights.
    My Blog: http://troubleshootingsql.com
    Twitter: @banerjeeamit
    SQL Server FAQ Blog on MSDN: http://blogs.msdn.com/sqlserverfaq
    Wednesday, December 1, 2010 7:32 PM
  • I just blogged about this along with the possible workarounds:

    http://troubleshootingsql.com/2010/12/06/volume-shadow-barfs-with-0x80040e14-code/


    This posting is provided "AS IS" with no warranties, and confers no rights.
    My Blog: http://troubleshootingsql.com
    Twitter: @banerjeeamit
    SQL Server FAQ Blog on MSDN: http://blogs.msdn.com/sqlserverfaq
    Monday, December 6, 2010 11:14 AM
  • Thank you Amit, nice analysis and summary of options.

    None of them ideal for us, but we'll have to pick one apparently.

     

    One more question - would this also hinder backups of Hyper-V guest machines? Guest being the SQL server. IIRC when doing backup of a Hyper-V machine, it internally invokes VSS snapshot so that you can later do ILR (item-level recovery). So I guess the answer is these backups won't work either, just need a confirmation.

    Monday, December 6, 2010 11:55 AM
  • I have not tried this on a Hyper-V guest machine, so wouldn't be able to comment on this at this point.
    This posting is provided "AS IS" with no warranties, and confers no rights.
    My Blog: http://troubleshootingsql.com
    Twitter: @banerjeeamit
    SQL Server FAQ Blog on MSDN: http://blogs.msdn.com/sqlserverfaq
    Wednesday, December 8, 2010 11:30 AM
  • It definitely affects Hyper-V host backup of guests. I was forced to set max worker threads to 3000, so 1024 isn't that much, MarkosP. To be sure and get the exact number, after a failed snapshot use
    SELECT status,COUNT(*) AS counts FROM sys.dm_os_schedulers GROUP BY status
    as Amit suggests in his blog
    Saturday, January 8, 2011 6:14 AM
  • Thanks for help! Problem solved!

    In my environment (over 700 databases) setting MAXIMUM WORKER THREADS to 2048 resolved the issue! This have to be at least twice the number of databases.

    This setting can be changed in Management Studio on Processors tab in the instance Properties window. It even does not require a restart.

    Thank you!



    Sunday, September 1, 2019 7:24 PM