FSLogix disk re-attach causes session crash

  • Question

  • Hello. Like others on this forum, we have hit a black hole with Microsoft support for FSLogix. We had a case open and, until about two weeks ago, were actively working with an engineer, but all of a sudden we stopped getting responses to our emails. We tried calling to reach the engineer, to no avail. We thought we would ask about our issue on this forum in case someone else has experienced something similar and has any suggestions.

    We are using Citrix Virtual Desktops v 1808.2.0, with FSLogix release 1909 - Build 2.9.7205.27375.  We were using an earlier version of FSLogix and upgraded to the latest to try to resolve the issue below.

    So here’s a description of what we are seeing:

    • Periodically, a user will reconnect to their session and just see a blue background. To be clear, the system has not completely crashed - the user cannot interact with the screen, but we can still gather event logs, connect to the C drive remotely to gather FSLogix logs, etc.
    • The issue is intermittent, and not reproducible on demand. But it’s happened to multiple users.
    • The only solution once this issue occurs is to kill the user’s VM (and all of their open applications) and start a new session. 
    • From our experience, the issue occurs when a session is disconnected, and we only become aware of it when the user reconnects and sees the blue background. 

    We started digging into the logs, and in the Windows System event logs, the first indication of an issue is a Disk warning (see below).  At the exact same time, we see an entry in the FSLogix profile logfile that indicates it is trying to re-attach a volume (see below).  Eventually the volume re-attaches, but by that time, the damage is done. After the volume re-attaches, we continue to see NTFS errors in the System log: Event ID 50 (delayed write failed to the OST file) and other 'Delayed Write Failed' errors.

    Again, all of this is occurring when the user's VDI session is disconnected, so there is no user activity that is causing this behavior.

    Does anyone have any idea why the FSLogix volume would detach and then try to re-attach when the session is disconnected? We've dug through the log files, but the Profile log is the only one that indicates activity that corresponds with the start of this issue. Appreciate any suggestions or feedback!

    Thank you.

    Event log disk warning - note that "Disk 3" is the FSLogix profile disk:

    Log Name:      System
    Source:        Disk
    Date:          11/11/2019 3:29:21 PM
    Event ID:      153
    Task Category: None
    Level:         Warning
    Keywords:      Classic
    User:          N/A
    Computer:      WebExtTest02.<nameremoved>.com
    Description:
    The IO operation at logical block address 0x601168 for Disk 3 (PDO name: \Device\00000057) was retried.
    Event Xml:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
        <Provider Name="Disk" />
        <EventID Qualifiers="32772">153</EventID>
        <Level>3</Level>
        <Task>0</Task>
        <Keywords>0x80000000000000</Keywords>
        <TimeCreated SystemTime="2019-11-11T23:29:21.501325800Z" />
        <EventRecordID>18822</EventRecordID>
        <Channel>System</Channel>
        <Computer>WebExtTest02.<nameremoved>.com</Computer>
        <Security />
      </System>
      <EventData>
        <Data>\Device\Harddisk3\DR4</Data>
        <Data>0x601168</Data>
        <Data>3</Data>
        <Data>\Device\00000057</Data>
        <Binary>0F01040004002C0000000000990004800000000000000000000000000000000000000000000000000002048A</Binary>
      </EventData>
    </Event>
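To line this event up with the FSLogix log below, the UTC `SystemTime` has to be shifted to the clock the log uses. A minimal Python sketch follows. It assumes the Profile log is written in local time and that the host was at UTC-8 (inferred from the event's `Date` of 3:29:21 PM against a `SystemTime` of 23:29:21Z). The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_event_time(system_time):
    """Parse an event-log SystemTime such as 2019-11-11T23:29:21.501325800Z.

    The event log carries more fractional digits than strptime accepts,
    so the fraction is padded or truncated to microseconds first.
    """
    stamp = system_time.rstrip("Z")
    date_part, _, frac = stamp.partition(".")
    frac = (frac + "000000")[:6]
    parsed = datetime.strptime(f"{date_part}.{frac}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def to_log_clock(event_utc, utc_offset_hours):
    """Shift a UTC event time to the (assumed) local clock of the FSLogix log."""
    return event_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))


# Assumption: the host was at UTC-8 when the event was recorded.
event_local = to_log_clock(parse_event_time("2019-11-11T23:29:21.501325800Z"), -8)
print(event_local.strftime("%H:%M:%S.%f")[:-3])  # 15:29:21.501
```

Under those assumptions the disk warning lands at 15:29:21.501, roughly 0.3 seconds before the first re-attach line (15:29:21.824) in the log snippet.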

    Snippet from FSLogix\Logs\Profile log file:

    [15:29:21.824][tid:00001168.0000116c][INFO]           Configuration setting not found: SOFTWARE\FSLogix\Profiles\ReAttachRetryCount.  Using default: 60
    [15:29:21.824][tid:00001168.0000116c][INFO]           Configuration setting not found: SOFTWARE\FSLogix\Profiles\ReAttachIntervalSeconds.  Using default: 10
    [15:29:21.824][tid:00001168.0000116c][INFO]           ===== Begin Session: Volume re-attach
    [15:29:21.824][tid:00001168.0000116c][INFO]            Attempting re-attach of volume: \\?\Volume{33bd6717-6e27-4d82-9e03-a33dc9cd6fd6}\ for SID: S-1-5-21-206643244-1649281867-1233284464-6245
    [15:29:21.824][tid:00001168.0000116c][INFO]            Acquiring mutex for reattach
    [15:29:21.825][tid:00001168.0000116c][INFO]            Configuration setting not found: SOFTWARE\FSLogix\Profiles\LogonSyncMutexTimeout.  Using default: 60000
    [15:29:21.825][tid:00001168.0000116c][INFO]            Mutex acquired
    [15:29:21.825][tid:00001168.0000116c][INFO]            VHDPath: \\server-fs\FSLogixUserProfiles\S-1-5-21-206643244-1649281867-1233284464-6245_<name removed>\Profile_<name removed>.VHDX
    [15:29:21.833][tid:00001168.0000116c][INFO]            Username: <name removed>
    [15:29:21.833][tid:00001168.0000116c][INFO]            Attempting re-attach as the user
    [15:29:21.833][tid:00001168.0000116c][INFO]            Retry Count: 60  Retry Interval (seconds): 10
    [15:29:21.875][tid:00001168.0000116c][INFO]            Unsuccessful re-attach attempt.  Retry in 10 seconds.
    [15:29:31.876][tid:00001168.0000116c][INFO]            Retrying re-attach (1 of 60)
    [15:29:31.883][tid:00001168.0000116c][INFO]            Unsuccessful re-attach attempt.  Retry in 10 seconds.
    [15:29:41.884][tid:00001168.0000116c][INFO]            Retrying re-attach (2 of 60)
    [15:29:41.891][tid:00001168.0000116c][INFO]            Unsuccessful re-attach attempt.  Retry in 10 seconds.
    [15:29:51.892][tid:00001168.0000116c][INFO]            Retrying re-attach (3 of 60)
    [15:29:51.901][tid:00001168.0000116c][INFO]            Unsuccessful re-attach attempt.  Retry in 10 seconds.
    [15:30:01.902][tid:00001168.0000116c][INFO]            Retrying re-attach (4 of 60)
    [15:30:01.949][tid:00001168.0000116c][INFO]            Successfully opened VHD file
    [15:30:02.188][tid:00001168.0000116c][INFO]            Volume successfully re-attached
    [15:30:02.189][tid:00001168.0000116c][INFO]           ===== End Session: Volume re-attach
    [15:30:02.190][tid:00001168.0000116c][INFO]           Volume attach event
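For anyone collecting these logs across many machines, here is a rough Python sketch that summarizes each "Volume re-attach" session: how many attempts failed and how long the volume was unavailable. It relies only on the line layout and message text visible in the snippet above; the function name and the returned dictionary keys are made up for illustration, and sessions that cross midnight are not handled.

```python
import re
from datetime import datetime

# Matches lines such as:
# [15:29:21.824][tid:00001168.0000116c][INFO]   ===== Begin Session: Volume re-attach
LINE_RE = re.compile(
    r"^\s*\[(\d{2}:\d{2}:\d{2}\.\d{3})\]\[tid:[0-9a-fA-F.]+\]\[(\w+)\]\s+(.*)$"
)


def summarize_reattach(lines):
    """Return one summary dict per completed re-attach session in the log lines."""
    sessions = []
    current = None
    for raw in lines:
        match = LINE_RE.match(raw)
        if not match:
            continue
        stamp, _level, msg = match.groups()
        when = datetime.strptime(stamp, "%H:%M:%S.%f")
        if msg.startswith("===== Begin Session: Volume re-attach"):
            current = {"start": stamp, "failed_attempts": 0, "reattached": False}
            started = when
        elif current is None:
            continue  # ignore lines outside a re-attach session
        elif msg.startswith("Unsuccessful re-attach attempt"):
            current["failed_attempts"] += 1
        elif msg.startswith("Volume successfully re-attached"):
            current["reattached"] = True
        elif msg.startswith("===== End Session: Volume re-attach"):
            current["seconds"] = (when - started).total_seconds()
            sessions.append(current)
            current = None
    return sessions
```

Run against the snippet above (for example `summarize_reattach(open(path))`, where `path` points at a copy of the Profile log), this should report one session with four failed attempts and about 40 seconds between the begin and end markers.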

    Tuesday, November 12, 2019 6:10 PM

All replies

  • FSLogix will not intentionally detach a volume while a user has a session on a host, whether the session is disconnected or not.  There is not a lot of context in this log snippet, but from what I can gather, on the surface this looks like some type of environmental disconnect: maybe a network disconnect, or a restart or failover of a file server, or something similar that causes the VHD to disconnect and have to be re-attached.  In the case of a disk connection loss, FSLogix will do its best to recover quickly and re-attach the disk.  While the disk is unavailable, there is some risk of applications, including explorer.exe, running into unexpected issues.  It would be interesting to find out what state the computer is in during the connection where you are only seeing the blue background.  In this state, can you access Task Manager?  If so, can you restart explorer.exe and recover the session, or do you believe that some type of full system deadlock has occurred?  What is the state of the host for other users that connect to it?  I would like to better understand these two things before suggesting possible solutions:

    1. Why is the disk connection being lost (network blips, file share reboots, host machine resource starvation, etc.)?

    2. What is the actual state of the host when you are seeing the blue background (a session with no shell, or a deadlocked host)?
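One way to gather evidence for the first question is to probe the file server from an affected host and record when it stops answering. The Python sketch below is only an illustration under assumptions: it tests TCP port 445 (SMB), which shows network reachability but not the health of the share itself, and the function names are hypothetical. `server-fs` is the file server name from the VHDPath in the log above.

```python
import socket
import time


def smb_reachable(host, port=445, timeout=2.0):
    """Return True if a TCP connection to the SMB port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_outages(samples):
    """Turn (timestamp, reachable) samples into (start, end) outage pairs.

    An outage still in progress at the last sample has end set to None.
    """
    outages = []
    down_since = None
    for ts, ok in samples:
        if not ok and down_since is None:
            down_since = ts
        elif ok and down_since is not None:
            outages.append((down_since, ts))
            down_since = None
    if down_since is not None:
        outages.append((down_since, None))
    return outages


def monitor(host, duration_s, interval_s=1.0):
    """Probe the host for duration_s seconds and return the outages seen."""
    samples = []
    deadline = time.time() + duration_s
    while time.time() < deadline:
        samples.append((time.time(), smb_reachable(host)))
        time.sleep(interval_s)
    return find_outages(samples)
```

For example, `monitor("server-fs", duration_s=3600)` would watch the file server for an hour; the outage timestamps can then be compared against Disk event 153 and the re-attach entries in the Profile log.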


    • Edited by Brian Mann1 Tuesday, November 12, 2019 7:25 PM
    Tuesday, November 12, 2019 7:24 PM
  • Hi Brian - appreciate the response. Very helpful to know that the disk should not be detaching - we weren't sure if the FSLogix disk issue was the underlying problem, or just a symptom. We will focus our efforts on trying to determine if there is a network/DNS/other issue that is causing the disk to detach. Incidentally, we have tried restarting explorer.exe when the issue is occurring - I can launch Task Manager briefly, but then it crashes/disappears. We even tried starting explorer.exe from the ControlUp remote monitoring tool while the issue is occurring, but it doesn't have any impact. I'll update this thread with additional info as we troubleshoot the network issues.

    Tuesday, November 12, 2019 8:35 PM
  • I am still happy to help investigate further if needed.  Have you made any progress?
    Saturday, December 21, 2019 1:23 AM
  • We've been having exactly the same issue. We have also recently discovered that it's due to a random failover: our Checkpoint Firewall switching to the standby node. We believe this event terminates the existing TCP connections, which causes user VHDs to detach and puts virtual machines into an unrecoverable state. I can also see that the VHD re-attach process completes successfully after approximately 10 attempts. However, the machine remains unresponsive and the only solution is a clean reboot. As you also stated, one of the vital system processes can't handle the failure and shuts itself down, which makes me believe that the fix won't be easy to dig up.

    Did you manage to find a solution to this? I have also started a similar thread on Citrix's forum, still waiting for an answer.


    • Edited by bdemir103 Wednesday, January 22, 2020 11:15 AM
    Wednesday, January 22, 2020 11:13 AM
  • Did anyone find an answer to this?  We've been having some network disruptions over the past month (we know we need to get to the bottom of them and stop them from occurring in the first place), and whenever these events occur it brings down all of our provisioned Windows 10 1809 desktops that have user sessions on them.  We use both FSLogix Profiles and FSLogix ODFC.

    We are able to simulate a network failure by quickly disconnecting and reconnecting the NIC in vSphere and have discovered some interesting results.  The provisioned Windows 10 desktop recovers fine from a network disruption of 30 to 60 seconds if an FSLogix ODFC VHD is attached, but it crashes if an FSLogix Profile VHD is attached.  However, if the network disruption is less than 30 seconds, the machine recovers fine regardless of which FSLogix containers are attached.

    Another interesting point is that we have a provisioned "Development" Windows 10 1809 environment that uses FSLogix Profiles too.  This environment, however, does recover from a network interruption of 30 to 60 seconds, even when pointing to the same profile share. We also have provisioned Windows Server 2012 XenApp servers that use FSLogix Profiles, and these also recover fine from a network interruption longer than 30 seconds.

    Our conclusion at this point is that there's something in our Production Windows 10 image, combined with FSLogix Profiles, that renders the machine unrecoverable when there is a prolonged (i.e. 30+ seconds) loss of network connectivity.  We've looked into and tested possible differences in our Production and Development Windows 10 image including FSLogix agent version, NIC settings, PVS target device versions, etc. and are unable to find anything that affects how the FSLogix profile behaves when it loses its network connection.

    Since our network connection is currently unstable, this is a huge issue for us and we see over 800 desktops crash at once.  Hoping someone on this thread has been able to figure out why the FSLogix Profile detach/reattach attempt crashes a provisioned desktop.


    Tuesday, February 25, 2020 11:43 PM