Background Thread hung on System.IO.__Error.WinIOError and No Exception Is Ever Thrown

  • Question

  • I have a multi-threaded application running on Windows Server 2008 R2 that processes files on a remote server.  This is a new server, and I have been experiencing random IO errors with it.  When a network error occurs while opening a FileStream, the background thread that was accessing the remote file hangs.  I would expect an exception to be thrown, but nothing happens; the thread just sits and waits.  I found all of this using a couple of profilers, Eqatec and SlimTune.  When the hang occurs, I see System.IO.__Error.WinIOError in the profile but nothing else happening after that.
    My code originally used File.Exists when I first uncovered this.  When I saw the error back then, I think I saw an internal method call to something like FillProperties.  I am not positive about the method name, as I was not taking good notes at the time.  The first thing I tried was writing my own FileExists method that opens a FileStream (code below), since I only need to know whether the file is present and I thought I could bypass all the internals of File.Exists loading FileInfo information.
    I am at my wit's end with this one.  Any help figuring out why I am not getting an exception would be greatly appreciated.  I have tried remote debugging, but since the server is not on my network, I have not been successful with that either.
    Here is the order of the calls I am seeing before the hang occurs:
    1. System.IO.FileStream..ctor(string, System.IO.FileMode, System.IO.FileAccess)
    2. System.IO.FileStream..ctor(string, System.IO.FileMode, System.IO.FileAccess, System.IO.FileShare, int, System.IO.FileOptions, string, bool)
    3. System.IO.FileStream.Init(string, System.IO.FileMode, System.IO.FileAccess, int, bool, System.IO.FileShare, int, System.IO.FileOptions, SECURITY_ATTRIBUTES, string, bool, bool)
    4. Microsoft.Win32.Win32Native.SafeCreateFile(string, int, System.IO.FileShare, SECURITY_ATTRIBUTES, System.IO.FileMode, int, IntPtr)
    5. System.IO.__Error.WinIOError(int, string)  (This is where everything on this thread stops and no exception is ever generated)
      public static bool FileExists(string fileName)
      {
        if (string.IsNullOrWhiteSpace(fileName))
          return false;

        try
        {
          // This is the line where the thread hangs.
          using (FileStream fsSource = new FileStream(fileName, FileMode.Open, FileAccess.Read))
          {
            fsSource.Close();
          }
          return true;
        }
        catch (FileNotFoundException)
        {
          return false;
        }
        catch (DirectoryNotFoundException)
        {
          return false;
        }
        catch (Exception ex)
        {
          Console.WriteLine("Error in FileExists for file " + fileName + Environment.NewLine + " Error: " + ApexSystem.Exceptions.Support.BuildErrorMessage(ex));
          return false;
        }
      }

    Thursday, July 14, 2011 10:56 AM

Answers

  • Have you ever timed it to see how long it hangs for? I know when doing disk I/O that the OS will wait for the disk controller to get back to it. The disk controllers I worked with would have a ten minute timeout; so if the disks hung it'd take ten minutes before an error was reported back to the OS.

    So your behavior may depend on the network driver. If the network driver is attempting retries around the disconnect I don't think there's anything you can do.
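    If the stall really is a long OS or driver timeout, one way to keep the calling code from waiting on it is to run the check on a separate task and give up after a deadline. The following is only a minimal sketch under that assumption; `FileProbe` and `ExistsWithTimeout` are hypothetical names, not something from the original post. It does not cancel the blocked call: the worker stays stuck until the OS returns, so this only bounds how long the caller waits.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class FileProbe
{
    // Hypothetical helper: returns true or false when the check completes in
    // time, or null when it is still blocked after the timeout has elapsed.
    public static bool? ExistsWithTimeout(string fileName, TimeSpan timeout)
    {
        if (string.IsNullOrWhiteSpace(fileName))
            return false;

        // LongRunning hints that the work may block for a long time, so the
        // scheduler should not run it on a regular thread-pool worker.
        Task<bool> probe = Task.Factory.StartNew(
            () => File.Exists(fileName),
            TaskCreationOptions.LongRunning);

        // Wait(TimeSpan) returns false if the task has not finished in time.
        if (probe.Wait(timeout))
            return probe.Result;

        // The underlying I/O call is still pending; we simply stop waiting.
        return null;
    }
}
```

    A caller could treat a `null` result as "server not responding", log it, and retry later, for example `FileProbe.ExistsWithTimeout(path, TimeSpan.FromSeconds(30))`. Logging how long each call takes (for instance with `System.Diagnostics.Stopwatch`) would also answer the question of how long the hang actually lasts.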

    • Proposed as answer by Paul Zhou Tuesday, July 19, 2011 3:09 AM
    • Marked as answer by Paul Zhou Monday, July 25, 2011 5:42 AM
    Thursday, July 14, 2011 9:00 PM

All replies

  • Have you tried debugging this with WinDbg?  I know that gets pretty involved, but it could shed some light on what is happening.  I think using a FileStream is going to come with more overhead than calling File.Exists, so I don't think you're on the right track with that.  If you want a scaled-down File.Exists, call FindFirstFile directly.

     

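    // Requires: using System.Runtime.InteropServices;
    // SafeFindHandle is a SafeHandle subclass you define yourself
    // (the framework's own type of that name is internal).
    [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    internal static extern SafeFindHandle FindFirstFile(string fileName, [In, Out] WIN32_FIND_DATA data);

    // Field order and sizes must match the native WIN32_FIND_DATA structure.
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
    internal class WIN32_FIND_DATA
    {
      internal int FileAttributes;
      internal uint CreatedLowDateTime;
      internal uint CreatedHighDateTime;
      internal uint LastAccessLowDateTime;
      internal uint LastAccessHighDateTime;
      internal uint LastWriteLowDateTime;
      internal uint LastWriteHighDateTime;
      internal int FileSizeHigh;
      internal int FileSizeLow;
      internal int Reserved0;
      internal int Reserved1;

      // ByValTStr needs an explicit size: MAX_PATH (260) and 14 characters.
      [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
      internal string FileName;
      [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
      internal string AlternateFileName;
    }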
    

    Then check SafeFindHandle.IsInvalid; if it is true, the call failed and you can call Marshal.GetLastWin32Error() to find out more about what is going on.

     Don't forget to call Close() on the SafeFindHandle when you're done with it.
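    Putting those steps together, here is a minimal, Windows-only sketch of an existence check built on FindFirstFile. `NativeFileProbe`, its `Exists` method, and this `SafeFindHandle` class are hypothetical names written for illustration. Note that FindFirstFile accepts wildcards, so this assumes a plain file path, and the native call may still block during a network stall just as File.Exists can. The same declarations work in both 32-bit and 64-bit processes, because the only pointer-sized value is the handle held by the SafeHandle.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// Assumption: our own handle wrapper, since the framework's is not public.
internal sealed class SafeFindHandle : SafeHandleZeroOrMinusOneIsInvalid
{
    private SafeFindHandle() : base(true) { }

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool FindClose(IntPtr handle);

    // Called once by the runtime when a valid handle is disposed.
    protected override bool ReleaseHandle()
    {
        return FindClose(handle);
    }
}

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
internal class WIN32_FIND_DATA
{
    internal int FileAttributes;
    internal uint CreatedLowDateTime;
    internal uint CreatedHighDateTime;
    internal uint LastAccessLowDateTime;
    internal uint LastAccessHighDateTime;
    internal uint LastWriteLowDateTime;
    internal uint LastWriteHighDateTime;
    internal int FileSizeHigh;
    internal int FileSizeLow;
    internal int Reserved0;
    internal int Reserved1;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
    internal string FileName;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
    internal string AlternateFileName;
}

internal static class NativeFileProbe
{
    private const int ERROR_FILE_NOT_FOUND = 2;
    private const int ERROR_PATH_NOT_FOUND = 3;
    private const int FILE_ATTRIBUTE_DIRECTORY = 0x10;

    [DllImport("kernel32.dll", EntryPoint = "FindFirstFileW",
               CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern SafeFindHandle FindFirstFile(
        string fileName, [In, Out] WIN32_FIND_DATA data);

    public static bool Exists(string fileName)
    {
        var data = new WIN32_FIND_DATA();

        // Disposing the handle closes it; an invalid handle is never released.
        using (SafeFindHandle handle = FindFirstFile(fileName, data))
        {
            if (!handle.IsInvalid)
                return (data.FileAttributes & FILE_ATTRIBUTE_DIRECTORY) == 0;

            int error = Marshal.GetLastWin32Error();
            if (error == ERROR_FILE_NOT_FOUND || error == ERROR_PATH_NOT_FOUND)
                return false;

            // Surface any other failure (for example a network error) to the caller.
            throw new Win32Exception(error);
        }
    }
}
```

    Throwing a Win32Exception for anything other than "not found" is a design choice of this sketch: it keeps network failures visible instead of folding them into a plain `false`.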

    Thursday, July 14, 2011 2:29 PM
  • Thanks for the feedback.  Right now, getting into WinDbg seems like a lot of work to see something that is going to look meaningless to me.  I am also leery of calling kernel32 directly, as I am trying to ensure full compatibility with 64-bit, but if I get desperate I may give it a try.  I hope someone on the IO team has a little more insight as to what may be going on here.

     

    Thursday, July 14, 2011 7:25 PM