Performance Issue - FileSystemWatcher

  • Question

  • Hi Experts, 

    I am working on a file-processing task where I need to bulk load files into a SQL Server database. For this I am using FileSystemWatcher, which works as required. I am a newbie in C#.

    The problem starts when the monitored directory receives a heavy burst of 10-20 files at once, each around 30-40 MB in size.

    CPU usage starts peaking at 50-60 percent. To avoid this I am trying file caching, but the issue persists. I am thinking that delaying processing by a 1-2 second interval between files might reduce the load on the processor. Please suggest the best way to add such a delay, or any other approach.

    The Process method simply reads the CSV, uses CsvHelper to load it into a DataTable, and finally bulk inserts the DataTable into SQL Server.

    Each file may contain 40-50 k records. Please share optimisation techniques.
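    For reference, the Process method does roughly the following (a simplified sketch; the connection string and destination table name here are placeholders, not my real ones):

    using System.Data;
    using System.Data.SqlClient;
    using System.Globalization;
    using System.IO;
    using CsvHelper;

    // Simplified: CSV -> CsvHelper -> DataTable -> SqlBulkCopy into SQL Server.
    public void Process(string csvPath)
    {
        var table = new DataTable();

        // CsvHelper exposes the CSV as an IDataReader, which DataTable can load directly.
        using (var reader = new StreamReader(csvPath))
        using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
        using (var csvDataReader = new CsvDataReader(csv))
        {
            table.Load(csvDataReader);
        }

        // One bulk insert per file (40-50 k rows).
        using (var bulkCopy = new SqlBulkCopy("<connection string>"))
        {
            bulkCopy.DestinationTableName = "dbo.MyTable";
            bulkCopy.WriteToServer(table);
        }
    }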

    private static MemoryCache FilesToProcess = MemoryCache.Default;

    // Cache the path with a 5-second sliding expiration. Add returns false if
    // the key already exists, so duplicate events for the same path within the
    // window are ignored.
    private static void AddToCache(string fullPath)
    {
        var item = new CacheItem(fullPath, fullPath);
        var policy = new CacheItemPolicy
        {
            RemovedCallback = ProcessFile,
            SlidingExpiration = TimeSpan.FromSeconds(5),
        };
        FilesToProcess.Add(item, policy);
    }

    private static void FileCreated(object sender, FileSystemEventArgs e)
    {
        AddToCache(e.FullPath);
    }

    // Fires when the cache entry is removed; only process entries that expired.
    private static void ProcessFile(CacheEntryRemovedArguments args)
    {
        if (args.RemovedReason == CacheEntryRemovedReason.Expired)
        {
            var fileProcessor = new DataProcessing(args.CacheItem.Key);
            fileProcessor.Process();
        }
    }

    Thanks 

    Priya

    Thursday, December 12, 2019 12:43 PM

Answers

  • " I at times get an error that the task get locked with another process referring to the csv file"

    Correct. As I mentioned in my post, the FSW event is raised while the IO operation is occurring. Therefore in many cases the file is still locked by the owning process. If you attempt to do something with that file then it will fail. You have to implement retry logic to try again at a later point (hopefully after the process is done). This isn't trivial.

    Also be aware that different applications behave differently. For example an app may create an empty file (create event) and then write content to it (multiple change events). However another app may create a temporary file, write all the data there and then rename the file (rename event) or copy the file to the destination location (multiple change events). You either have to code for all these or understand which approach the app you're monitoring uses.

    However, do NOT do any blocking inside the event handler. If you do, you're blocking the journaling API which, historically, blocks the underlying write. Everything MUST be done on a secondary thread after the event handler has returned. In your specific example code, the for loop and try-catch need to be moved to a background worker thread. The only thing your handler should do is note the event in some shared queue. It's pretty old, but here is a blog article I wrote years ago about working with FSW.
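    To illustrate the retry part (a sketch only; WaitForFileRelease is an illustrative name, not a real API), the worker thread can probe whether the owning process has released the file before touching it:

    // Sketch: runs on the worker thread, never inside the FSW event handler.
    // Requires System.IO and System.Threading. Returns true once the file can
    // be opened exclusively, i.e. the owning process has released it.
    private static bool WaitForFileRelease(string path, int maxAttempts = 5, int delayMs = 1000)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                using (File.Open(path, FileMode.Open, FileAccess.Read, FileShare.None))
                {
                    return true; // exclusive open succeeded; the file is free
                }
            }
            catch (IOException)
            {
                Thread.Sleep(delayMs); // still locked; back off and retry
            }
        }
        return false; // caller decides what to do with a file that never freed up
    }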


    Michael Taylor http://www.michaeltaylorp3.net

    • Marked as answer by Priya Bange Sunday, December 15, 2019 10:09 AM
    Saturday, December 14, 2019 5:33 PM
    Moderator

All replies

  • High level: rethink how the data is held, as 40-50 k records hanging around in memory is most likely the issue. Perhaps offload to a file for a bulk upload or, if possible, upload in chunks, e.g. every thousand items.
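    For example (a rough sketch; the connection string and table name are placeholders), SqlBulkCopy can stream the CSV and commit in batches instead of building one large DataTable first:

    // Rough sketch: stream the CSV straight into SqlBulkCopy in batches
    // instead of materialising all 40-50 k rows in a DataTable first.
    // Requires CsvHelper and System.Data.SqlClient.
    using (var reader = new StreamReader(csvPath))
    using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
    using (var csvDataReader = new CsvDataReader(csv))
    using (var bulkCopy = new SqlBulkCopy("<connection string>"))
    {
        bulkCopy.DestinationTableName = "dbo.MyTable";
        bulkCopy.BatchSize = 1000;             // commit every thousand rows
        bulkCopy.WriteToServer(csvDataReader); // streams rows as they are read
    }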

    Please remember to mark the replies as answers if they help and unmark them if they provide no help; this will help others who are looking for solutions to the same or a similar problem. Contact via my Twitter (Karen Payne) or Facebook (Karen Payne) via my MSDN profile, but I will not answer coding questions on either.

    NuGet BaseConnectionLibrary for database connections.

    StackOverflow: profile for Karen Payne on Stack Exchange

    Thursday, December 12, 2019 2:36 PM
    Moderator
  • Dear Ma'am,

    I have tried this approach, but still not much luck :(

    Thanks 
    Priya

    Thursday, December 12, 2019 5:56 PM
  • You didn't post how you configured the FSW. Based on just the code you posted, note the following issues:

    1) You are completely misusing MemoryCache simply to stall your processing. This is an expensive and wasteful use of resources.

    2) A single file change can trigger multiple events, so you are effectively adding duplicate processing for a single file.

    3) While it may not impact this specific code, you are not honoring the order of events, so if you later add support for other file changes such as rename or delete, your cache is not going to guarantee ordering, which can be a problem.

    4) MemoryCache, at least the default version, doesn't strictly follow cache cleanup policies as you might expect (at least that is how I remember it working when someone had an issue with it a year or so ago). It does lazy cleanup, so using it to "time" file operations is not going to work properly in all cases, and is a misuse of the system anyway.

    The correct approach, in my opinion, is to create a concurrent queue that stores the file event (changed, created, etc.) and the filename. Have the event handler insert the event into the queue, and have a background thread monitor the queue and process events one by one as they come in. Note that FSW will raise events even before the underlying process releases a file, so your processing needs retry logic built in. In your specific case you could probably handle that by skipping the request and trying again later, but if you also support delete/rename/etc. it becomes harder. With this architecture you don't need an arbitrary wait anywhere, and it wouldn't matter whether 5 files arrive in 5 seconds or 25 in 1 second: the background thread would just take longer for larger workloads while the event handler continues without issue.
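    A minimal sketch of that producer/consumer shape (FileQueueProcessor and ProcessWithRetry are illustrative names, not a real API), using BlockingCollection backed by a ConcurrentQueue:

    using System;
    using System.Collections.Concurrent;
    using System.IO;
    using System.Threading.Tasks;

    public sealed class FileQueueProcessor
    {
        // FIFO queue shared between the FSW handlers and the background consumer.
        private readonly BlockingCollection<(WatcherChangeTypes Change, string Path)> _queue =
            new BlockingCollection<(WatcherChangeTypes, string)>(
                new ConcurrentQueue<(WatcherChangeTypes, string)>());

        public void Start(FileSystemWatcher watcher)
        {
            // The handlers do nothing but record the event; no blocking here.
            watcher.Created += (s, e) => _queue.Add((e.ChangeType, e.FullPath));
            watcher.Changed += (s, e) => _queue.Add((e.ChangeType, e.FullPath));

            // A single consumer preserves event order and naturally throttles the work.
            Task.Run(() =>
            {
                foreach (var item in _queue.GetConsumingEnumerable())
                {
                    ProcessWithRetry(item.Change, item.Path); // retry logic lives here
                }
            });
        }

        private void ProcessWithRetry(WatcherChangeTypes change, string path)
        {
            // ... open the file with retries, then do the CSV -> SQL Server load ...
        }
    }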


    Michael Taylor http://www.michaeltaylorp3.net

    Thursday, December 12, 2019 8:15 PM
    Moderator
  • Hi Sir,

    Is the approach below correct? At times I get an error that the CSV file is locked by another process. I am a newbie in C# and will be learning about concurrent queues this weekend.

    private const int NumberOfRetries = 3;
    private const int DelayOnRetry = 1000; // milliseconds

    private void FileCreated(object sender, FileSystemEventArgs e)
    {
        string rootDirectoryPath = new DirectoryInfo(e.FullPath).Parent.Parent.FullName;
        string inputFileName = Path.GetFileName(e.FullPath);
        string inProgressFilePath =
            Path.Combine(rootDirectoryPath, InProgressDirectoryName, inputFileName);

        for (int i = 1; i <= NumberOfRetries; ++i)
        {
            try
            {
                Process(e.FullPath, inProgressFilePath);
                break;
            }
            catch (IOException) when (i < NumberOfRetries) // let the last failure propagate
            {
                Thread.Sleep(DelayOnRetry); // file still locked; wait and retry
            }
        }
    }

    Saturday, December 14, 2019 9:43 AM
  • " I at times get an error that the task get locked with another process referring to the csv file"

    Correct. As I mentioned in my post, the FSW event is raised while the IO operation is occurring. Therefore in many cases the file is still locked by the owning process. If you attempt to do something with that file then it will fail. You have to implement retry logic to try again at a later point (hopefully after the process is done). This isn't trivial.

    Also be aware that different applications behave differently. For example an app may create an empty file (create event) and then write content to it (multiple change events). However another app may create a temporary file, write all the data there and then rename the file (rename event) or copy the file to the destination location (multiple change events). You either have to code for all these or understand which approach the app you're monitoring uses.

    However, DO NOT, do any blocking inside the event handler. If you do you're blocking the journaling API which, historically, blocks the underlying write. Everything MUST be done on a secondary thread after the event handler has returned. This is required. In your specific example code the for loop and try-catch need to be moved to a background worker thread. The only thing your handler should do is note the event in some shared queue. It's pretty old but here is a blog article I wrote years ago about working with FSW.


    Michael Taylor http://www.michaeltaylorp3.net

    • Marked as answer by Priya Bange Sunday, December 15, 2019 10:09 AM
    Saturday, December 14, 2019 5:33 PM
    Moderator