Thread to thread communication

  • Question

  • I am working on a Windows service application using .NET Framework 4.7.2.

    One service (ServiceMonitor) keeps track of the heartbeats (alive status) received from all the other business services.

    Each business service has two threads. The main thread runs the business logic on one timer's Elapsed event, and a second thread sends heartbeats on a different timer's Elapsed event.

    My expectation with the heartbeat was that if the main thread fails or hangs because of a memory leak, a deadlock, or any other unexpected issue, it would block the whole Windows service process, so the heartbeat thread would also stop sending heartbeats. That would tell ServiceMonitor that the business service is dead and that action is required.

    In reality, the heartbeat thread keeps working even after I make the main thread fail by leaking memory:

    Dim list = New List(Of Byte())()
    
    While True
        list.Add(New Byte(100000000) {}) ' simulate a memory leak: each ~100 MB array stays referenced by the list
        System.Threading.Thread.Sleep(100)
    End While

    For context, I send the heartbeats to a message queuing system that ServiceMonitor listens to.

    My questions are...

    1. Is this the right approach? If so, what else should I do so that the heartbeat thread also stops sending messages?

    2. Is it possible for the heartbeat thread to check the main thread's health and report it to ServiceMonitor?

    3. Is there a better way to monitor the health of these business services?

    Thank you in advance. Appreciate your help.

    Friday, March 8, 2019 5:48 AM

All replies

  • Hi Azure

    Thank you for posting here.

    Dim list = New List(Of Byte())()
    
    While True
        list.Add(New Byte(100000000) {}) ' simulate a memory leak: each ~100 MB array stays referenced by the list
        System.Threading.Thread.Sleep(100)
    End While

    Based on your code, which language are you using for your current project: Visual Basic or Visual C#?

    We are waiting for your update.

    Best Regards,

    Jack


    MSDN Community Support

    Friday, March 8, 2019 6:22 AM
    Moderator
  • I have a combination of C# and VB.NET projects. The business services run on VB.NET and ServiceMonitor runs on C#.
    Friday, March 8, 2019 11:34 AM
  • Threads run independent of each other so a thread that hangs won't impact the other threads. That is sort of the purpose of threads. A heartbeat thread that simply pulses every so often isn't going to detect issues in another thread.

    If you want to know whether another thread is blocked, then you're going to have to sync them up. One approach that may work is to have the main thread set an event when it starts its "work" loop and then reset it when it is done. This gives you an indicator of when the thread is running. The heartbeat thread could try to get the event (with a timeout). If it has been trying to get the event for longer than you expect the work loop to run, then the main thread could be stalled. But this is going to be a heuristic at best. For example, you might expect a work loop to take 2 seconds, but suddenly it takes 5 because there was a hiccup in the network or something. The heartbeat thread might detect it as "down" initially, but it would be back "up" the next time.
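    A minimal sketch of this idea in VB.NET, assuming "get the event" means waiting for the work loop to finish: a ManualResetEventSlim is kept signaled while the worker is idle and reset while it is working, so a Wait with a timeout that returns False means the work has been running longer than expected. WorkMonitor, RunWork and LooksStalled are hypothetical names, not part of any library.

```vbnet
Imports System
Imports System.Threading

' Hypothetical helper: tracks whether the worker is inside its work loop.
Public Class WorkMonitor
    ' Signaled = idle, non-signaled = working. Starts out idle.
    Private ReadOnly _idle As New ManualResetEventSlim(True)

    ' Called by the main (worker) thread around each unit of work.
    Public Sub RunWork(work As Action)
        _idle.Reset()            ' entering the work loop
        Try
            work()
        Finally
            _idle.Set()          ' always back to idle, even if the work throws
        End Try
    End Sub

    ' Called by the heartbeat thread. Returns True if the worker has stayed
    ' busy for longer than maxExpected (a heuristic, not proof of a hang).
    Public Function LooksStalled(maxExpected As TimeSpan) As Boolean
        Return Not _idle.Wait(maxExpected)
    End Function
End Class
```

    The heartbeat timer handler could then send "Alive" only when LooksStalled returns False, using a timeout comfortably above the longest expected work-loop duration.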

    Rather than using an event, you might also consider using a "last run" time. Each time the work loop runs, it sets the last run time. The heartbeat thread compares the last known run time to the current value. If they are the same, then the work loop could be stalled. The same heuristic rules come into play here: a stall doesn't necessarily mean down. Note that you could theoretically also use the thread's execution time, or P/Invoke into Win32 to get more detailed thread information. There are various "indicators" you could use, but they would all work similarly.
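    A sketch of the "last run" check, assuming the heartbeat interval is longer than the work interval (otherwise two heartbeats can legitimately fall between work-loop runs). It swaps the timestamp for an iteration counter so that clock resolution can never make two runs look identical; the comparison logic is the same. LastRunTracker is a hypothetical name.

```vbnet
Imports System
Imports System.Threading

' Hypothetical helper for the "last run" heuristic, using a run counter
' in place of a timestamp.
Public Class LastRunTracker
    Private _runCount As Long              ' incremented by the main thread
    Private _lastSeenCount As Long = -1    ' used by the heartbeat thread only

    ' Main thread: call once per work-loop iteration.
    Public Sub MarkRun()
        Interlocked.Increment(_runCount)
    End Sub

    ' Heartbeat thread: True if no iteration happened since the previous check.
    Public Function LooksStalled() As Boolean
        Dim current As Long = Interlocked.Read(_runCount)
        Dim unchanged As Boolean = (current = _lastSeenCount)
        _lastSeenCount = current
        Return unchanged
    End Function
End Class
```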

    As for the more specific case you mentioned, out of memory (OOM): your entire process is going to crash, so that really isn't a scenario you can handle. Unless you get into critical regions and the related advanced topics, an OOM is going to terminate your app. A memory leak isn't going to cause your app to stall, though. The GC runs behind the scenes and reclaims objects that are no longer referenced. In your very specific example the list keeps every array reachable, so memory keeps growing until an OutOfMemoryException is thrown, but that is not a stall.

    A deadlock, on the other hand, would stall the thread, so you could detect it via one of the approaches mentioned earlier. Alternatively, you could switch to tasks and have them cancelled (or simply abandoned) if they take too long. This would (hopefully) break the deadlock and allow you to try to recover, but it would be a much bigger change.
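    A sketch of running the work as a task with a timeout, assuming the work accepts a CancellationToken. Cancellation in .NET is cooperative: a task cannot be forcibly terminated, so a truly deadlocked task will keep running, but the caller stops waiting and can report the failure (for example by withholding the heartbeat). TryRunWork is a hypothetical name.

```vbnet
Imports System
Imports System.Threading
Imports System.Threading.Tasks

Public Module WorkWithTimeout
    ' Runs one unit of work on the thread pool and waits at most `timeout`.
    ' Returns True if it finished in time, False if the caller gave up on it.
    Public Function TryRunWork(work As Action(Of CancellationToken), timeout As TimeSpan) As Boolean
        Dim cts As New CancellationTokenSource()
        Dim token As CancellationToken = cts.Token
        Dim t As Task = Task.Run(Sub() work(token))

        ' Wait rethrows any exception from the work as an AggregateException.
        If t.Wait(timeout) Then
            cts.Dispose()
            Return True
        End If

        ' Timed out: request cancellation. The CTS is deliberately not disposed
        ' because the abandoned task may still be using its token.
        cts.Cancel()
        Return False
    End Function
End Module
```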

    Ultimately, you may consider doing away with multiple threads and simply using the service's built-in OnCommand to handle different commands, such as the heartbeat. If the commands are handled by a single thread, then sending a heartbeat to the service would stall if OnCommand is still processing another request. But this is tied to the SCM timeout settings, so it isn't as flexible.


    Michael Taylor http://www.michaeltaylorp3.net

    Friday, March 8, 2019 3:57 PM
    Moderator
  • As a side note, what you have created there is NOT a memory leak.  It's just a big memory consumer.  The code is perfectly valid, and the list is holding references to all the arrays you allocate.  It's true that the thread should use up all of the available memory sooner or later, but it's going to take a while.  64-bit processes have a BIG virtual memory space.

    Tim Roberts | Driver MVP Emeritus | Providenza & Boekelheide, Inc.

    Friday, March 8, 2019 8:04 PM
  • Assume the memory leak is happening in a third-party COM library that our Windows service consumes. In that case, don't we have any way to track it and inform the service monitor?
    Monday, March 11, 2019 6:49 AM
    While you can determine how much memory a process is using, that may or may not be useful. How much memory do you expect your process to take up? When is there really a "memory leak"? Just because memory goes up when you poll it doesn't mean there is a leak. Allocate a 10MB array and memory goes up by 10MB. It will drop back down at some point, and that isn't a memory leak. Of course, if you see your process continually growing in memory usage over the span of a few minutes, it could well be a memory leak, or you could just be doing something that takes a while to run and requires more memory over time. Again, that memory will eventually get released.

    My point is that there is no guaranteed way to determine that there is a "memory leak", because of how .NET manages memory. You can end up with false positives. If you are running a 32-bit process, then eventually your app could terminate, but for a 64-bit process this is unlikely to occur. Your only real choice, in my opinion, would be to determine how much memory your app should take up over its entire run and then detect when it exceeds that amount. This really isn't memory leak detection but simply putting a threshold on memory consumption.

    I guess you could also monitor the size of the gen 2 heap. Objects that get promoted there are going to be around a while. As it grows, memory usage will grow too, but, again, eventually it'll get cleaned up when it is no longer needed.

    Of course all of this assumes you really care. You mentioned you have a service monitor but why? Besides monitoring your service's memory/CPU what does it do? If this goes into a dashboard of some sort then the user of your dashboard should probably be responsible for responding to the data as they can review the historical data to determine if something is amiss.

    If you are simply trying to ensure your service keeps running then you don't really need to do that at all. Windows has, for years, supported the ability to restart a crashed service and/or do some other recovery techniques.

    In summary, you can write a tool to monitor the CPU usage and memory consumption of a thread or an entire process. Heuristically, you'd have to decide when that exceeds a certain threshold, but you need to look at it historically and not just at a single point in time. At any one point in time you could exceed one or both thresholds, but that is transient, so you'd need to see whether it fails to recover. At that point you could consider the service to be unhealthy.
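    A minimal sketch of the "look at it historically" rule, assuming a fixed threshold and a window of consecutive samples (both values you would have to choose). It reads Process.PrivateMemorySize64, which also covers native allocations such as those made by a COM library, unlike GC-only figures. MemoryTrendMonitor is a hypothetical name.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Diagnostics

' Hypothetical helper: flags memory as unhealthy only when it stays above a
' threshold for several consecutive samples, so a transient spike is ignored.
Public Class MemoryTrendMonitor
    Private ReadOnly _thresholdBytes As Long
    Private ReadOnly _windowSize As Integer
    Private ReadOnly _samples As New Queue(Of Long)()

    Public Sub New(thresholdBytes As Long, windowSize As Integer)
        If windowSize < 1 Then Throw New ArgumentOutOfRangeException(NameOf(windowSize))
        _thresholdBytes = thresholdBytes
        _windowSize = windowSize
    End Sub

    ' Records one sample; returns True only when the window is full and
    ' every sample in it exceeds the threshold.
    Public Function AddSample(bytes As Long) As Boolean
        _samples.Enqueue(bytes)
        If _samples.Count > _windowSize Then _samples.Dequeue()
        If _samples.Count < _windowSize Then Return False
        For Each s As Long In _samples
            If s <= _thresholdBytes Then Return False
        Next
        Return True
    End Function

    ' Convenience: samples the private bytes of a process (e.g. the service).
    Public Function SampleProcess(p As Process) As Boolean
        p.Refresh()   ' Process caches its values until Refresh is called
        Return AddSample(p.PrivateMemorySize64)
    End Function
End Class
```

    For example, a timer could call SampleProcess once a minute with a window of 10, so only ten consecutive minutes above the threshold would mark the service as unhealthy.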


    Michael Taylor http://www.michaeltaylorp3.net

    Monday, March 11, 2019 1:58 PM
    Moderator