Actor Garbage Collection issue - Memory not reclaimed

  • Question

  • Hi,

    I created a Reliable Actor service (volatile, not persisted) with the following configuration: an idle timeout of 20 seconds and a GC scan interval of 5 seconds.

    // 20-second idle timeout, 5-second scan interval.
    var garbageCollectionSettings = new ActorGarbageCollectionSettings(20, 5);
    ActorRuntime.RegisterActorAsync<Task1>(
        (context, actorType) => new ActorService(context, actorType, settings: new ActorServiceSettings
        {
            ActorGarbageCollectionSettings = garbageCollectionSettings
        })).GetAwaiter().GetResult();

    Thread.Sleep(Timeout.Infinite);

    I have the following code in my actor. It reads a 2 MB file and assigns its contents to FileContents (an actor-level variable):

    public Task<string> GetGreetings(string name)
    {
        // Read the whole file into the actor-level FileContents field.
        using (var reader = new StreamReader(@"E:\temp\Document.docx"))
        {
            FileContents = reader.ReadToEnd();
        }

        return Task.FromResult<string>("Hello Mr " + name + " Greetings");
    }
    

    With this in place, I started 4 clients as shown below, each of which calls the method above:

    dotnet Actorclient.dll
    dotnet Actorclient.dll
    dotnet Actorclient.dll
    dotnet Actorclient.dll

    I then opened Task Manager. The process for this actor service is Task1.exe (I have 3 exes because it is stateful: one for the primary and two for the secondaries). The memory of one Task1.exe (the primary) spiked because of the client runs above. After the idle timeout, I received messages in diagnostics saying that those 3 actors were deactivated (the message appeared 3 times). But the process memory for Task1.exe does not go down even after 40 minutes. GC is supposed to collect the memory of deactivated instances at every 5-second interval (the scan interval for the actor), yet the memory is never reclaimed. I expect the memory to return to where it started: for example, if Task1.exe started at 40 MB, it should go back to 40 MB. This is not happening.

    So what does garbage collection really mean as per the documentation here ()? If the object's memory is freed, that should be visible in the Task1.exe process memory. Or is my understanding of garbage collection totally wrong? Can someone please answer this question?

    I have around 100 applications that I need to convert to actors, but with this memory leak unexplained I can't move forward with actors. Can you please help me understand?

    Friday, November 22, 2019 1:26 PM

All replies

  • I am not seeing the link you mention in your post, but I assume it was this one:

    https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-reliable-actors-lifecycle

    Just sharing to be sure :) 

    Can you tell me if the memory keeps stacking up? Or are you seeing it stay at a consistent level, just higher than the 40 MB it started with?

    Monday, November 25, 2019 8:19 PM
  • First of all, thanks very much for responding. I am really looking for help on this.

    Yes, that was the link. I couldn't share it because the site does not allow me to post links until my account is validated (I am not sure how the validation process works).

    Here the actor, Task1.exe, is not a persisted one; it came up as a single exe file.

      [StatePersistence(StatePersistence.None)]
        internal class Task1 : Actor, ITask1
        {

    When I looked at Task1.exe in Task Manager, its memory started at around 39 MB. I then ran the first actor client:

    dotnet Actorclient.dll

    It hits the following code in the Task1 actor, where Document.docx is a file of around 2 MB:

    public Task<string> GetGreetings(string name)
    {
        // Read the whole file into the actor-level FileContents field.
        using (var reader = new StreamReader(@"E:\temp\Document.docx"))
        {
            FileContents = reader.ReadToEnd();
        }

        return Task.FromResult<string>("Hello Mr " + name + " Greetings");
    }
    

    Here is the run:

    Initially, Task1.exe was using 39.5 MB.

    I invoked ActorClient 20 times in parallel using a PowerShell script I created, which hits the method above 20 times.

    After the test run, Task1.exe was using 277.7 MB.

    I have set the actor idle timeout to 20 seconds and the scan interval to 5 seconds, so my expectation is that the 277.7 MB consumed after the test should drop back to 39 MB (where it started) after 30 seconds or so. I waited for 2 hours and the memory still did not go down. It does return to 39 MB eventually, but at a time of its own choosing (most often long after I have left and come back to my laptop). I am not sure why the scan interval has no effect here, or is my understanding of memory reclamation wrong?

    My only worry is that if 100 actor jobs (I have 100 apps to convert) consume memory like this without freeing it, the cluster may run out of memory for other microservices; we have around 50 microservices as well.

    Can you please share your expertise in this area and help me understand the behavior? Is this really a problem I need to worry about?

     

    Tuesday, November 26, 2019 7:02 AM
  • Thanks for the clarification. So it does sound like the memory is getting cleared, but the time intervals you have in place are basically being ignored.

    Have you tried different time intervals, by chance? I am curious whether shortening or lengthening them might help.

    Also, are you running your cluster on-premises or in the Azure cloud?

    Tuesday, November 26, 2019 6:45 PM
  • I tried the option below: a 40-second idle timeout and a 20-second GC scan interval. I have 5 actor services, and a client creates 160 actor instances for each actor service.

    new ActorGarbageCollectionSettings(40, 20);

    All the actor services (Task1, Task2, Task3, Task4, Task5) started at around 40 MB. I then invoked the PowerShell client to create 160 actors of each. While the services were creating their 160 actors, the memory of one service process went up to over 400 MB, suddenly dropped to 250 MB (is this a GC collection?), and then continued to grow. It kept going up and down and in the end settled at an average of around 160 MB. The process stayed at this level for a long time, even though all the actor instances had been deactivated. The memory went back to an average of 40 MB after 1 hour or so.

    So, based on this behavior: does the scan interval only find the actors eligible for garbage collection and mark them, without collecting them immediately? Does the actual GC collection then happen in its natural course and reclaim the actor service's memory?
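
    The distinction asked about here can be sketched in plain .NET, with no Service Fabric dependency. FakeActor and DeactivationDemo are hypothetical names made up for illustration, and the sketch assumes that a deactivated actor behaves like any other object that has become unreachable: its memory stays on the managed heap until the CLR garbage collector runs, which is forced here with GC.Collect() purely for demonstration.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for an actor holding a large field; not a Service Fabric type.
public sealed class FakeActor
{
    // 2M chars, roughly 4 MB on the managed heap because .NET strings are UTF-16.
    public string FileContents = new string('x', 2 * 1024 * 1024);
}

public static class DeactivationDemo
{
    // NoInlining confines the strong reference to this method's frame,
    // so the object is unreachable once the method returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference CreateAndDrop()
    {
        var actor = new FakeActor();
        return new WeakReference(actor);
    }

    public static void Main()
    {
        WeakReference weak = CreateAndDrop();

        // The object is unreachable now, but the CLR may not have collected it yet.
        Console.WriteLine($"Alive after dropping the reference: {weak.IsAlive}");
        Console.WriteLine($"Managed heap before collection: {GC.GetTotalMemory(false):N0} bytes");

        // Force a full blocking collection (for demonstration only).
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        Console.WriteLine($"Alive after GC.Collect: {weak.IsAlive}");
        Console.WriteLine($"Managed heap after collection: {GC.GetTotalMemory(false):N0} bytes");
    }
}
```

    In a real service the collection would normally be left to the runtime, so the delay between an object becoming unreachable and its memory being reclaimed depends on when the CLR decides to collect.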

    The statement below, from the linked page, should be changed from "collects" to "marks":

     https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-reliable-actors-lifecycle 

    "The actor runtime checks each of the actors every ScanIntervalInSeconds to see if it can be garbage collected and collects it if it has been idle for IdleTimeoutInSeconds."


    Thursday, November 28, 2019 7:05 AM
  • I have been reviewing the Actor documentation we have, and I believe what you are finding is correct. The GC marks the actor for deletion but does not actually clear it from memory right away.

    https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-reliable-actors-introduction#actor-lifetime
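
    One way to observe this from inside the process is to log the managed heap size next to the process working set. This is a sketch only: MemoryReport and Snapshot are hypothetical names, and it assumes the figure shown in Task Manager corresponds roughly to the working set, which the runtime and the OS may keep reserved for a while after the managed heap has shrunk.

```csharp
using System;
using System.Diagnostics;

public static class MemoryReport
{
    // Returns the managed heap size and the process working set, both in bytes.
    public static (long ManagedBytes, long WorkingSetBytes) Snapshot()
    {
        using (var process = Process.GetCurrentProcess())
        {
            // GetTotalMemory(false) reads the current estimate without forcing a collection.
            return (GC.GetTotalMemory(false), process.WorkingSet64);
        }
    }

    public static void Main()
    {
        var (managed, workingSet) = Snapshot();
        Console.WriteLine($"Managed heap: {managed / (1024.0 * 1024.0):F1} MB");
        Console.WriteLine($"Working set:  {workingSet / (1024.0 * 1024.0):F1} MB");
    }
}
```

    Logging both values periodically (for example, from a timer in the service host) would show whether the managed heap drops after deactivation even while the process-level number stays high.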

    I am going to update the documentation to say "marks" instead of "collects" to make it easier for users to grasp.

    The document should reflect the changes after a few hours :) 
    Monday, December 2, 2019 8:57 PM