Thousands and Thousands of Services

  • Question

  • I posted this in the Simulation forum, but am re-posting it here at Michael W's request.

    I'm simulating a modular robot project, and it's working well -- each module is a service and they're communicating nicely, getting set up and dropped programmatically.

    Now I need to run a huge number of simulations, each with a different collection of modules.  So I'm programmatically starting ~20 services/entities, evaluating them together in sim, then dropping them and starting again.  But I have serious performance issues: after the first couple of simulations, DssHost slows to a crawl.  I'm not exactly sure why.  I played with the new DSS Log Analyzer, but I'm not able to get it to analyze my logs (separate post).

    A workaround might be to batch the simulations, or even to restart the simulation or subscription manager services between runs, but these are pretty inelegant solutions.

    Basically, I'm wondering who's done work like this and if you have suggestions or best practices to share.

    Thanks,
    Eric
    Sunday, July 27, 2008 7:04 PM

All replies

  • DssHost can support tens of thousands of services starting and stopping, so I am guessing the issue is with the graphics/physics engine, although it does not sound like you are dropping the simulation engine itself, correct?

    Do you observe this behavior when you use no simulation services?

    g

    Monday, July 28, 2008 2:51 AM
  • Hi George, thanks for your reply.  That's correct, I start the simulation engine once for the entire set of trials, adding various configurations of robots to the scene, evaluating, then dropping them and adding more.

    Great idea to test without simulation.  I built a little framework so that I could do this, and while at first it appeared to solve the problem, it looks like the same slowdown pattern is still there; removing simulation just made the whole thing run faster.

    Now, though, after running a few trials, I see some "queue limit exceeded, discarding inbound..." messages, even after the simulations are complete and all relevant services are dropped.  The message that is filling the queue is a specific message that module services send to each other.  It appears that even though the services are dropped, there are still messages floating around, perhaps causing this slowdown.

    After each evaluation, I send a Drop message to each module's service.  I have a custom Drop handler that finds the related entity in the simulation, deletes it, and then sends a DsspDefaultDrop to base.  During evaluation, each module uses SubscriptionManager to handle communication with other modules.  Maybe the SubscriptionManager is somehow persisting and clogging up the works with messages even though the services are dropped?  Do I need to explicitly remove the SubscriptionManager and somehow purge messages?

    Thanks for your help,
    Eric
    Monday, July 28, 2008 1:52 PM
  • Hi, it does sound like subscriptions are not being cleaned up.

    The error message is the runtime telling you that messages are piling up and no one is listening.

    The DSS runtime is supposed to delete subscriptions on your behalf if you properly shut down each service (send it a Drop; if you use the ServiceHandlerBehavior attributes, the default drop handler should take care of it). If you do have a Drop handler, make sure it is getting called, and that you call down to the base DefaultDropHandler.

    You can also check the subscription manager instances.

    Another thing you can try is explicitly sending a DeleteSubscription message to your subMgr partner.

    g

    Monday, July 28, 2008 5:42 PM
  • I made a change and am now sending a Drop to the SubscriptionManager, which seems to have taken care of the full queue problem.  It doesn't help with the general slowdown however.  Here's my custom Drop Handler:

    Code Snippet

    [ServiceHandler(ServiceHandlerBehavior.Teardown)]
    public IEnumerator<ITask> CustomDropHandler(DsspDefaultDrop drop)
    {
        // Delete the roBlock entity
        //SimulationEngine.GlobalInstancePort.Delete(_state.blockObject.ent);

        // Drop the subscription manager, then run the default teardown
        _submgrPort.Post(new DsspDefaultDrop(DropRequestType.Instance));
        base.DefaultDropHandler(drop);
        Console.WriteLine("dropped it like it's hot!");
        yield break;
    }


    Now, I'm getting a URI error when I try to drop the subscriptionManager but I don't really understand it.

    *** "TaskExecutionWorker:HandleException": Exception:System.UriFormatException: Invalid URI: The format of the URI could not be determined.
       at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind)
       at System.Uri..ctor(String uriString)
       at Microsoft.Dss.Services.Forwarders.Dssp.DsspForwarder.DetachRelativePath(DetachRelativePath detach)
       at Microsoft.Ccr.Core.Task`1.Execute()
       at Microsoft.Ccr.Core.TaskExecutionWorker.ExecuteTaskHelper(ITask currentTask)
       at Microsoft.Ccr.Core.TaskExecutionWorker.ExecuteTask(ITask& currentTask, DispatcherQueue p, Boolean bypassExecute)
       at Microsoft.Ccr.Core.TaskExecutionWorker.ExecutionLoop()

    Thanks,
    Eric
    Monday, July 28, 2008 7:41 PM
  • Hi Eric, you don't need to send a Drop to your own subMgr (but it should work anyway, so I will investigate the error above). By default, we drop all subscription managers associated with services.

    Btw, this is all CTP2, correct?

    In terms of the slowdown, can you please check whether DssHost is using a lot of CPU when it shouldn't? Is the console/output service constantly showing new log messages? We basically need to figure out what is consuming CPU time and making your host slow.

    thanx

    g

    Monday, July 28, 2008 10:06 PM
  • Hi George - Yes, this is all CTP2.  DssHost is using lots of CPU, as much as it can get.  The console and output aren't showing *that* many messages, just notification that each service is starting and regular debugging messages that I throw in here and there.  Currently, I'm going through and turning off various message-passing functions to see if I can pinpoint what's causing it.

    Thanks,
    Eric
    Monday, July 28, 2008 10:20 PM
  • Shoot, no luck.  I pulled everything out to a basic framework where I'm just starting services, waiting a few seconds and then dropping them, and I still see the slowdown.  During this last run, my neighbor came over to chat, so I left it running for longer than usual, and when I came back, there was an OutOfMemoryException in Microsoft.Ccr.Core.dll.

    Thanks,
    Eric
    Monday, July 28, 2008 11:33 PM
  • Something is generating messages like crazy and the queues are getting full. Service ports are shielded because our default queuing policy throws away messages once a queue holds 128 or more of them.
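
    A minimal sketch of such a discard-at-limit queue shows why a sender that keeps posting to a port nobody reads produces "queue limit exceeded, discarding inbound..." messages. `BoundedInbox` is a hypothetical stand-in, not a DSS type, and discarding the newest arrival is an assumption based on the wording of the log message. Only the 128-message threshold comes from the post above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a service port's inbound queue; not a DSS type.
public sealed class BoundedInbox<T>
{
    private readonly Queue<T> _items = new Queue<T>();
    private readonly int _limit;

    public BoundedInbox(int limit)
    {
        if (limit <= 0) throw new ArgumentOutOfRangeException("limit");
        _limit = limit;
    }

    public int Count { get { return _items.Count; } }

    public long Discarded { get; private set; }

    // Returns false when the queue is already at its limit;
    // in that case the message is thrown away (assumed policy).
    public bool Post(T message)
    {
        if (_items.Count >= _limit)
        {
            Discarded++;
            return false;
        }
        _items.Enqueue(message);
        return true;
    }
}

public static class Program
{
    public static void Main()
    {
        var inbox = new BoundedInbox<int>(128);

        // A sender keeps posting, but no receiver ever dequeues.
        for (int i = 0; i < 1000; i++)
        {
            inbox.Post(i);
        }

        // Prints "queued=128 discarded=872".
        Console.WriteLine("queued={0} discarded={1}", inbox.Count, inbox.Discarded);
    }
}
```

    The queue protects the host from unbounded memory growth, but every discarded message still costs CPU to send and reject, which fits the slowdown described in this thread.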

     

    I am trying to reproduce what you are doing with the following (I just modified Hosting Tutorial 4):

    1. Start 1000 Service Tutorial 5 instances (each of them in turn subscribes to the clock notifications of a single Service Tutorial 4 instance)
    2. Do a GET on the last created instance
    3. Drop all of them

    Update:

    Service Tutorial 5 was not tearing down its notification receivers, so I fixed that :) Our other samples do the right thing.

    It now does this (and so should any of your services that have persisted notifications, btw):

     

    Code Snippet
    MainPortInterleave.CombineWith(
        new Interleave(
            new TeardownReceiverGroup(),
            new ExclusiveReceiverGroup(),
            new ConcurrentReceiverGroup(
                Arbiter.Receive<rst4.IncrementTick>(
                    true, _clockNotify, NotifyTickHandler),
                Arbiter.Receive<rst4.Replace>(
                    true, _clockNotify, NotifyReplaceHandler)
            )
        )
    );

     

     

    thanx

    g

     

    Wednesday, July 30, 2008 6:18 PM
  • Found the issue. On shutdown, one of our internal services associated with each user service (the Partner manager) was not responding to Drop, which prevented correct teardown of the service forwarder path *and* the subscriptions! So indeed, what you see is a regression in CTP2 and CTP1. Thank you for persisting.

    To work around this:

    1) When you subscribe, supply a NotificationShutdownPort.

    2) When your subscriber service receives a Drop, manually post a Shutdown() message to your NotificationShutdownPort instances.

    Sorry for the hassle. In Final, the runtime will automatically clean up subscriptions, like it's supposed to!

     

    • Marked as answer by Trevor Taylor Sunday, May 16, 2010 12:35 AM
    Wednesday, July 30, 2008 6:49 PM
  • Hi George - OK, great, thanks!  Now my drop handler looks sort of like this:

    Code Snippet

    [ServiceHandler(ServiceHandlerBehavior.Teardown)]
    public IEnumerator<ITask> CustomDropHandler(DsspDefaultDrop drop)
    {
        // Tell the subscription's NotificationShutdownPort to shut down
        Shutdown off = new Shutdown();
        _subMgrShutdown.Post(off);

        // Delete the roBlock entity
        SimulationEngine.GlobalInstancePort.Delete(_state.blockObject.ent);

        base.DefaultDropHandler(drop);

        yield break;
    }


    But unfortunately it doesn't seem to affect my slowdown issue.  I wonder though:  I use ServiceForwarder in several other places throughout the app (orchestration svc communicating with individual module services, etc).  Could these be persisting even if the subscriptions are getting shut down?  I'm not sure how I'd explicitly drop these misc. messages, since they don't necessarily accept a NotificationShutdownPort...
    Thursday, July 31, 2008 4:22 PM
  • If you are sending periodic messages outside of subscriptions, make sure you tear that code down as part of your drop code.

    To clarify, _subMgrShutdown above is a NotificationShutdownPort tied to a subscription, correct?

    To get more details on this, go to the ResourceManager/Diagnostics URI and observe which task queues seem to be incrementing after you shut down. This will tell you which services are still active.

     

    It's http://localhost:port/resourcemanager/diagnostics/raw (for the XML version)
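
    One way to spot queues that are still incrementing is to save the raw XML twice, a few seconds apart, and compare the counters. The sketch below assumes the XML contains DispatcherQueue elements with Name and ScheduledTaskCount children, as in the excerpt posted later in this thread. `QueueActivity`, `StillActive`, and the `<Diagnostics>` root element are hypothetical names, and fetching the two snapshots from DssHost is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class QueueActivity
{
    // Maps each DispatcherQueue name to its ScheduledTaskCount.
    // Matching on LocalName keeps this working if the document uses an XML namespace.
    private static Dictionary<string, long> ReadCounts(string xml)
    {
        var counts = new Dictionary<string, long>();
        IEnumerable<XElement> queues = XDocument.Parse(xml).Descendants()
            .Where(e => e.Name.LocalName == "DispatcherQueue");

        foreach (XElement queue in queues)
        {
            XElement name = queue.Elements()
                .FirstOrDefault(e => e.Name.LocalName == "Name");
            XElement count = queue.Elements()
                .FirstOrDefault(e => e.Name.LocalName == "ScheduledTaskCount");
            if (name != null && count != null)
            {
                counts[name.Value] = (long)count;
            }
        }
        return counts;
    }

    // Names of queues present in both snapshots whose ScheduledTaskCount grew.
    public static List<string> StillActive(string before, string after)
    {
        Dictionary<string, long> old = ReadCounts(before);
        var active = new List<string>();
        foreach (KeyValuePair<string, long> pair in ReadCounts(after))
        {
            long previous;
            if (old.TryGetValue(pair.Key, out previous) && pair.Value > previous)
            {
                active.Add(pair.Key);
            }
        }
        return active;
    }

    public static void Main()
    {
        // Hand-written snapshots for illustration only.
        const string before = @"<Diagnostics>
  <DispatcherQueue><Name>idle</Name><ScheduledTaskCount>162</ScheduledTaskCount></DispatcherQueue>
  <DispatcherQueue><Name>busy</Name><ScheduledTaskCount>10</ScheduledTaskCount></DispatcherQueue>
</Diagnostics>";
        const string after = @"<Diagnostics>
  <DispatcherQueue><Name>idle</Name><ScheduledTaskCount>162</ScheduledTaskCount></DispatcherQueue>
  <DispatcherQueue><Name>busy</Name><ScheduledTaskCount>25</ScheduledTaskCount></DispatcherQueue>
</Diagnostics>";

        // Prints "busy".
        foreach (string name in StillActive(before, after))
        {
            Console.WriteLine(name);
        }
    }
}
```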

     

    g

     

    Thursday, July 31, 2008 10:52 PM
  • Aha!  After mucking about in ResourceManager, it appears that the dispatcher for each of my module services is not getting dropped appropriately.  So after ten runs of six modules each, I have sixty of these:

    <Dispatcher>
      <PendingTaskCount>0</PendingTaskCount>
      <ProcessedTaskCount>162</ProcessedTaskCount>
      <WorkerThreadCount>6</WorkerThreadCount>
      <Options>None</Options>
      <Name>http://schemas.tempuri.org/2008/05/roBlock.html</Name>
      <DispatcherQueues>
        <DispatcherQueue>
          <Name>http://schemas.tempuri.org/2008/05/roBlock.html:d166a253-455c-4293-bf42-603b9ea28a23</Name>
          <IsUsingThreadPool>false</IsUsingThreadPool>
          <Count>0</Count>
          <ScheduledTaskCount>162</ScheduledTaskCount>
          <Policy>Unconstrained</Policy>
          <MaximumQueueDepth>0</MaximumQueueDepth>
          <CurrentSchedulingRate>0</CurrentSchedulingRate>
          <MaximumSchedulingRate>1</MaximumSchedulingRate>
          <Timescale>1</Timescale>
        </DispatcherQueue>
      </DispatcherQueues>
    </Dispatcher>


    I had assumed that the dispatchers would drop when the services dropped.  Please, a little guidance on how to manually drop them and free up the threads that they're using?  I had originally set the ActivationSettings to share a dispatcher, but if I do this, the contract directory needs to get rebuilt between each simulation, which takes a lot of time.

    Oh, and yes, _subMgrShutdown is a NotificationShutdownPort tied to a subscription.
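
    A tally like the "sixty of these" above can be automated by counting Dispatcher elements per Name in a saved copy of the diagnostics XML. This is a rough sketch using only System.Xml.Linq; `DispatcherCounter`, `CountByName`, and the `<Diagnostics>` root element in the sample are hypothetical, and the sample document is hand-written rather than real DssHost output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DispatcherCounter
{
    // Counts <Dispatcher> elements, grouped by their direct <Name> child.
    // Matching on LocalName keeps this working if the document uses an XML namespace.
    public static Dictionary<string, int> CountByName(string diagnosticsXml)
    {
        return XDocument.Parse(diagnosticsXml).Descendants()
            .Where(e => e.Name.LocalName == "Dispatcher")
            .GroupBy(d => (string)d.Elements()
                .FirstOrDefault(e => e.Name.LocalName == "Name") ?? "(unnamed)")
            .ToDictionary(g => g.Key, g => g.Count());
    }

    public static void Main()
    {
        // Two leftover dispatchers that share the same contract name.
        const string sample = @"<Diagnostics>
  <Dispatcher>
    <Name>http://schemas.tempuri.org/2008/05/roBlock.html</Name>
    <WorkerThreadCount>6</WorkerThreadCount>
  </Dispatcher>
  <Dispatcher>
    <Name>http://schemas.tempuri.org/2008/05/roBlock.html</Name>
    <WorkerThreadCount>6</WorkerThreadCount>
  </Dispatcher>
</Diagnostics>";

        // Prints "http://schemas.tempuri.org/2008/05/roBlock.html x2".
        foreach (KeyValuePair<string, int> pair in CountByName(sample))
        {
            Console.WriteLine("{0} x{1}", pair.Key, pair.Value);
        }
    }
}
```

    Running this after each batch of simulations would show whether the dispatcher count grows with the number of dropped services.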

    Thanks,
    Eric
    Saturday, August 2, 2008 7:44 PM
  • Since we found one bug here related to subscriptions, I'm going to start a new thread with the latest.  I'll call it "Thousands and Thousands of Dispatchers" so that when we're ready to write the book, it'll be easy to find.
    Wednesday, August 6, 2008 5:32 PM