My code runs fine on DSS 2008 RTM, causes a pileup of Receiver<IAsyncResult> on DSS 2008 R3. Regression or my bug? RRS feed

  • Question

  • I've sketched up two trivial services (where one posts a PingRequest to the other, waits for a PingResponse, and repeats ad infinitum) to see what sort of round-trip times to expect from DSS in different scenarios: intra-node, inter-node on same machine, and inter-node across a LAN.  I'm running DSS 2008 R3 on Windows XP x64 in 64-bit DssHost processes.

    When running the services in the same node, performance and memory usage is stable, but when running across two nodes on the same machine, message processing becomes slower and slower, and the managed heap size grows as well.

    Inspection with a memory profiler shows a steadily growing instance count (thousands!) of Receiver<IAsyncResult> all in CleanedUp state and yet somehow all still registered with one lone Port<IAsyncResult> belonging to the DsspTcpTransportService.TcpDispatcherProcessor.ReadFromStream iterator.  These CleanedUp Receivers pile up and all seem to get iterated over and ignored every time a new IAsyncResult gets Posted to the Port, ultimately slowing incoming messages on the corresponding socket as they have to be looped over.

    Echo's Ping handler simply responds to PingRequest with PingResponse, copying its lone Tag string member.  EchoTester uses a Port<EmptyValue> for loop flow control and adds this exclusive handler into its service's main interleave:

            private IEnumerator<ITask> Fire(EmptyValue unused)
                var tag = Guid.NewGuid().ToString();
                var pingResult = _echoPort.Ping(new PingRequest { Tag = tag });
                yield return pingResult.Choice();
                var pingResponse = (PingResponse) pingResult;
                if (tag != pingResponse.Tag) throw new InvalidDataException();
                _state.Iterations += 1;
                if (_state.Iterations % 10000 == 0) LogInfo(String.Format("{0} pings ponged, {1} elapsed, {2} heap size", _state.Iterations, DateTime.UtcNow - _state.StartedAt, GC.GetTotalMemory(true)));

    This is just toy code, but even if I use a timer or something similar to ensure that concurrent handlers have a chance to get through the service's main port interleave edgewise, I see no difference in the rate of dead receiver accretion in the transport.  Am I doing something wrong here?

    EDIT: The problem doesn't appear if I run the exact same code under DSS 2008 RTM: there are exactly two instances of Receiver<IAsyncResult> no matter how long the code runs.  This makes me suspect a regression, though there's still always the likelihood that I may have goofed somewhere.

    • Edited by jeffrey8192 Wednesday, April 17, 2013 2:27 AM Add more information
    Wednesday, April 17, 2013 12:47 AM


  • I have some other issues that are preventing me from running later versions now...  need to clean up my system.

    I'm hoping your workaround of using the 2.0 binaries is adequate for now.

    Would you mind e-mailing me offline at roboinfo at micrososft.com...  I'd like to follow up and get more detail.


    • Marked as answer by jeffrey8192 Friday, May 3, 2013 10:27 PM
    Friday, May 3, 2013 3:31 PM

All replies

  • I'm having trouble setting up a repro. It is possible for you to post your code (to SkyDrive, Dropbox, ...)?
    Wednesday, April 24, 2013 5:06 PM
  • Well, the code was on a RAM disk that went south due to an ill-timed BSoD, but I rewrote it with some slight refactoring (such as combining the isomorphic PingRequest and PingResponse into a single PingPayload type) and the exact same symptoms appeared under R3.

    Two services, Echo and EchoTester, are in the package at http://sdrv.ms/12fdzJW .  Start an Echo in a node on (50002,50003), then an EchoTester in a node on (50000,50001), and watch the EchoTester's node get slower and larger.  I've observed this behavior whether the nodes are running as 32 or 64 bit processes.

    I'll likely soon blow XP away and install a new OS image, since VS2012 refuses to even attempt to install on XP and upgrade installs atop XP x64 are impossible anyway.

    EDIT: I suspected that having authentication disabled on R3 might be contributing, but enabling authentication actually magnifies the problem.  On the other hand, running with authentication enabled on RTM again works flawlessly.

    The EchoTester node's log looks like this:

    *   Service started [04/30/2013 13:55:17][http://ws-75430:50000/echotester/5c5e4492-ac3d-4525-a9ab-214c1ac8c880]
    *   10000 pings ponged, 00:00:13.1094653 elapsed, 4025232 heap size [04/30/2013 13:55:30][http://ws-75430:50000/echotester/5c5e4492-ac3d-4525-a9ab-214c1ac8c880]
    *   20000 pings ponged, 00:00:38.6721353 elapsed, 5197192 heap size [04/30/2013 13:55:56][http://ws-75430:50000/echotester/5c5e4492-ac3d-4525-a9ab-214c1ac8c880]
    *   30000 pings ponged, 00:01:21.8442994 elapsed, 6228744 heap size [04/30/2013 13:56:39][http://ws-75430:50000/echotester/5c5e4492-ac3d-4525-a9ab-214c1ac8c880]
    *   40000 pings ponged, 00:02:26.7353653 elapsed, 7521080 heap size [04/30/2013 13:57:44][http://ws-75430:50000/echotester/5c5e4492-ac3d-4525-a9ab-214c1ac8c880]
    *   50000 pings ponged, 00:03:56.2828390 elapsed, 8562712 heap size [04/30/2013 13:59:14][http://ws-75430:50000/echotester/5c5e4492-ac3d-4525-a9ab-214c1ac8c880]
    *   60000 pings ponged, 00:05:59.9711690 elapsed, 9603960 heap size [04/30/2013 14:01:17][http://ws-75430:50000/echotester/5c5e4492-ac3d-4525-a9ab-214c1ac8c880]
    *   70000 pings ponged, 00:08:45.5972802 elapsed, 11167376 heap size [04/30/2013 14:04:03][http://ws-75430:50000/echotester/5c5e4492-ac3d-4525-a9ab-214c1ac8c880]
    *   80000 pings ponged, 00:12:08.6455123 elapsed, 12339816 heap size [04/30/2013 14:07:26][http://ws-75430:50000/echotester/5c5e4492-ac3d-4525-a9ab-214c1ac8c880]

    • Edited by jeffrey8192 Wednesday, May 1, 2013 1:00 AM Clean up log formatting
    Tuesday, April 30, 2013 4:41 AM
  • I spent some time looking at this today, and I see the issue even on "2008 RTM" on Window8 even when I force console host to use .NET 3.5 (from the config).  (are you sure you were running intra-node when you saw no leak on 2008 RTM?)

    What I saw was a slow creep in memory...  in 7.5 minutes I saw 10 million ping-pongs and a heap grow by 640K bytes.  I  didn't really notice any increase in time though... it was consistently about 420 milliseconds per 10,000 ping-pongs during that time.  How long did you run before you saw time go up?

    Seems you have stumbled on a real leak.  The next question is... how critical is this for you right now?  You should be able to generate your performance numbers even with this. I am not aware of this specifically interfering with other users. 

    Let me know how XP works for you.

    Tuesday, April 30, 2013 11:53 PM
  • To be certain we are speaking the same language here, by "2008 RTM" I mean CCR/DSS library version 2.0.913.0 with which I cannot reproduce the problem, and by "2008 R3" I mean CCR/DSS library version with which I can reliably reproduce the problem.

    When the problem does not manifest, no progressive slowdown appears, but there is still a slow creep in memory usage for the first 1000 log messages or so as the Console Output service accumulates 1000 log messages in its state; 640 bytes per logged message seems a reasonable amount of overhead for this.  Once the steady state of 1000 recent log messages is reached, then memory utilization plateaus.  When the problem does manifest, the leak is in the TCP transport, and the slowdown is swift and dramatic, as seen in the log snippet above.

    The problem does not manifest in 2.0.913.0 at all, and in it manifests only when running inter-node.  Your time of 420 milliseconds per 10,000 ping-pongs suggests to me that you were running intra-node, where the TCP transport is not involved and the problem does not manifest; my times were 833 milliseconds per 10,000 ping-pongs intra-node, and 5747 milliseconds per 10,000 ping-pongs inter-node.

    The real signature of the problem is that a large number of instances of Receiver<IAsyncResult> in CleanedUp state are accumulating in the TCP transport's ReadFromStream iterator method's local Port<IAsyncResult>'s Store of Receivers; I discovered this using ANTS Memory Profiler and confirmed it in the Visual Studio 2008 debugger.

    It's not a complete showstopper, since the differences between 2.0 and 2.2 appear to be minimal and 2.0 doesn't cause me any trouble, but I do think there's a good chance it's a genuine bug.

    Wednesday, May 1, 2013 12:56 AM
  • Thank you for the clarification and bug report.  Please do let us know if your current workaround for now of using the 2008 RTM CCR/DSS libraries doesn't work for some reason.


    Wednesday, May 1, 2013 5:28 PM
  • With the given clarification, is the problem now reproducible for you?

    I'll let you know if I hit any further snags.  I've reformatted my development box, installed Windows 8, and am presently knee-deep in rebuilding my environment; once it's functional I'll be able to repeat the test on the newer operating system.

    Wednesday, May 1, 2013 10:40 PM
  • I tried creating an Multi-Node application and did not see the rapid heap growth you describe.

    Did you create a "Distributed Application" or set it up some other way?
    Can you share the manifest(s) and procedure?


    Thursday, May 2, 2013 8:16 PM
  • The manifest files are bundled with the test case code I uploaded.  The steps to reproduce are as follows:

    After doing the usual DSS project import dance (run DssProjectMigration, fix proxy assembly reference paths, etc.), building, and making sure HTTP reservations are available for ports 50000 and 50002, launch two DSS Command Prompt (x64) windows.

    In the first window, change to the Echo project source directory, run DssHost /p:50002 /t:50003 /m:Echo.manifest.xml, then wait for the Echo service to start.

    In the second window, change to the EchoTester project source directory, run DssHost /p:50000 /t:50001 /m:EchoTester.manifest.xml, then wait for the EchoTester service to start.  The manifest specifies that Echo resides on the other node.

    Finally, look at http://localhost:50000/console/output and observe the messages logged by EchoTester.

    When built and run against 2.2, I observe progressive slowdown and heap growth; when built and run against 2.0, I see nothing more than the minimal heap growth induced by the Console Output service.

    EDIT: I've also tried building and running the projects against 4.0, same problem.  I also believe I've pinned the issue down to a slight change in Port<T>.RegisterReceiver(ReceiverTask) introduced sometime between 2.0 and 2.2 that prevents removal of more than a single CleanedUp Receiver from the Store at once.

    • Edited by jeffrey8192 Friday, May 3, 2013 7:04 AM more info
    Thursday, May 2, 2013 10:43 PM
  • I have some other issues that are preventing me from running later versions now...  need to clean up my system.

    I'm hoping your workaround of using the 2.0 binaries is adequate for now.

    Would you mind e-mailing me offline at roboinfo at micrososft.com...  I'd like to follow up and get more detail.


    • Marked as answer by jeffrey8192 Friday, May 3, 2013 10:27 PM
    Friday, May 3, 2013 3:31 PM
  • Will do.  Seems we're both dealing with workstation messes at the moment. :)
    Friday, May 3, 2013 10:27 PM