Share stories of CHESS wins/losses

  • General discussion

  • Are you using CHESS? Do you have success or horror stories to share? We would love to hear about how CHESS works in the field. Use this thread to share your experiences about CHESS.

    Thanks, the CHESS team. 
    Thursday, January 15, 2009 8:57 PM

All replies

  •  It works!

    We have a heavily multithreaded distributed system to work on. Its server part has a lower C-based level (third-party code communicating with remote devices) wrapped in a managed/unmanaged mix of C++ and topped with a pile of C#. No wonder that sometimes (about once a month) we see a "memory violation exception" message announcing a typical heisenbug, and it is not reproducible...

    It took some time to get everything installed and a suspicious layer wrapped into a suitable test, and now it has just started reporting the problems it found. So far so good. We haven't found the major flaw yet, but it has detected a livelock problem, which is worth fixing.

    An amazing part of it is how deeply we rely on the schedulers, in the hope that they take care of things, while they do not! CHESS is a "must-have" tool. Thank you guys for the outstanding job!
    Slava Luchianov
    Monday, February 9, 2009 8:38 PM
  • Hi Slava,

    It's great to hear that CHESS is starting to produce results for you!  Let us know if we can help you further. Best,

    -- Tom
    Wednesday, February 18, 2009 4:38 AM
  • I've only been using Chess for a couple of days, and already I have refactored several libraries with respect to locking and threading. 

    The system I am working on is massively multithreaded, and a HUGE pain to test, due to all of the concurrent programming (not two programmers programming at the same time, lol).  Our system is designed to host upwards of 50,000 long-lived socket connections, sending and processing data concurrently on all sockets.  The system has been in design and development for the last 3 years, so it is pretty tight as far as performance and lack of bugs go.

    However, the hardest bugs to find are the ones that crop up only once in a while, and then cannot be reproduced.  Chess has made it possible to find, reproduce and fix more of these errors than I could previously.  Even in the two days I have been using it I have made great strides in rooting out some of the more frustrating one-off bugs that I've seen in the past.

    So, thanks guys!  I hope this project gets enough resources from Microsoft, and that one day it's included as a standard test tool in Visual Studio.
    Thursday, October 15, 2009 8:55 PM
  • In one word: AWESOME!

    For people starting out with CHESS: I recommend reading this paper as well to improve your understanding of how CHESS works internally:
    http://research.microsoft.com/en-us/projects/chess/osdi2008-chess.pdf. Besides that, the example in Chapter 2 emphasizes how useful CHESS really is: addressing non-determinism in multithreading.

    Congrats & thanks!
    Thursday, November 12, 2009 8:32 AM
  • I can't say I've really had much success so far, though it's exactly the kind of tool we need.

    I've been (to date) trying to use Chess as the MSTest test host, and have been banging my head against:

    * Tests that fail due to Chess not supporting something (like ReaderWriterLock, fair enough), but you have absolutely no clue why unless you step through and see the 'NotImplementedException' that Chess is throwing, since even in repro mode you just get shown that the test failed with no real additional information.

    * Tests that fail for other reasons, and again Chess just says 'test run failed', even when running in repro mode, and it's only when you take chess off the test and run it normally you realise the 'normal' execution path is broken and you actually get a stack trace to track down the error. Surely that could go in the test output even when running in Chess? 

    * A string of weird Chess exceptions, like not being able to marshal IntPtrs between AppDomains, or handles being closed.

    So pretty frustrating so far actually. Seems like any time code is doing anything interesting (using a Timer, or a ThreadPool.WaitForSingleObject with a timeout) I can't get it to work.

    (I'll try to post some specific issues separately: this was just my overall reaction)

    Tuesday, March 30, 2010 3:50 PM
  • Thanks. These are all very good points. Regarding MSTest host: we will not be supporting the MSTest host integration any longer because of various issues. So it's best to just use mchess directly.

    Regarding multiple AppDomains, the underlying instrumentation framework we are using doesn't support this well.

    -- Tom

    Saturday, April 10, 2010 5:40 PM
  • I implemented an intrusive singly linked list in C#, in unsafe code, using CAS to set the tail pointer. The algorithm looked perfect to me and it's a relatively simple thing to do.

    But it didn't work, and it failed unpredictably. I could not find the flaw in my logic, yet I knew it had to be there. So I downloaded CHESS and spent the day playing with it. The GUI ChessBoard needs some polishing: I struggled for hours with various things (don't have a space in the project path) and eventually gave up on it. But the command line mchess works quite well, and so does ConcurrencyExplorer II. It found an interleaving that caused a failed test, and reliably reproduced it. Then I used trace and the ConcurrencyExplorer to look at what was happening around the context switches.

    This allowed me to see an Interlocked.CompareExchange that by my reasoning should have failed, succeed. It turns out that in an intrusive list where nodes can be re-used, it can happen that the list is modified by another thread, but the CAS can still see the same tail node (because it was removed, and added back before the thread resumes.) So there was my logic error, thanks to CHESS, in about 30 minutes when I finally figured out how to use it. Sure beat the hours of frustrated head scratching that was my first attempt to find the problem.
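
The failure described here is commonly known as the ABA problem. What follows is only a minimal sketch of it, written in Python rather than the original unsafe C#, and using a head-based intrusive stack instead of a tail pointer to keep it short. The class and function names are hypothetical, and `compare_exchange` merely models the compare-and-swap semantics of Interlocked.CompareExchange. The two "threads" are replayed by hand in one fixed order, which is roughly what a reproduced interleaving looks like.

```python
class Node:
    """Intrusive list node: the link is stored inside the item itself."""

    def __init__(self, name):
        self.name = name
        self.next = None


class IntrusiveStack:
    """Singly linked list updated only through compare-and-swap on `head`."""

    def __init__(self):
        self.head = None

    def compare_exchange(self, expected, new):
        # Models CAS: replace head only if it is still the `expected` object.
        # The identity check mirrors comparing raw pointers.
        if self.head is expected:
            self.head = new
            return True
        return False

    def push(self, node):
        while True:
            old = self.head
            node.next = old
            if self.compare_exchange(old, node):
                return

    def pop(self):
        while True:
            old = self.head
            if old is None:
                return None
            if self.compare_exchange(old, old.next):
                return old


def replay_aba():
    """Replay one interleaving in which a stale CAS succeeds."""
    stack = IntrusiveStack()
    a, b, c = Node("A"), Node("B"), Node("C")
    for node in (c, b, a):
        stack.push(node)  # list is now A -> B -> C

    # Thread 1 begins a pop: it reads head (A) and A's successor (B),
    # then is preempted before its CAS.
    seen_head = stack.head
    seen_next = seen_head.next

    # Thread 2 runs: removes A, removes B, then re-uses A by adding it back.
    stack.pop()
    stack.pop()
    stack.push(a)  # list is now A -> C; B is no longer a member

    # Thread 1 resumes. Head is A again, so the CAS succeeds even though
    # the list changed underneath it, and the removed node B becomes head.
    succeeded = stack.compare_exchange(seen_head, seen_next)

    names = []
    node = stack.head
    while node is not None:
        names.append(node.name)
        node = node.next
    return succeeded, names
```

In this replay the stale CAS succeeds and the list ends up as B -> C: the already-removed node B is back in the list and A has been lost. Common remedies include pairing the pointer with a version counter or not re-using nodes while other threads may still hold references to them; which one fits depends on the data structure.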

    I would love to see a CHESS release for .NET 4. I wouldn't want to debug concurrency issues without it.

    -Dan

    Tuesday, April 13, 2010 6:03 AM
  • OK, so since I stopped using the testhost and started using mchess (and RTFM'd about /includeassembly) I am getting much more mileage, and have actually found and fixed some serious issues. Hooray!

    But as to the app domains: we're not running multiple appdomains: it's either chess or VS that introduced that! I've not seen the IntPtr one since I gave up on the integrated test host, but I've seen at least one 'marshalling GCHandle across appdomains' come up (when an assert in a test led me to break into the debugger directly)

    Thursday, April 15, 2010 12:50 AM
  • Very cool, Dan!  We are working on adding support for the TPL (specifically, the Task abstraction). 

    -- Tom

    Thursday, April 15, 2010 5:43 AM
  • I am glad to hear you are getting some mileage out of mchess (sorry about the /includeassembly - we do have a plan to eliminate the need for that).

    Regarding the 'marshalling ...' error I think that's a bug of ours... I guess we're really going to have to do another release... :)

    -- Tom

    Thursday, April 15, 2010 5:47 AM
  • Sad to hear that you'll be dropping the MSTest host - this effectively means that we won't be able to automatically run CHESS as part of our regular automated unit testing without jumping through quite a lot of hoops. We currently use NUnit, and while migrating to MSTest (and Chess) would be straightforward, moving to mchess wouldn't be, as we'd need a framework for specifying which tests to run from the static Run() method.

    In practice this means that MChess will be used only for troubleshooting where we know problems exist rather than as a matter of course. The corollary of that is that it will get used less, so we will not develop expertise with it - and it will then fall into disuse altogether.

     

    Wednesday, June 23, 2010 8:19 AM
  • I put together a PowerShell script to jump through those hoops, and it is a massive pain in the rear. Reflection to find test methods, a convention that the first argument passed to Run is the name of the test to run, the pain of having to cut-and-paste that static Run method everywhere. It sucks.

    That being said, I can't imagine wanting to run chess across *all* my unit tests as part of the automated build. Without in-depth knowledge of which assemblies to instrument and so on, the tests would likely fail anyway, and unless the tests are kept really discrete, the test run could take forever.

    I think it's more likely that *some* unit tests would want to be run under chess as well, which then comes down to having an extension in your CI build/test process that runs chess tests also. I do it on the basis of running (under chess) all the test methods in a particular set of test classes (the set is basically a list in the powershell script that controls the chess test run).
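
The convention described above (reflection to find test methods, a hand-maintained list of test classes, and a single entry point whose first argument names the test) might look roughly like the sketch below. It is written in Python rather than PowerShell/C#, every name in it is hypothetical, and it does not use or represent any mchess API.

```python
import inspect


class QueueTests:
    """Hypothetical test class; methods starting with `test_` are tests."""

    def test_enqueue(self):
        return True

    def test_dequeue_empty(self):
        return True

    def helper(self):
        # Not a test: skipped by discovery because of its name.
        return False


# Hand-maintained list of the classes whose tests should run under the checker.
CHECKED_CLASSES = [QueueTests]


def discover(classes):
    """Use reflection to map "Class.method" names to (class, method name)."""
    found = {}
    for cls in classes:
        for name, _ in inspect.getmembers(cls, inspect.isfunction):
            if name.startswith("test_"):
                found[f"{cls.__name__}.{name}"] = (cls, name)
    return found


def run(args):
    """Single entry point: the first argument names the test to execute."""
    tests = discover(CHECKED_CLASSES)
    cls, method = tests[args[0]]
    return bool(getattr(cls(), method)())
```

An outer script would then list the names from `discover(CHECKED_CLASSES)` and launch the checker once per name, passing that name as the first argument.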

    Incidentally (directed at Tom) if the Setup/Run/Cleanup methods didn't have to be static, life would be infinitely easier here.

    We've not automated it as part of our CI build because we can never get all the tests to pass, and stare at chessboard as much as I might, I still think it cries 'deadlock' when there is none.

    Thursday, June 24, 2010 1:48 PM