Memory Sharing Between Threads on Singlecore vs. Multicore Machine

  • Question

  • I originally posted this question in the C# Language forum, but it didn't get much love.  Maybe this forum is more appropriate given the question's low-level nature.


    Consider the following C# program:

    using System;

    public class Program
    {
         // static so that Main (which is static) can read it directly
         private static int fairness = 0;

         private static void Add()
         {
            while (true)
            {
                 fairness++;
            }
         }

         private static void Subtract()
         {
            while (true)
            {
                fairness--;
            }
         }

         public static void Main()
         {
            // launch two threads, one calling Add, the other Subtract.
            // After running for 1 second, terminate the threads.

            Console.WriteLine(fairness);
         }
    }
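
    For concreteness, one way to flesh out the commented-out part of Main
    (this Thread-based harness is my own sketch, not part of the original
    question, and it assumes a using System.Threading; directive):

        public static void Main()
        {
            // Sketch only: background threads are killed when Main exits,
            // which stands in for "terminate the threads" above.
            var adder = new Thread(Add) { IsBackground = true };
            var subtractor = new Thread(Subtract) { IsBackground = true };
            adder.Start();
            subtractor.Start();

            Thread.Sleep(1000);   // let both loops run for about a second

            Console.WriteLine(fairness);
        }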
    


    Assume then that we execute this program on a single-core machine.  During the course of execution the CLR will allocate some amount of time to the "Add" thread and some amount to the "Subtract" thread.  Let's say that on a given execution this works out to 10 iterations of the while loop in Add and 8 iterations of the while loop in Subtract.  While we don't know the order in which the modifications to the field fairness will occur, I believe we can be assured that at the end of the program 2 will be written to the console for this example execution.

    My question, then, is how does this world change when the program is executed on a multicore machine?  What if each thread is sent to a separate core, A and B?  I would think that the variable fairness could then exist separately for each thread, once in a register of core A and once in a register of core B.  That way, at the end of execution, I believe the registers of cores A and B could look like:

    A: fairness = 10
    B: fairness = -8

    When these registers are then flushed back to main memory, it seems to me that either value could end up occupying the main memory address of fairness, and as such the output of our program could be 10 or -8.

    Is this correct reasoning?
    Tuesday, June 30, 2009 4:19 AM

Answers

  • Part of the confusion comes because the official CLR spec would allow multiple cores to have different cached values and operate on them independently.

    However, Microsoft's implementation of the CLR used the x86 memory model, which has stricter memory guarantees. When they tried to port it to IA-64, they attempted to use an ECMA-compatible, looser memory model that was better suited to the platform, but so much code broke that they ended up tightening the memory model again. For example, double-checked locking in particular is broken on IA-64 (Google the "Double-Checked Locking Is Broken" declaration), but it happens to work on x86, so it ended up being used a lot when it really shouldn't have been.
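
    For reference, here is a sketch of that pattern (this Singleton is my
    illustration, not code from the thread). Without volatile it is broken
    under the weaker model, because another core may observe a non-null
    instance whose constructor writes have not yet become visible:

        public sealed class Singleton
        {
            // volatile is the part that makes this safe under the ECMA model
            private static volatile Singleton instance;
            private static readonly object gate = new object();

            public static Singleton Instance
            {
                get
                {
                    if (instance == null)            // first check, no lock taken
                    {
                        lock (gate)
                        {
                            if (instance == null)    // second check, under the lock
                                instance = new Singleton();
                        }
                    }
                    return instance;
                }
            }
        }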

    More info:
      http://blogs.msdn.com/jaredpar/archive/2008/01/17/clr-memory-model.aspx
    (be sure to read the MSDN article referenced from this blog entry).

    In short:
    A) According to the ECMA spec, the problem you describe above is a real problem, and could manifest.
    B) Every existing implementation works around this problem. The end result is a loss of efficiency, but less surprising behavior. There is now so much existing code that depends on the "Microsoft memory model" instead of the "ECMA memory model" that it is incredibly unlikely that any future CLR will ever be able to use the ECMA memory model.
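
    To make the difference concrete, here is a sketch (mine, not from the
    linked articles) of the kind of code at issue. Without volatile, nothing
    promises that the reader will ever observe the write; declaring the field
    volatile makes the code correct under both models:

        public class Worker
        {
            // Declared volatile so every read of 'stop' is a fresh load;
            // without it, the loop below may legally spin forever.
            private volatile bool stop;

            public void RequestStop()
            {
                stop = true;    // volatile write: guaranteed to become visible
            }

            public void Run()
            {
                while (!stop)
                {
                    // do work
                }
            }
        }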

           -Steve
    Programming blog: http://nitoprograms.blogspot.com/
      Including my TCP/IP .NET Sockets FAQ
    MSBuild user? Try out the DynamicExecute task in the MSBuild Extension Pack source; it's currently in Beta so get your comments in!
    Tuesday, June 30, 2009 1:45 PM

All replies

  • Your mental model of how threads and CPUs work is not close to reality.  I'd recommend you actually write and run this code to get a feel for how it really works.
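
    For example, fairness++ is not a single indivisible operation; the JIT
    compiles it to a load, an increment, and a store (a rough sketch of the
    expansion, not actual JIT output):

        int temp = fairness;   // load the current value
        temp = temp + 1;       // increment it in a register
        fairness = temp;       // store it back
        // A thread preempted between the load and the store will later store
        // a stale value, wiping out any updates the other thread made in
        // between. So the result is unpredictable even on a single core.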

    Hans Passant.
    Tuesday, June 30, 2009 4:36 AM
  • The easy answer (in case you didn't catch it in Karel's post) is: ALWAYS use thread synchronization primitives to enforce the correct order of reads and writes between threads on a multicore processor. It's virtually impossible to predict the results if you don't. As Karel's excellent references indicate, there are a small handful of cases in which one can avoid synchronization primitives on the reader side, and none in which one can avoid them on the writer side; and even those cases rely on very select algorithms, or very select coding styles, that are tolerant of errors in read/write ordering when such errors do occur.

    And bugs caused by incorrect synchronization are the worst possible kind of bug: the kind that occurs on a random basis once in every couple of million times through the code, and the kind that never occurs when you actually have a debugger attached.
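
    Applied to the original program, the simplest correct version (a sketch)
    routes every update through Interlocked, which performs each
    read-modify-write atomically and makes it visible across cores:

        private static int fairness = 0;

        private static void Add()
        {
            while (true)
            {
                Interlocked.Increment(ref fairness);   // atomic fairness++
            }
        }

        private static void Subtract()
        {
            while (true)
            {
                Interlocked.Decrement(ref fairness);   // atomic fairness--
            }
        }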

    Wednesday, July 1, 2009 6:02 PM