volatile, somewhat puzzled (C not C++ question)

    Question

  • Hi

     

    I have used volatile many times on several platforms, but I'm seeing something puzzling.

     

    We have code (our own fast read/write spin lock, coded in C), and although it runs correctly in both debug and release builds, I am puzzled by what I see in the debugger memory window when running a release build.

     

    The lock data is declared as a typedef volatile struct ...

     

    While debugging, right after this kind of assignment:

     

    struc.field = TRUE;

     

    the field is displayed as zero, both by the debugger and when I view the memory window.

     

    Surely the memory window should show this field as a 1?

     

    The precise syntax for using volatile is a little muddy. For example, we declare the lock struct (in effect) like this:

     

    typedef volatile struct vmx_spin_lock
    {
        long master_spin_count;
        long reader_spin_count;
        long reader_wait_count;
        long upgrader_spin_count;
        long upgrader_wait_count;
        /* etc etc etc..... */
        DWORD owner_pid;
        DWORD owner_tid;
        BOOL metering_enabled;
    } Vmx_spin_lock, * Vmx_spin_lock_ptr;

     

    Forget the details, but as you can see, my assumption here is that "Vmx_spin_lock" is a type that, when used, will always declare an instance of a volatile struct, and "Vmx_spin_lock_ptr" will always declare a pointer to a volatile structure.

     

    So my question is: is the use of volatile actually correct, OR is what I see in the debugger some valid side-effect of debugging a release build?

     

    In reality these structures are in shared memory of course, hence the need for volatile.

     

    If I break the debug session arbitrarily, I always see zeros for the various fields, even "metering_enabled", which has been explicitly set to TRUE!

     

    I fully expect less useful debugger behavior with release builds, but surely I should see the memory dump showing me real values?

     

    My concern is that this (and other structures) are NOT actually being treated as volatile, and that we may have nasty issues when we begin to stress-test release builds.

     

    Any comments or help, much appreciated.

     

    H

    Tuesday, June 12, 2007 11:44 PM

Answers

  • Hi Hugo,

     

    There's nothing wrong with the way you're using volatile. If your instance is a local variable, it's probable the optimiser is doing some funky things with the stack that the debugger just can't understand. This is normal; the optimiser blindfolds the debugger somewhat. One thing you can do to confirm this behaviour is to print out the address of your instance and compare it with the value the debugger has. For example:

     

    Code Snippet

    #include <stdio.h>

    void func(void)
    {
        Vmx_spin_lock lock;

        /* cast away volatile so the argument matches what %p expects */
        printf("%p\n", (void *)&lock);
    }

     

     

    If you break into the debugger sometime after the printf, you will probably see that the address the debugger has and the address output by the printf are different.

     

    John.

     

    Wednesday, June 13, 2007 6:11 AM

All replies

  • Fascinating!

     

    I did just that and the pointer that was printed differed by 4 from the value used/seen by the debugger!!

     

    However code to which the pointer is passed seems to see the correct value (the same as was printed).

     

    Anyway, this does clear up my main puzzlement.

     

    May I ask: is the manner in which I am using volatile totally correct for Microsoft C (VS 2005)?

     

    As I say, I want the struct typedef name to ALWAYS declare a volatile structure, and the struct typedef pointer name to ALWAYS declare a pointer TO a volatile structure (as opposed to a volatile pointer, if you know what I mean). I assume that Microsoft themselves must rely heavily on this in their OS, as must device driver developers, but I do want to ensure I don't use a syntactic variation that may have pitfalls.
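
     

    Just to spell out the distinction I mean, a throwaway sketch (not our real declaration, just an illustrative "demo_lock"):

     

    typedef volatile struct demo_lock { long count; } Demo_lock, *Demo_lock_ptr;

    Demo_lock_ptr p1;                  /* plain pointer to a volatile struct              */
    struct demo_lock * volatile p2;    /* volatile pointer to a NON-volatile struct       */
    volatile struct demo_lock *p3;     /* what Demo_lock_ptr expands to, written longhand */

     

    My reading is that the qualifier in the typedef sticks to the struct type itself, so both "Demo_lock" and "Demo_lock_ptr" carry the volatile; only p2 above is itself a volatile pointer.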

     

    My research indicates that this is somewhat compiler vendor dependent.

     

    Thx

     

    Wednesday, June 13, 2007 2:37 PM
  • You shouldn't actually see problems in debug mode. While modern compilers can actually generate quite useful debug information in many cases, VC++ has never been particularly strong at it (even though it seems to have improved significantly in the last two releases).

     

    Here's a little background on how debuggers typically work to display values of local variables:

     

    Conceptually, the compiler emits a table that maps the instruction address to a description of how to retrieve the value of a given variable. If the variable lives on the stack (which is very likely the case in your example, since there is an address-taking use), it is typically obtained with an offset from a base pointer: EBP in a standard frame, or (usually) ESP in an FPO frame. Debug builds will always use EBP frames.
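
     

    Purely as an illustration (this is not the real CodeView/PDB record layout, just the idea of the mapping):

     

    /* Hypothetical sketch of a per-variable location record.
       Real debug info is far more involved; this only illustrates the mapping. */
    typedef struct var_location
    {
        unsigned long start_rva;   /* code range for which this record is valid   */
        unsigned long end_rva;
        int           base_reg;    /* e.g. EBP for a standard frame, ESP for FPO  */
        long          offset;      /* the variable lives at [base_reg + offset]   */
    } var_location;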

     

    In my experience, the problems arise from the fact that the debug information is not accurate for the prolog and epilog of a function (where EBP is set up). I have never had the motivation to really investigate it, but I suspect that this is simply a weakness in Microsoft's debug information format.

     

    And to answer the other question, volatile is of very limited use for portable code since the standard memory model doesn't say anything about threads. Most modern compilers will probably still require explicit memory barriers for multithreaded code on machines with more than one processor. VC++ does.
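
     

    For instance, the kind of thing I mean, as a rough sketch (made-up names, using the Win32 MemoryBarrier macro):

     

    #include <windows.h>

    typedef struct shared_data
    {
        int           payload;
        volatile LONG ready;
    } shared_data;

    void producer(shared_data *d)
    {
        d->payload = 42;     /* ordinary store                                   */
        MemoryBarrier();     /* make sure the payload is visible before the flag */
        d->ready = 1;        /* publish                                          */
    }

    int consumer(shared_data *d)
    {
        if (d->ready)
        {
            MemoryBarrier(); /* don't let the payload read be satisfied early    */
            return d->payload;
        }
        return -1;           /* not published yet */
    }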

     

    -hg

    Wednesday, June 13, 2007 3:40 PM
  • Hi, I think this problem is confined to automatic variables.

     

    The test shell for the locking library declares a local "lock" struct and then kicks off a load of threads that get read/write locks in a semi-random manner.

     

    I modified this so that the lock was declared as static outside of the main function, and the address seen by the debugger matched that displayed by the printf at the start (which I recently added). So this now looks fine; the values in the struct look as expected during the test and after it.

     

    It would be interesting to investigate the issue more, especially since when the struct is declared on the stack, its address (as seen by the debugger) was always off by 4.

     

    Not sure what you mean when you write:

     

    "And to answer the other question, volatile is of very limited use for portable code since the standard memory model doesn't say anything about threads. Most modern compilers will probably still require explicit memory barriers for multithreaded code on machines with more than one processor. VC++ does."

     

    I want to be clear that I fully distinguish between volatile and synchronization; I don't consider these to be the same thing.

     

    We use "volatile" to ensure that the compiler/optimizer nevers locally caches (in a register etc) values in the struc, research indicates that as of VS 2005 "volatile" (for C & C++) actually can be used in place of explicit memory barriers (though it may be a tad wasteful for those updates that need not be barrier protected) and no reordering concerns can exist, however this is NOT mentioned anywhere in the MSDN article that documents "volatile". Given that its nature changed subtley in VS 2005 I find this omission a tad sloppy.

     

    We use locking (in fact the read/write recursive spin lock we developed relies solely on a state-based design that uses InterlockedCompareExchange).

     

    Although "volatile" is irrelvant so far as InterlockedCompareExchange is concerned, we do have other fields in the struc and these must always reflect the true values as they are modified in real-time.

     

    In reality this (and other) structures are in shared memory and visible to multiple threads in multiple processes. We need volatile (I need it to work as I expect it to) so that updates to these shared structs are always made and never 'cached' by optimized code; this design should be fine for both single-CPU and SMP machines.

     

    Removing 'volatile' from our lock struct has no effect on its ability to run, because the values that control the actual locking are always manipulated by InterlockedCompareExchange, and this (by design) always reads/writes memory atomically.
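
     

    To be concrete, here is a stripped-down sketch of the CAS-based style I'm describing (NOT our actual lock code, just the shape of it):

     

    #include <windows.h>

    typedef volatile struct simple_lock
    {
        long state;    /* 0 = free, 1 = held */
    } Simple_lock, *Simple_lock_ptr;

    void acquire(Simple_lock_ptr lock)
    {
        /* spin until we succeed in moving state from 0 to 1 */
        while (InterlockedCompareExchange(&lock->state, 1, 0) != 0)
        {
            /* spin (the real lock counts spins, backs off, etc.) */
        }
    }

    void release(Simple_lock_ptr lock)
    {
        InterlockedExchange(&lock->state, 0);   /* full-barrier store back to 0 */
    }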

     

    Do speak up, however, if you think there is something else here that needs to be clarified; I'd hate to miss something critical in this design!

     

    Thx

    H

     

    Wednesday, June 13, 2007 5:04 PM
  •  Hugo6003 wrote:

    Hi, I think this problem is confined to automatic variables.

     

    The test shell for the locking library declares a local "lock" struct and then kicks off a load of threads that get read/write locks in a semi-random manner.

     

    I modified this so that the lock was declared as static outside of the main function, and the address seen by the debugger matched that displayed by the printf at the start (which I recently added). So this now looks fine; the values in the struct look as expected during the test and after it.

     

    It would be interesting to investigate the issue more, especially since when the struct is declared on the stack, its address (as seen by the debugger) was always off by 4.

     

     

    You can probably use the dia2dump sample to dump the virtual unwinding tables and variable location lists. However, so long as you are in the function directly (that is, not in a callee frame below), no virtual unwinding is required.

     

     Hugo6003 wrote:

    Not sure what you mean when you write:

     

    "And to answer the other question, volatile is of very limited use for portable code since the standard memory model doesn't say anything about threads. Most modern compilers will probably still require explicit memory barriers for multithreaded code on machines with more than one processor. VC++ does."

     

    I want to be clear that I fully distinguish between volatile and synchronization; I don't consider these to be the same thing.

     

    We use "volatile" to ensure that the compiler/optimizer never locally caches (in a register, etc.) values in the struct.

     

     

    That's the idea of volatile, even though the standard is very vague. Basically it ensures that things like redundant-load-after-store or multiple writes to a single location are not combined. Consequently, other optimizations aren't performed in some cases (e.g. aliasing and global points-to analysis).

     

     Hugo6003 wrote:

    We use locking (in fact the read/write recursive spin lock we developed relies solely on a state-based design that uses InterlockedCompareExchange).

     

    Although "volatile" is irrelevant so far as InterlockedCompareExchange is concerned, we do have other fields in the struct, and these must always reflect the true values as they are modified in real time.

     

     

    OK, here's where I am not certain that you got everything right. While some argue volatile is not important, I'd disagree (even though the argument is rather academic). In a double-checked lock scenario for one-time init, a volatile read is good enough, i.e. no acquire semantics are required.
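
     

    For reference, the pattern I mean looks roughly like this (a sketch with made-up names, error handling omitted); the unsynchronized volatile read on the fast path is the part I'm claiming is good enough:

     

    #include <windows.h>

    void *create_resource(void);            /* hypothetical factory                */

    static CRITICAL_SECTION  init_guard;    /* assume initialized at program start */
    static void * volatile   shared_resource = NULL;
    static volatile LONG     initialized = 0;

    void *get_resource(void)
    {
        if (!initialized)                   /* first check: volatile read, no lock */
        {
            EnterCriticalSection(&init_guard);
            if (!initialized)               /* second check, under the lock        */
            {
                shared_resource = create_resource();
                initialized = 1;            /* volatile write publishes the flag   */
            }
            LeaveCriticalSection(&init_guard);
        }
        return shared_resource;
    }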

     

     Hugo6003 wrote:

    In reality this (and other) structures are in shared memory and visible to multiple threads in multiple processes. We need volatile (I need it to work as I expect it to) so that updates to these shared structs are always made and never 'cached' by optimized code; this design should be fine for both single-CPU and SMP machines.

     

    Specifically, caching happens not only due to clever code generation, but also in the CPU caches. Therefore you need certain things for synchronization:

     

    • instruction-level cache locking (lock prefix in x86) ensures that micro-instructions do not read outdated values
    • synchronize caches
    • prevent memory reordering

    The interlocked primitives will take care of all of that for you. Volatile won't.

     

    The question is whether these shared structs are written and read correctly. Just using volatile reads and writes without any memory barriers (InterlockedXX, again, implies a memory barrier) does not give you any of the above, and you could see:

    • incorrect results (e.g. add dword ptr [address], 1 is _not_ atomic with multiple CPUs)
    • cache artifacts (e.g. mov eax, dword ptr [address] loads eax from the cache, and the cache can be outdated if it was written to from another CPU)
    • memory reordering issues (e.g. CPU#0 updates a and b; CPU#1 reads the updated a but an old b, or vice versa) - you need a memory barrier

    So with fully cached memory, volatile is much less useful. It really only protects against interfering operations from the same CPU. This might be useful for memory-mapped I/O (e.g. as is commonly used on ARM architectures), but it is of little use for "normal" memory.

     

    -hg

    Wednesday, June 13, 2007 6:17 PM
  • Thanks for taking the time to elaborate, it's appreciated.

     

    I'm seeing several articles that claim that, as of VS 2005, "volatile" now also provides (intrinsically) read/write memory barrier semantics.

     

    http://msdn2.microsoft.com/en-us/library/ms684208.aspx

     

    If I understand this correctly, it means that one (on Windows, using the VS 2005 compiler) may simply use volatile, and be confident that they are also getting the safety of memory barriers in addition to the normal safety gains that "ordinary" volatile provides.

     

    This new, enhanced "volatile" is less beneficial than explicit barrier statements, though, because the latter allow the coder to select when and where to place the operations that do carry a cost.

     

    I'd be interested to hear from anyone who thinks volatile (the "new" version in VS 2005) is NOT sufficient for the kind of stuff being discussed here.

     

    Thx

    Hugh

     

    Wednesday, June 13, 2007 10:23 PM
  • I don't think you read that correctly. The compiler will not reorder volatile reads and writes. However, that doesn't mean that the CPU won't. To prevent the CPU from reordering memory accesses you need a memory fence. I do not think there is anything new about volatile with VC++2005. If you see it in the docs, please post where.

     

    So if you have something like

    #include <intrin.h>   /* _InterlockedIncrement */

    void foo(void)
    {
        static int a = 0;
        static int b = 0;

        ++a; ++a; ++b; ++b;  /* compiler is free to go to memory as few times as it wants */
        /* on x86 this will likely be
             mov eax,2
             add dword ptr [foo::a],eax
             add dword ptr [foo::b],eax */
    }

    void bar(void)
    {
        static volatile int a = 0;
        static volatile int b = 0;

        ++a; ++a; ++b; ++b;  /* compiler will emit code to write to a twice and then write to b twice */
        /* on x86 this will probably look like
             mov eax,1
             add dword ptr [bar::a],eax
             add dword ptr [bar::a],eax
             add dword ptr [bar::b],eax
             add dword ptr [bar::b],eax
           note that the CPU is still free to reorder writes - e.g. with both on different cache
           lines, b may be written back to an upper-level cache or main memory earlier than a */
    }

    void baz(void)
    {
        static volatile long a = 0;
        static volatile long b = 0;

        _InterlockedIncrement(&a);   /* lock xadd */
        _InterlockedIncrement(&b);
        /* on x86 this should look like
             mov edx, OFFSET baz::a
             mov eax,1
             mov ecx, OFFSET baz::b
             lock xadd dword ptr [edx],eax - lock works as a full memory barrier on x86
             lock xadd dword ptr [ecx],eax - lock works as a full memory barrier on x86 */
    }

     

    Locking cache lines is quite expensive and I'm fairly certain normal volatile reads and writes won't do it.

     

    Also, don't blame me for the x86 code. I'm not really up to date on how expensive these operations are. Chances are the compiler will schedule the instructions differently.

     

    -hg

    Wednesday, June 13, 2007 11:53 PM
  • Hi

     

    I understand that example, and I agree with your reasoning.

     

    As for VS 2005 and volatile, here is the key snippet from the article I referred to above:

     

    "With Visual Studio 2003, volatile to volatile references are ordered; the compiler will not re-order volatile variable access. With Visual Studio 2005, the compiler also uses acquire semantics for read operations on volatile variables and release semantics for write operations on volatile variables (when supported by the CPU)." (Italics for emphasis)

     

    Now the disappointing thing about this is that the MSDN articles that define "volatile" have NOT been updated to reflect this, which is not helpful in the least.

     

    This article also sheds light on this issue:

     

    http://msdn2.microsoft.com/en-us/library/ms686355.aspx

     

    (Just search the page for "2005").

     

    Also, this snippet:

     

    "Visual C++ 2005 goes beyond standard C++ to define multi-threading-friendly semantics for volatile variable access. Starting with Visual C++ 2005, reads from volatile variables are defined to have read-acquire semantics, and writes to volatile variables are defined to have write-release semantics. This means that the compiler will not rearrange any reads and writes past them, and on Windows it will ensure that the CPU does not do so either." 

     

    My only gripe about this statement is that the part about "compiler will not rearrange" is true of VS 2003, not just VS 2005; only the latter part about CPU ordering is specific to VS 2005.

     

    Taken from this very well-written article (it explains this whole issue in depth and includes Xbox 360):

     

    http://msdn2.microsoft.com/en-us/library/bb310595.aspx

     

    I can"t find much more.

     

    Regards

    Hugh

     

    Thursday, June 14, 2007 12:03 PM
  • It seems the documentation is simply wrong. The generated code for IA-32 (and presumably for x64, too) does not have acquire or release semantics for volatile reads and writes.

     

    I don't know the story for XBOX, but recent Pentiums do support read and write memory barriers (even though VC++ still targets CPUs which don't). Anyway, I'd say the documents are misleading at best, and I still claim volatile isn't good enough.

     

    -hg

    Thursday, June 14, 2007 12:40 PM
  • OK

     

    I am going to raise a new question in the C++ general group; we may need an MS guru to fully explain all this.

     

    Thx

    Hugh

     

    Thursday, June 14, 2007 2:50 PM
  • I will also check the generated x64 assembly code; it seems that x86 really has no problems and memory barriers are ultimately no-ops.

     

    I'm examining the x64 instruction set now, and I'm seeing three instructions that directly bear on this issue:

     

    LFENCE (for loads)

    SFENCE (for stores)

    MFENCE (for both)

     

    also there is:

     

    CLFLUSH (write cached line(s) to main storage, no doubt carries a significant cost)

     

    It seems that the presence (i.e. support) of these instructions is processor-model dependent, and is indicated by specific bits within the chip's CPUID data. I guess that those models that are not intended for multi-processor designs do not support these instructions; it seems that an invalid-opcode exception is raised should their use be attempted.
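
     

    In VC++ these appear to be reachable without inline asm via the SSE2 intrinsics; something like this (just a sketch, and it assumes CPUID has already confirmed SSE2 support):

     

    #include <emmintrin.h>   /* _mm_lfence / _mm_mfence (SSE2 intrinsics) */

    void publish_with_fence(volatile long *shared)
    {
        *shared = 1;      /* ordinary volatile store                               */
        _mm_mfence();     /* MFENCE: neither loads nor stores may cross this point */
    }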

     

    I will look at some actual code later; right now I am away from the office.

     


    Hugh

     

    Thursday, June 14, 2007 4:45 PM
  • These exist for x86 as well (I think they are formally part of the SSE2 or SSE3 ISA). And, as I have said before, the lock prefix works as a full memory barrier (in addition to making the instruction atomic).

     

    I very much doubt that memory barriers are no-ops on newer Pentiums, as these have some of the most sophisticated cache architectures and speculation technology.

     

    So, just for the record, I'm still not convinced, and I claim that on recent x86 CPUs volatile reads and writes will lead to all kinds of problems.

     

    -hg

    Thursday, June 14, 2007 5:47 PM
  • Thanks Holger, I have changed the topic a little and resumed it here:

     

    http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1733909&SiteID=1

     

    Since my question was initially about C syntax, I think it best that I start a new thread; the question is now all about memory barriers and volatile. I am seeing evidence that you may be correct; it's all very puzzling!

     

    Thx

    Hugh

     

    Thursday, June 14, 2007 7:06 PM