none
Is a half-written values reading prevented while SELECT-ing WITH (NOLOCK) hint?

    Question

  • Is it possible to select half-written values WITH( NOLOCK) values and if not how is it (half-written values reading) prevented (if no locks are respected)?

    Violation of which DBMS principle is reading of half-written value?
    I am having difficulties in identifying its term (is it consistence, integrity break)?
    What is the name of corresponding term?

    • Edited by vgv8 Saturday, November 27, 2010 6:59 PM
    Friday, November 26, 2010 2:51 AM

Answers

  • As far as I know, SQL Server has some mechanism that prevents reading incompletely written pages, even if NOLOCK is used. Therefore, if a value is stored in a single page (as should be the case with all data types except varchar(max), nvarchar(max), varbinary(max) and xml), then a "half-written value" should not be seen. However, for the nvarchar(max) it seems to be possible to read such values (only if NOLOCK is used). For example, run the following script in a window:

    CREATE TABLE Test (
    	ID int IDENTITY PRIMARY KEY,
    	Txt nvarchar(max) NOT NULL
    )
    GO
    INSERT INTO Test
    SELECT REPLICATE(CONVERT(nvarchar(max),CHAR(65+ABS(CHECKSUM(NEWID()))%26)),100000)
    GO 10
    UPDATE Test SET Txt=REPLICATE(CONVERT(nvarchar(max),CHAR(65+ABS(CHECKSUM(NEWID()))%26)),100000)
    GO 1000
    

    Then run the following query in another window:

    WHILE 1=1 BEGIN
     SELECT Txt FROM Test WITH (NOLOCK) WHERE LEN(REPLACE(Txt,LEFT(Txt,1),''))<>0
     IF @@ROWCOUNT<>0 BREAK
    END
    
    With the NOLOCK hint and the UPDATE query running at the same time, you get some results pretty quickly. When the NOLOCK hint is removed, there are no rows returned by the query.

    Razvan

    • Marked as answer by vgv8 Saturday, November 27, 2010 6:34 PM
    Friday, November 26, 2010 9:44 PM

All replies

  • There is no such term or mechanism of reading/writing half-written values. Either they are written/read full or none, this is integrity & consistency. Database follows the ACID rule.

    I think you are talking about DIRTY READS when using WITH (NOLOCK) hint.


    ~Manu
    http://sqlwithmanoj.wordpress.com
    • Proposed as answer by Seth Lynch Friday, November 26, 2010 4:24 PM
    • Unproposed as answer by vgv8 Saturday, November 27, 2010 6:35 AM
    Friday, November 26, 2010 3:31 AM
  • I think it's called 'reading dirty data' (dirty reads). I'm not sure there is a better term describing the problem.
    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    Friday, November 26, 2010 3:31 AM
  • There are a number of types but the one you are describing are dirty reads as already suggested. There are four main types...

    - Dirty reads

    - Non-repeatable reads

    - Phantom reads

    - Serialisable vs snapshot

    These are described in the following set of articles by Michael Swart...

    http://michaeljswart.com/?p=604

     

    Thanks


    /Neil Moorthy - Senior SQL Server DBA/Developer (MCITP (2005), MCAD, OCA 11g) Please click the Mark as Answer button if a post solves your problem
    Friday, November 26, 2010 4:02 AM
  • These all phenomena do not tell anything about completeness/incompleteness of a value overwriting/deleting/updating/inserting used, only about transaction isolation.

    Well, I only can find that dirty read is read of uncommitted data by another transaction. I could never find anything telling that dirty read include garbage reading, i.e. incompletely overwritten values.

    Should I acept this answer (that dirty reads are reading incompletely overwritten  values) as viable then I should immediately accept that it is possible that the same transaction in the scope of one transaction can possible use the garbage, i.e. incompletely written values.

    Also, I staged a few lengthy loaded tests with big size values and did not observe a single partly overwritten value, only uncommitted or committed ones but not partly overwritten.

    After some contemplation and experimenting, I believe, that DBMS-es do not provide such value integrity:

    • Had we assumed such possibility, then we immediately come to conclusion that the same transaction in the same transaction scope can use incompletely defined (written, inserted, updated, deleted) values what really never happens;
    • Values are used not only by DBMS-es but outside of them by operating system and other software frameworks

    It is most probably the transaction features implemented at operating system, or even more low (machine, hardware) level

    Friday, November 26, 2010 5:58 AM
  • I'm not sure what you mean by "garbage reading".  I assume you're speaking of something like the following:

    Original value of MyColumn: "ABCDEFG".

    Statement: UPDATE MyTable SET MyColumn = '01234567'

    Read of value: "0123EFG" (old value partially overwritten by new).

    If so, then no, you cannot see such values by using the NOLOCK/Read uncommitted feature of SQL Server.  All this means is that you will see values which have not yet been committed to the database, i.e. the read will not issue shared locks, nor will it honor concurrency locks.   However, there's still a lower level of locking implemented, which prevents the sort of scenario you're describing.

     


    Michael Asher
    Friday, November 26, 2010 8:23 AM
  • I'm not sure what you mean by "garbage reading".  I assume you're speaking of something like the following:

    Original value of MyColumn: "ABCDEFG".

    Statement: UPDATE MyTable SET MyColumn = '01234567'

    or something like 2 GB of varchar(max) string. I'd rather not reproduce it in this post
    Friday, November 26, 2010 4:14 PM
  • When data is updated it is not over written - the original row is marked as deleted and a new row is inserted.

    If you are getting wierd stuff out it is because weird stuff is going in to the table. It is not because you are gettign data half-written to.

    Perhaps there was some loss of connection during the upload/insert.

    Seth


    http://lqqsql.wordpress.com
    Friday, November 26, 2010 4:28 PM
  • As far as I know, SQL Server has some mechanism that prevents reading incompletely written pages, even if NOLOCK is used. Therefore, if a value is stored in a single page (as should be the case with all data types except varchar(max), nvarchar(max), varbinary(max) and xml), then a "half-written value" should not be seen. However, for the nvarchar(max) it seems to be possible to read such values (only if NOLOCK is used). For example, run the following script in a window:

    CREATE TABLE Test (
    	ID int IDENTITY PRIMARY KEY,
    	Txt nvarchar(max) NOT NULL
    )
    GO
    INSERT INTO Test
    SELECT REPLICATE(CONVERT(nvarchar(max),CHAR(65+ABS(CHECKSUM(NEWID()))%26)),100000)
    GO 10
    UPDATE Test SET Txt=REPLICATE(CONVERT(nvarchar(max),CHAR(65+ABS(CHECKSUM(NEWID()))%26)),100000)
    GO 1000
    

    Then run the following query in another window:

    WHILE 1=1 BEGIN
     SELECT Txt FROM Test WITH (NOLOCK) WHERE LEN(REPLACE(Txt,LEFT(Txt,1),''))<>0
     IF @@ROWCOUNT<>0 BREAK
    END
    
    With the NOLOCK hint and the UPDATE query running at the same time, you get some results pretty quickly. When the NOLOCK hint is removed, there are no rows returned by the query.

    Razvan

    • Marked as answer by vgv8 Saturday, November 27, 2010 6:34 PM
    Friday, November 26, 2010 9:44 PM
  • As far as I know, SQL Server has some mechanism that prevents reading incompletely written pages, even if NOLOCK is used. Therefore, if a value is stored in a single page (as should be the case with all data types except varchar(max), nvarchar(max), varbinary(max) and xml), then a "half-written value" should not be seen. However, for the nvarchar(max) it seems to be possible to read such values (only if NOLOCK is used). For example, run the following script in a window:

    Really cool and elegant!

    This looks to me like a bug in SQL Server.
    How can operations with rows be atomic if operations with values are not?  

    I would appreciate if you post any links on mechanisms preventing partly-modified pages be read. Is it a page or extent?

    Also, if SQL Server does not prevent reading incompletely LOB-s (large objects) from other transactions, what would prevent such a slip inisde the same transaction/scope/block in parallel processing?


    Update:
    I had problems with WindowsLive which blocked me (my account) from this forum for the last 12 hours.
    I posted your code in  
    http://stackoverflow.com/questions/4278884/is-a-half-written-values-reading-prevented-when-select-with-nolock-hint/4283096#4283096
    with reference to your answer. I hope you do not mind

    Saturday, November 27, 2010 6:50 PM
  • This looks to me like a bug in SQL Server.
    How can operations with rows be atomic if operations with values are
    not?  

    Why would this be a bug? NOLOCK means that. READ UNCOMMITTED.

    I would appreciate if you post any links on mechanisms preventing partly-modified pages be read. Is it a page or extent?

    Besides locks SQL Server also employ latches, which are held for a shorter duration, and they are not held until the end of the transaction. While a page is being updated, the updater holds a latch. It's not a good thing if two processes tries to write to the same page simultaneously. Least not if both think they need to split the page...

    But there no reason to protect data that is stored out of row by the same means, so for LOB data you could read a mix of old and new data.

    Also, if SQL Server does not prevent reading incompletely LOB-s (large objects) from other transactions, what would prevent such a slip inisde the same transaction/scope/block in parallel processing?

    Not sure what you mean here. Could you clarify?


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Links for SQL Server Books Online: SQL 2008, SQL 2005 and  SQL 2000.
    (Just click the link you need.)
    Saturday, November 27, 2010 8:32 PM
  • This looks to me like a bug in SQL Server. How can operations with rows be atomic if operations with values are not?

    It's not a bug. The entire point of the NOLOCK hint is to tell SQL Server you prefer performance to atomic transactions. You won't get partially-written pages even with NOLOCK, because SQL Server completes a page write before marking it available. However, in cases where writes span multiple pages, you will see behavior of this sort. It's not a bug-- it's what you've told SQL Server to do: give you pages as soon as they're written, not as soon as they're committed.

    Before you protest too much over this, remember that any isolation level short of SERIALIZABLE will generate read anomalies of some sort ... and very, very few production systems run at that level. The vast majority of real world systems make some sort of concession between consistency and performance/concurrency. It's our job as developers to know the benefits and drawbacks of each level, and strike the proper balance for the given situation.

    By the way, the NOLOCK hint can cause many more anomaloies than just the one described above. Many people don't realize that, with NOLOCK, you can not only see data that hasn't been committed, but previously committed rows can temporarily disappear ... even though those rows are not being deleted or even modified by any currently running query.


    Michael Asher
    • Proposed as answer by Naomi NModerator Sunday, November 28, 2010 12:06 AM
    • Unproposed as answer by vgv8 Sunday, November 28, 2010 5:11 AM
    Saturday, November 27, 2010 8:41 PM
  • It's not a bug. The entire point of the NOLOCK hint is to tell SQL Server you prefer performance to atomic transactions. You won't get partially-written pages even with NOLOCK, because SQL Server completes a page write before marking it available. However, in cases where writes span multiple pages, you will see behavior of this sort. It's not a bug-- it's what you've told SQL Server to do: give you pages as soon as they're written, not as soon as they're committed

     

    NOLOCK does not mean to give pages as soon as they'are written. It does not mean that reading is waiting in queue when writing would start to modify. READ UNCOMMITTED return committed data if they were not yet changed

    If it is not bug, then it is bug in documentation because  I have never seen any warning on possibilities of reading incompletely modified values.
    Can you give me any links on this in docs?

    If that what I am asking, then why I do not get what I am asking while reading non-LOB values?
    Are you serious in asserting that someone is telling to SQL Server to return partly-modified values?

    Sunday, November 28, 2010 5:07 AM
  • NOLOCK does not mean to give pages as soon as they'are written

    As spelled out in painstaking detail in BOL, NOLOCK means to not honor EXCLUSIVE locks. And what does that mean? It means you read the most current version of each page, whether or not that page has been marked committed. Which means to "give pages as soon as they're written".

    The only mechanism in SQL Server to force one transaction to wait for another is via locking. If a transaction doesn't issue or honor locks, it doesn't wait for a write to complete. Period.

     

    READ UNCOMMITTED return committed data if they were not yet changed

    I can't parse this. READ UNCOMMITTED is identical to NOLOCK; it's simply a less granular method to specify the same behaviour.

     

    If it is not bug, then it is bug in documentation because I have never seen any warning on possibilities of reading incompletely modified values. Can you give me any links on this in docs?

    From BOL: "When this option is set, it is possible to read uncommitted modifications, which are called dirty reads. Values in the data can be changed and rows can appear or disappear in the data set before the end of the transaction."

    The fact that you can't see partial writes except across page boundaries is what you might call an undocumented feature of SQL Server. According to the documentation, uncommitted data is by definition an inconsistent view of a row. You seem to believe that uncommitted data is data that "just hasn't been committed yet". That is not true. It is data that may or may not ever be committed, and when reading such data, you may in fact read entire rows that have never existed in the database, and never will.

     

    why I do not get what I am asking while reading non-LOB values?

    Because rows do not span pages in SQL Server.

     

    Are you serious in asserting that someone is telling to SQL Server to return partly-modified values?

    Yes.  That's the very definition of NOLOCK, in fact.  To read a partly-modified transaction.  

     


    Michael Asher
    Sunday, November 28, 2010 6:05 AM
  • READ UNCOMMITTED return committed data if they were not yet changed

    I can't parse this.


    Using the table Test created by script of Razvan Socol above,
    execute in one SSMS session (window):

    1) 

    Begin tran
    WAITFOR DELAY '00:00:30';
    UPDATE Test SET Txt='11'
    --rollback
    
    

    and in second session (SSMS window)
    2)
    SELECT
    Txt FROM Test WITH (NOLOCK)

    would see first committed data and in 30 seconds uncommited ones

    Sunday, November 28, 2010 6:30 AM
  • NOLOCK does not mean to give pages as soon as they'are written.

    It means that exlusive locks are not honored. Latches are honored. Sch-M locks are honored (I believe), and you take an Sch-S lock yourself. This means that once a another process has updated a page, and released the latch, the page may be read.

    If it is not bug, then it is bug in documentation because  I have never seen any warning on possibilities of reading incompletely modified values.
    Can you give me any links on this in docs?

    Everything cannot be spelled out in the docs. Furthermore, exactly what can happen is implementation-dependent, and could change with a new release or even a service pack.

    All the following bad things can happen when you use NOLOCK:

    1) You read uncommitted, and possibly inconsistent, data.
    2) You read partially updated values.
    3) You fail to read data that matches your query and which was committed before you started to run the query.

    In short: all bets are off, and you should be very careful with using NOLOCK or READ UNCOMMITTED. If you want to avoid blocking, use any of the snapshot isolation levels.

    If that what I am asking, then why I do not get what I am asking while reading non-LOB values?

    You mean, why you don't get partly updated values in non-LOB columns? Did anyone tell you that could never happen? As I said, all bets are off. In SQL 2005 and later, rows can spill over to several pages, even if there are no LOB columns. If memory serves, individual columns are still on a single page, but that could change in some later release.

    Are you serious in asserting that someone is telling to SQL Server to return partly-modified values?

    Razvan just did, didn't he?


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Links for SQL Server Books Online: SQL 2008, SQL 2005 and  SQL 2000.
    (Just click the link you need.)
    Sunday, November 28, 2010 10:54 AM