locked
Snapshot Isolation - TempDB impacts RRS feed

  • Question

  • Hello all,

    So I was advised to use snapshot isolation (RCSI) instead of NOLOCKs. The thing is, RCSI uses tempdb to store the data. My question is, how much data ? For instance one of our database is .5 TB. And one of the table in that database which is extensively used is .3 TB.

    So in this context, is it really worth it to go for RCSI when it is going to increase I/Os by a huge amount as well as the space used for tempdb ? In comparison to using nolocks / read uncommitted ?

    Best regards,

    Olivier

    Monday, January 5, 2015 6:57 PM

Answers

  • TempDb is used for the row version store. A row version is stored after an UPDATE or DELETE* of a row, and the row versions are retained while any session needs to read that version.  After which row versions are cleaned up in a background process.

    The size of the database does not affect the size of the version store, only the number of rows you UPDATE or DELETE.

    The only way to tell if it has a noticeable impact on your performance is to test.  Typically the overhead of row versioning (which also includes a 14-byte increase in the row size) is moderate, and well worth it if it reduces blocking without introducing dirty reads.

    David

    *Row versions are also generated for INSERT on tables that have a trigger, as the INSERTED and DELETED virtual tables are read from the version store, but this is the case whether you have RCSI enabled or not.


    David http://blogs.msdn.com/b/dbrowne/


    Monday, January 5, 2015 7:12 PM

All replies

  • TempDb is used for the row version store. A row version is stored after an UPDATE or DELETE* of a row, and the row versions are retained while any session needs to read that version.  After which row versions are cleaned up in a background process.

    The size of the database does not affect the size of the version store, only the number of rows you UPDATE or DELETE.

    The only way to tell if it has a noticeable impact on your performance is to test.  Typically the overhead of row versioning (which also includes a 14-byte increase in the row size) is moderate, and well worth it if it reduces blocking without introducing dirty reads.

    David

    *Row versions are also generated for INSERT on tables that have a trigger, as the INSERTED and DELETED virtual tables are read from the version store, but this is the case whether you have RCSI enabled or not.


    David http://blogs.msdn.com/b/dbrowne/


    Monday, January 5, 2015 7:12 PM
  • Thanks for the explanation. Was not sure how it worked.

    Because of the potential dirty reads, if I want to enable RCSI I'll have to rewrite all the queries. Darn.

    Well I guess, for the time being, NOLOCK it is and will be limited to reporting (where we don't give a damned about dirty reads).

    Regards,

    Monday, January 5, 2015 7:33 PM
  • >where we don't give a damned about dirty reads

    Even knowing that a NOLOCK query run during an update can return results that are simply incorrect, and never existed in your database at any point in time?

    You can also enable RCSI and remove the NOLOCK from the queries over time.

    David


    David http://blogs.msdn.com/b/dbrowne/


    Monday, January 5, 2015 7:35 PM
  • In our case, yes, we really don't care about the current second, even 10 seconds of work the application is doing. What interest us in the reporting is the one minute + old data in a sense. When you want to do reporting on 30 millions of rows, we don't give a crap about the youngest which represents .001%

    My current objectives anyway are to move:

    * the reporting application to a SQL server replicating the data just for that purpose.

    * inactive data to archive tables which won't see updates only inserts and deletes and reduce the number of rows in the active table.

    I can't really activate RCSI due to the dirty read risks. I'd have to rewrite most of the queries before being able to activate it. The application run like 400 concurrent threads that access the same table.

    Monday, January 5, 2015 8:06 PM
  • You don't need to rewrite all your queries before activating RCSI. Try to identify the few critical queries that cannot have the risk of other transactions updating the rows and use READCOMMITTEDLOCK hint for only those queries, which will preclude the risk.

    Satish Kartan www.sqlfood.com

    Monday, January 5, 2015 9:02 PM
  • >I can't really activate RCSI due to the dirty read risks

    What do you mean here?  RCSI doesn't allow dirty reads, unless you use NOLOCK or READ UNCOMMITTED.

    David


    David http://blogs.msdn.com/b/dbrowne/

    Monday, January 5, 2015 9:06 PM
  • >I can't really activate RCSI due to the dirty read risks

    What do you mean here?  RCSI doesn't allow dirty reads, unless you use NOLOCK or READ UNCOMMITTED.

    David


    David http://blogs.msdn.com/b/dbrowne/

    When RCSI is enabled, the data read by a query can be updated by other transactions as the read query will not put any shared locks on the rows. I guess OP doesn't want this, which is why I suggested the following:

    You don't need to rewrite all your queries before activating RCSI. Try to identify the few critical queries that cannot have the risk of other transactions updating the rows and use READCOMMITTEDLOCK hint for only those queries, which will preclude the risk.


    Satish Kartan www.sqlfood.com

    Monday, January 5, 2015 9:14 PM
  • As Satish said.

    True. And I have to make a pass of all the queries anyway since we are going from 2005 to 2012. Might as well do it at that time.

    Thanks for all the great info.

    Monday, January 5, 2015 9:54 PM
  • Be sure to consider Data Compression when moving large tables from 2005.  It provides space-optimized on-disk and in-memory storage for your table data.  The row/page format in 2005 and before is optimized to be simple and efficient to read/write, which is very often not the right choice for large tables and databases.

    David


    David http://blogs.msdn.com/b/dbrowne/

    Monday, January 5, 2015 10:08 PM
  • In our case, yes, we really don't care about the current second, even 10 seconds of work the application is doing. What interest us in the reporting is the one minute + old data in a sense. When you want to do reporting on 30 millions of rows, we don't give a crap about the youngest which represents .001%

    Beware that READ UNCOMMITTED can also lead to that you fail to read rows that have been comitted for ages, because they were move around while the NOLOCK query was running.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    • Proposed as answer by Russ Loski Tuesday, January 6, 2015 1:00 AM
    Monday, January 5, 2015 10:54 PM
  • darn, did not know that. thanks for the info.
    Tuesday, January 6, 2015 12:03 AM