Solid State Drives for tempdb/log files

  • Question

  • I have just read in a SQL Server Internals book that SSDs are not a great fit for tempdb and log files, due to the large amounts of writes that these files do and the fact that SSDs have a limited lifespan of writes (maybe in the 10s of thousands).

    This book was written 4 years ago; is this still considered true? Should we still avoid SSDs for tempdb/log files?

    Thanks

    Monday, November 12, 2012 2:08 PM

Answers

  • I have just read in a SQL Server Internals book that SSDs are not a great fit for tempdb and log files, due to the large amounts of writes that these files do and the fact that SSDs have a limited lifespan of writes (maybe in the 10s of thousands).

    This book was written 4 years ago; is this still considered true? Should we still avoid SSDs for tempdb/log files?

    Thanks

    It depends. Where is your bottleneck, and what is the best ROI for the cost of the SSDs? If you are using tempdb heavily, have you looked at ways to reduce tempdb usage, or at tracking what is using tempdb in your application? My experience has been that tempdb usage can be reduced on most systems with a little time spent understanding why it is being used. Today's SSDs all leverage wear-leveling technology to increase drive lifespan, and many are overprovisioned with NAND cells to further improve drive life. If your SSD is MLC based, it will wear out faster than SLC-based NAND, which has 10 times the write endurance per cell due to the lower voltages required to write a single bit instead of a double bit; SLC, however, has lower capacity for the same number of cells and is also more expensive.

    SSDs seem like the simple answer here, but my experience has been that you get the best ROI by placing your random-read-heavy workloads on them rather than tempdb or log files. Paul did a lot of testing on this a few years ago that supports this as well.

    http://www.sqlskills.com/blogs/paul/post/Benchmarking-Introducing-SSDs-(Part-1-not-overloaded-log-file-array).aspx


    Jonathan Kehayias | Principal Consultant, SQLSkills.com
    SQL Server MVP | Microsoft Certified Master: SQL Server 2008
    Author of Troubleshooting SQL Server: A Guide for Accidental DBAs
    Feel free to contact me through My Blog or Twitter. Become a SQLskills Insider!
    Please click the Mark as Answer button if a post solves your problem!

    Monday, November 12, 2012 2:16 PM
    Moderator

All replies

  • Please see the link below:

    http://msdn.microsoft.com/en-us/library/hh240550.aspx 

    vt


    Please mark the post as answered if I've answered your question, and vote for it as helpful to help other users find a solution quicker.

    Monday, November 12, 2012 2:12 PM
  • This book was written 4 years ago; is this still considered true? Should we still avoid SSDs for tempdb/log files?

    Well, the technology has improved since then; the lifetime of the newer generation of chips is about 10x better.

    So here's the thing: they *should* give you a somewhat graceful degradation as they get toward the end of life, but you will want to be a little more diligent about looking for failure messages as they occur, since the failure modes are different from rotating storage.

    So, what's it worth to you? Even for tempdb you can probably get a couple of years of use out of an SSD before it starts to fail. SSD is funny stuff: because of the load-leveling across chips, buying twice as much makes it last twice as long!

    In general, log files will do less writing per day, and main database storage least of all. But of course, run the numbers for your own operation. Maybe the performance is worth it to you, even if you have to replace the drives every year!
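    The "run the numbers" suggestion can be sketched as a back-of-envelope calculation. All figures here (P/E cycles, write amplification, workload size) are illustrative assumptions, not vendor data, and the model assumes idealized wear leveling:

    ```python
    # Back-of-envelope SSD endurance estimate. Every number below is an
    # illustrative assumption, not a vendor specification.

    def endurance_days(capacity_gb, pe_cycles, daily_writes_gb, write_amp=1.5):
        """Rough drive lifetime in days, assuming perfect wear leveling.

        capacity_gb     -- usable drive capacity
        pe_cycles       -- program/erase cycles each cell tolerates
        daily_writes_gb -- host writes per day
        write_amp       -- assumed write amplification factor
        """
        total_nand_writes_gb = capacity_gb * pe_cycles      # total write budget
        nand_writes_per_day = daily_writes_gb * write_amp   # actual NAND wear per day
        return total_nand_writes_gb / nand_writes_per_day

    # A hypothetical 200 GB MLC drive (5,000 P/E cycles) under a heavy
    # 1 TB/day tempdb-style workload:
    small = endurance_days(200, 5000, 1024)
    # Doubling capacity doubles the write budget, so lifetime doubles too,
    # which is the "buy twice as much, it lasts twice as long" effect:
    large = endurance_days(400, 5000, 1024)
    print(round(small), round(large))  # → 651 1302
    ```

    The doubling effect follows directly from the model: capacity appears linearly in the numerator, so under the same daily write load the lifetime scales with the size of the drive.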

    In summary, the technology has improved enough to make it practical, at a price; that price may be more than just the obvious delta in purchase cost, and may include a little more careful operational management as well.

    Josh

    Monday, November 12, 2012 4:15 PM
  • Well, the technology has improved since then; the lifetime of the newer generation of chips is about 10x better.

    So here's the thing: they *should* give you a somewhat graceful degradation as they get toward the end of life, but you will want to be a little more diligent about looking for failure messages as they occur, since the failure modes are different from rotating storage.

    So, what's it worth to you? Even for tempdb you can probably get a couple of years of use out of an SSD before it starts to fail. SSD is funny stuff: because of the load-leveling across chips, buying twice as much makes it last twice as long!

    The write life of an individual cell has not increased by 10x over the last 4 years. I'd love it if it had, but that just isn't true. The improvements in wear-leveling control by the controllers have made drive lifespans better. Even with all of the improvements, we recently replaced/RMA'd two PCI-X SSDs from our test lab that we killed in just a few months of benchmark testing, using workloads that weren't very different from common SQL Server workloads we see all the time consulting, and I am not talking about edge cases either.

    On the wear-leveling side of things, the algorithms used differ, so it is something I'd look at closely, and I'd test how the drive operates. Some newer SSD controllers use static data rotation, a process where static data is moved from a cell that hasn't been written frequently to a cell that has, so that writes wear all of the cells evenly. Some drives don't perform this type of operation, so if you have static data, the non-static cells wear out faster as the drive approaches capacity; that was the problem with the two PCI-X SSDs we RMA'd. Additionally, for some drives that leverage static data rotation, there is a measurable performance impact while it is occurring.

    As long as you have the appropriate RAID configuration for redundancy, SSDs are as good lifespan-wise for most applications as traditional rotating media. I've had the same SSD in one of my laptops for over 4 years now without problems; it was an early MLC SSD that doesn't do static data rotation or the more advanced wear leveling in newer SSDs, yet it keeps on working, and I run multiple VMs for presenting on it all the time.
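    The static data rotation point can be illustrated with a deliberately simplified toy model (the cell counts and the idealized write spreading are assumptions; real controllers are far more sophisticated). Without rotation, cells pinned under static data never absorb hot writes, so the remaining cells take all the wear:

    ```python
    # Toy model of why static data rotation matters for wear leveling.
    # Purely illustrative -- real SSD controllers are far more complex.
    import math

    def max_wear(n_cells, n_static, n_writes, rotate):
        """Peak per-cell P/E count after n_writes of hot-data writes.

        n_cells  -- total cells in the (toy) drive
        n_static -- cells currently pinned by static data
        rotate   -- if True, static data is periodically migrated so
                    hot writes can be spread across every cell
        """
        if rotate:
            usable = n_cells              # all cells share the wear
        else:
            usable = n_cells - n_static   # only non-static cells absorb writes
        return math.ceil(n_writes / usable)

    # A nearly full toy drive: 100 cells, 90 pinned by static data.
    no_rotation = max_wear(100, 90, 10_000, rotate=False)   # 10,000 writes on 10 cells
    with_rotation = max_wear(100, 90, 10_000, rotate=True)  # spread over all 100 cells
    print(no_rotation, with_rotation)  # → 1000 100
    ```

    In this sketch the non-rotating drive puts 10x the P/E cycles on its free cells, which is the failure pattern described above for drives holding mostly static data as they approach capacity.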

    One thing I would point out to the original poster: if you are testing SSDs, try to do your testing in a vendor's lab environment, not against the SSDs that you plan to use for a production server. Your testing reduces the lifespan, as does the reconfiguration/formatting of the SSD after testing completes to prep it for production use. I've seen that bite people as well.


    Jonathan Kehayias | Principal Consultant, SQLSkills.com


    Monday, November 12, 2012 4:26 PM
    Moderator
  • The write life of an individual cell has not increased by 10x over the last 4 years. I'd love it if it had, but that just isn't true. The improvements in wear-leveling control by the controllers have made drive lifespans better. Even with all of the improvements, we recently replaced/RMA'd two PCI-X SSDs from our test lab that we killed in just a few months of benchmark testing, using workloads that weren't very different from common SQL Server workloads we see all the time consulting, and I am not talking about edge cases either.

    The OP said he read something printed in 2008. It was just about then that we moved to a new generation of flash technologies, extending the write life from about 10k to about 100k, which remains a typical number today AFAIK. It has not been a continuous improvement process; you're right about that.

    The leveling algorithms are also needed, in order to make a device practical.

    http://en.wikipedia.org/wiki/Flash_memory#Memory_wear

    I am interested in hearing more about the drives that you wore out so quickly - these were current technology?  Do you think they failed to live up to their expected lifetime?  You said you RMA'd them, did you return/replace them under warranty?  Would you like to share the vendor name?  What was their failure mode, did they gracefully start to return too many errors and shut down capacity, or did they fail more suddenly?

    Josh


    Tuesday, November 13, 2012 12:42 AM
  • I am interested in hearing more about the drives that you wore out so quickly - these were current technology?  Do you think they failed to live up to their expected lifetime?  You said you RMA'd them, did you return/replace them under warranty?  Would you like to share the vendor name?  What was their failure mode, did they gracefully start to return too many errors and shut down capacity, or did they fail more suddenly?

    Josh


    Yes, they were current technology, and we've killed drives from almost all of the top names in the enterprise SSD market at one point or another. Some were loaner drives for evaluation purposes; others were drives we owned and replaced under warranty. Some failed catastrophically, others gracefully. Details aren't something I can provide in some cases, so it isn't fair to call out any one vendor's failures when I can't talk about the others.

    Knowing the books that are out there for SQL Server on this topic, I would venture a guess that the book being referenced is the Wrox 2008 Internals and Troubleshooting book, which I happen to be one of the co-authors of; the chapter in question was written by Brent Ozar. If that is the case, the information is just as applicable now as it was then. The tech hasn't changed for SLC-based NAND, which is where you get the 100,000-write lifecycle; for MLC chips, the write lifecycle depends entirely on the cell size. In 2011 a number of vendors started changing from 34nm cells, which had a 5,000-write lifecycle, to 25nm cells, which have a 3,000-write lifecycle, because it was cheaper. Your enterprise PCI-X cards are typically SLC for the longer endurance, or if they use MLC they are heavily overprovisioned to give them longer lifespans. I think you should go look at the actual changes that have occurred in the last 3-4 years, because the information you are presenting is not accurate here.
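    The cycle counts quoted above (SLC ~100k, 34nm MLC ~5k, 25nm MLC ~3k) translate directly into relative drive lifetimes under the same workload; everything else in this sketch is an assumption, including the premise that wear leveling is equally effective on each drive:

    ```python
    # Relative drive lifetime as a function of per-cell P/E endurance,
    # using the rough cycle counts quoted in the discussion above.

    PE_CYCLES = {"SLC": 100_000, "MLC 34nm": 5_000, "MLC 25nm": 3_000}

    def relative_lifetime(nand):
        """Lifetime relative to 25nm MLC, same workload and wear leveling."""
        return PE_CYCLES[nand] / PE_CYCLES["MLC 25nm"]

    for nand in PE_CYCLES:
        print(f"{nand}: {relative_lifetime(nand):.1f}x")
    # SLC lasts ~33x as long as 25nm MLC per cell; 34nm MLC ~1.7x
    ```

    This is why enterprise cards lean on SLC or heavy overprovisioning: the per-cell endurance gap is more than an order of magnitude, and the 34nm-to-25nm shrink actually moved endurance in the wrong direction.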

    Jonathan Kehayias | Principal Consultant, SQLSkills.com

    Tuesday, November 13, 2012 1:18 AM
    Moderator
  • Yes, they were current technology, and we've killed drives from almost all of the top names in the enterprise SSD market at one point or another. Some were loaner drives for evaluation purposes; others were drives we owned and replaced under warranty. Some failed catastrophically, others gracefully. Details aren't something I can provide in some cases, so it isn't fair to call out any one vendor's failures when I can't talk about the others.

    I don't know that I understand the lesson here.  Are these SSDs ready for prime time, or not?

    Would you only trust them for the much lower write-cycle use of main database storage?

    --

    For the history of the flash chips, I'll see what else I can Google up on the old 10k life, if anyone cares. The point is that you could hardly build an enterprise-class drive out of anything much less than 100k without overprovisioning by such a huge factor that the price would become impossible.

    Josh

    Tuesday, November 13, 2012 5:55 AM
  • For the history of the flash chips, I'll see what else I can Google up on the old 10k life, if anyone cares. The point is that you could hardly build an enterprise-class drive out of anything much less than 100k without overprovisioning by such a huge factor that the price would become impossible.

    Josh

    You should hook up with some vendors and do some independent testing here. True enterprise SSDs leverage SLC for the most part, and the MLC-based ones are already overprovisioned and provide the best price point per GB with amazing performance. I wouldn't hesitate to put production workloads on real "enterprise" SSDs, regardless of SLC or MLC, with appropriate redundancy considerations and replacement drives on hand. I've recommended plenty of solutions to clients that have been fabulous over the past two years.

    The lesson here is that tempdb and log files are not always the best place to leverage SSDs. It all depends on where your specific bottleneck happens to be. Far too many people espouse these two things as the best place to use SSDs, and that is just not the case. This is no different from past scenarios where people used to recommend that tempdb be placed on a RAM disk instead of on physical storage. Without understanding the true bottleneck, you can easily waste expensive hardware resources on the wrong implementation simply because some article, book, or other reference said it was the best configuration. The worst part of my job as a consultant is engaging where a client has laid out tons of money on hardware that didn't fix the problem, either because they didn't understand the true bottleneck they were dealing with, or because the newer hardware eliminated one bottleneck only to expose a separate, more difficult one, and they still had major limitations.


    Jonathan Kehayias | Principal Consultant, SQLSkills.com

    Tuesday, November 13, 2012 6:05 AM
    Moderator
  • Knowing the books that are out there for SQL on this topic, I would venture a guess that the book being referenced was the Wrox 2008 Internals and Troubleshooting book

     correct
    Wednesday, November 14, 2012 7:27 AM
  • Knowing the books that are out there for SQL on this topic, I would venture a guess that the book being referenced was the Wrox 2008 Internals and Troubleshooting book

     correct
    Being a coauthor on that book, I can't deny a certain level of bias here, but I had nothing to do with Brent's chapter, and I'd recommend going to his site and watching his more recent videos on storage for SQL Server. You will find that Brent's recommendations have not really changed, and his crew are on the leading edge of SSD testing just like we are at SQLskills, so there will be parity between our specific recommendations here.

    Jonathan Kehayias | Principal Consultant, SQLSkills.com



    Wednesday, November 14, 2012 7:31 AM
    Moderator