locked
Storing data in SQL Server as Compressed Strings. RRS feed

  • Question

  • I would like to know the pros and cons of compressing strings (data) to store in SQL Server.

     

    If any one could give me some idea.

     

     

    thanks.

    Tuesday, November 11, 2008 6:16 PM

Answers

  • This kind of goes along with the data compression for SQL Server 2008.  The application varies from server to server, and scenario to scenario.  For some servers, disk space is cheaper than CPU and more readily available, so I would opt to not compress the strings in that case, but this is probably a rare case in todays hardware.  To use the data in SQL, you would have to create a CLR function to uncompress the data which would utilize cpu.  I would examine the need to use/consume the data in TSQL or not when determining if I would compress it for storage or not.  What are your compression ratios, and is the trade off really worth it?

     

    For 2008, the builtin Data Compression is a great feature if you have the available CPU utilization needed to use it, which generally is the case on a SQL Server.  The reduction it makes in # of pages to be read from a large database, is generally well worth any cost there is to CPU utilization.  At the current point in time Disk IO is usually the limiting factor to how fast a SQL Server can process information.  Someone once showed me a triangle of CPU - Memory - Disk, at any point in time 2 will be green and one will be red.  Disk IO rates have not been able to keep up with the improvements to CPU and memory over the last decade, and therefore compressing data reduces the amount of reading from disk that needs to be done to satisfy a query which results in faster execution times, at the expense of CPU for decompressing the data.

     

     

    Wednesday, November 12, 2008 3:30 AM
  • If you r WCF interface is doing that do some reason (like saving storage capacity) I would calculate the benfit of having reduced storage sizes against the CPU util and the maintainance of a compression functionality. Be aware that queries might not be able to access the data in predicates as the data is compressed. If ony storage is the reason for you doing that, I would take a look on the SQL Server 2008 options rather than doing all the compression and decompression stuff at the client (or at the server). Having that implemented natively on the server (like in SQL Server 2008) would not give you the flexibility of the compression algorythm but would make it more transparent to subsequent business operations (like searching)

     

    Jens K. Suessmeyer

    Wednesday, November 12, 2008 8:51 AM

All replies

  • What is the purpose you are compressing the strings ? Do you want to save space or is the data just delivered as such ? Which SQL Server version are you using ?

     

    Jens K Suessmeyer

    Tuesday, November 11, 2008 8:05 PM
  • The pros and cons are the same as encryption. As long as there is no manual intervention to alter the string outside of the decryption procedure, you're fine. In other words, if you try to decompress the string and someone has changed it, you are out of luck trying to figure out what the original value was.

     

    A.D.T.

     

    Tuesday, November 11, 2008 8:16 PM
  • We will be having a WCF interface which will send the Compressed data to the database. Primary objective is to save space.
    We are using SQL Server 2005.
    Tuesday, November 11, 2008 8:57 PM
  • This kind of goes along with the data compression for SQL Server 2008.  The application varies from server to server, and scenario to scenario.  For some servers, disk space is cheaper than CPU and more readily available, so I would opt to not compress the strings in that case, but this is probably a rare case in todays hardware.  To use the data in SQL, you would have to create a CLR function to uncompress the data which would utilize cpu.  I would examine the need to use/consume the data in TSQL or not when determining if I would compress it for storage or not.  What are your compression ratios, and is the trade off really worth it?

     

    For 2008, the builtin Data Compression is a great feature if you have the available CPU utilization needed to use it, which generally is the case on a SQL Server.  The reduction it makes in # of pages to be read from a large database, is generally well worth any cost there is to CPU utilization.  At the current point in time Disk IO is usually the limiting factor to how fast a SQL Server can process information.  Someone once showed me a triangle of CPU - Memory - Disk, at any point in time 2 will be green and one will be red.  Disk IO rates have not been able to keep up with the improvements to CPU and memory over the last decade, and therefore compressing data reduces the amount of reading from disk that needs to be done to satisfy a query which results in faster execution times, at the expense of CPU for decompressing the data.

     

     

    Wednesday, November 12, 2008 3:30 AM
  • If you r WCF interface is doing that do some reason (like saving storage capacity) I would calculate the benfit of having reduced storage sizes against the CPU util and the maintainance of a compression functionality. Be aware that queries might not be able to access the data in predicates as the data is compressed. If ony storage is the reason for you doing that, I would take a look on the SQL Server 2008 options rather than doing all the compression and decompression stuff at the client (or at the server). Having that implemented natively on the server (like in SQL Server 2008) would not give you the flexibility of the compression algorythm but would make it more transparent to subsequent business operations (like searching)

     

    Jens K. Suessmeyer

    Wednesday, November 12, 2008 8:51 AM