System.IO.Compression.DeflateStream clarifications please ...

    Question

  • Since .NET Framework 2.0 is now final, it's probably too late to ask, but:

    1. Why does DeflateStream not offer a choice of compression levels (0..9)?
    2. Does anybody know which compression level is actually used? (*)
    3. The 4GB limit seems to be bogus information - tests show that one can go beyond this boundary without problems. Maybe somebody mixed it up with the maximum file size of ZIP archives?!?

    (*) The speed seems to be too fast :) If random data gets compressed, the ratio goes to 1:1.5 - which is pretty bad for the many cases where one cannot afford to reset and perform another pass with no compression. AFAIK compression level 9 in deflate keeps the overhead for incompressible data acceptably low (1.5%?).

    Here's my playground (C# console application):

    using System;
    using System.Collections.Generic;
    using System.Text;
    using System.Diagnostics;
    using System.IO;
    using System.IO.Compression;

    namespace DeflateTest
    {
        // Write-only sink: discards everything and just counts how many
        // bytes the compressor emits.
        class NullStream : Stream
        {
            public long lConsumed = 0;

            public override bool CanRead  { get { return false; } }
            public override bool CanWrite { get { return true; } }
            public override bool CanSeek  { get { return false; } }
            public override long Length
            {
                get { throw new NotSupportedException(); }
            }
            public override long Position
            {
                get { throw new NotSupportedException(); }
                set { throw new NotSupportedException(); }
            }
            public override void Flush() { }
            public override int Read(byte[] buffer, int offset, int count)
            {
                throw new NotSupportedException();  // output only
            }
            public override void Write(byte[] buffer, int offset, int count)
            {
                lConsumed += count;
            }
            public override long Seek(long offset, SeekOrigin origin)
            {
                throw new NotSupportedException();
            }
            public override void SetLength(long value)
            {
                throw new NotSupportedException();
            }
        }

        class Program
        {
            static void Main(string[] args)
            {
                DeflateStream ds;
                NullStream ns;
                Random rnd;
                byte[] buf;
                long lTotal;

                Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.Idle;           

                ns = new NullStream();

                ds = new DeflateStream(ns, CompressionMode.Compress);

                rnd = new Random();

                buf = new byte[2 * 1024 * 1024];    // contents overwritten by rnd.NextBytes below

                lTotal = 0L;

                while (!Console.KeyAvailable)
                {
                    rnd.NextBytes(buf);
                    ds.Write(buf, 0, buf.Length);
                    lTotal += buf.Length;
                    Console.Write("{0:N0} -> {1:N0} ({2:N2}%)\r",
                        lTotal,
                        ns.lConsumed,
                        (double)ns.lConsumed / ((double)lTotal / 100));
                }

                ds.Close();

                Console.WriteLine();
            }
        }
    }

    Saturday, October 29, 2005 9:46 AM

Answers

  • The 4GB limit is probably there because Windows can't address more than that amount of memory (2^32 bytes) at once, at least on most architectures. Not really sure though. It seems to me to be a reasonable limit. Are you really trying to compress files larger than 4GB? I deal with FAA traffic data, and even a full day's worth of traffic is not more than 1GB in size, and that is as XML.

    As for the compression level, I looked at the source and it appears to use level 3, provided you are doing GZIP compression. Otherwise the compression level is undefined. However, I don't know the GZIP header format, and the code is really just setting offset values in the header, so I'm guessing strictly from the parameter name. Use Reflector or something similar to examine System.IO.Compression.FastEncoder.Output.WriteGZipHeader. That's the method that generates the header contents; it is called by GetCompressedOutput prior to the first write.

    However, looking at the constructors, it seems it will never use GZIP, because the only constructor that lets you specify that option is internal, and the other constructors always set it to false. If you actually want GZIP, you need to use GZipStream instead - it is the only stream that uses GZIP compression.
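
    For what it's worth, here is a minimal sketch of producing GZIP output with GZipStream (the file name and payload are just placeholders):

    using System.IO;
    using System.IO.Compression;

    class GZipDemo
    {
        static void Main()
        {
            byte[] data = new byte[1024];   // example payload (all zeros)

            // GZipStream has the same constructor shape as DeflateStream,
            // so switching between the two is a one-line change.
            using (FileStream fs = File.Create("data.gz"))
            using (GZipStream gz = new GZipStream(fs, CompressionMode.Compress))
            {
                gz.Write(data, 0, data.Length);
            }
            // Close()/Dispose() writes the GZIP trailer (CRC32 and size).
        }
    }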

    DeflateStream is unfortunately not designed for extensibility, so you'd end up having to write your own deflation stream to get the behavior you want.

    Michael Taylor - 10/29/05
    Saturday, October 29, 2005 3:27 PM

All replies

  • Thanks for the answer! And yes, I'm trying to push more than 4GB through a DeflateStream (at least in theory), so a limit there would have been a problem. My tests show that the stream doesn't freak out after writing 4GB+ to it, and the Deflate spec doesn't mention a maximum data size either. As a real-world example, think of a VPN-like, permanent connection where the data additionally gets compressed.

    Another thing not mapped from the Deflate algorithm is the ability to flush in stages: 1) flush so that the dictionary is kept but the receiver gets at least all the data compressed so far, and 2) flush everything and reset the dictionary. Especially 1) is very useful, since it doesn't hurt the compression too much. Again, I'm not sure what the DeflateStream.Flush() method does internally - I need to test that (a quick probe is sketched below), but a note in the MSDN documentation would actually be better.

    All my references here btw come from the zlib library, where compression levels and flush modes are exposed. Well, there's at least room for improvement in .NET Framework 2.1 - and the deflate decoder would stay compatible, too.
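
    A way to probe this, reusing the NullStream byte counter from the playground above (it only shows whether Flush() pushes compressed bytes out early, not which zlib-style flush mode it corresponds to):

    NullStream ns = new NullStream();
    DeflateStream ds = new DeflateStream(ns, CompressionMode.Compress);

    byte[] buf = new byte[4096];        // all zeros: highly compressible
    ds.Write(buf, 0, buf.Length);

    long before = ns.lConsumed;
    ds.Flush();
    // If Flush() behaves like zlib's Z_SYNC_FLUSH, lConsumed should jump
    // here; if it only flushes the underlying stream, it stays unchanged.
    Console.WriteLine("emitted by Flush(): {0}", ns.lConsumed - before);

    ds.Close();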

    Sunday, October 30, 2005 3:46 AM
  • Hi,

    I think the "Compression and Decompression of Files using Visual Basic 2005" article at

    http://aspalliance.com/1287_Compression_and_Decompression_of_Files_using_Visual_Basic_2005.all

    may be helpful in this discussion. This popular white paper was written by a software engineer from our organization, Mindfire Solutions (http://www.mindfiresolutions.com).

    I hope you find it useful!

    Cheers,
    Byapti

    Monday, October 15, 2007 9:40 AM
  • Although that documentation says the 4GB limitation applies to both DeflateStream and GZipStream, only GZipStream is limited, because of the CRC32 checksum. If you do not need CRC32, use DeflateStream.

    DeflateStream does not have the 4GB limitation.
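
    For illustration, a minimal round-trip sketch with DeflateStream (it writes raw DEFLATE data - no GZIP header and no CRC32/size trailer):

    using System;
    using System.IO;
    using System.IO.Compression;

    class DeflateRoundTrip
    {
        static void Main()
        {
            byte[] data = new byte[1024];   // example payload (all zeros)
            MemoryStream packed = new MemoryStream();

            // 'true' leaves the MemoryStream open after the compressor closes.
            using (DeflateStream ds = new DeflateStream(packed, CompressionMode.Compress, true))
            {
                ds.Write(data, 0, data.Length);
            }

            packed.Position = 0;
            using (DeflateStream ds = new DeflateStream(packed, CompressionMode.Decompress))
            {
                byte[] back = new byte[data.Length];
                int total = 0, read;
                while ((read = ds.Read(back, total, back.Length - total)) > 0)
                    total += read;
                Console.WriteLine("{0} bytes in, {1} bytes compressed, {2} bytes back",
                    data.Length, packed.Length, total);
            }
        }
    }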

    • Proposed as answer by Jecho Jekov Tuesday, June 02, 2009 12:39 PM
    Tuesday, June 02, 2009 12:38 PM