Microsoft xpress compression algorithm (LZ77 + direct2)

  • Question

  • Hello

     

    I'm currently working on my University final year project dealing with Windows XP and Windows 7 hiberfil.sys files.

     

    I've been studying this document:

     

    http://msdn.microsoft.com/en-us/library/ee441458(v=PROT.13).aspx

     

    Which relates to the compression algorithm used on the hiberfil.sys (also used in Exchange and WIM files).

     

    I've gotten my head around how the compression and meta-data encoding works but I'm a little confused on some of the wording of the document.

     

    From what I've ascertained, compressed blocks can be identified within the file by searching for the header \x81\x81"xpress". Directly after this header are four bytes that contain a bitmask, which tells the decoder which bytes are data and which are meta-data (i.e. actual bytes that can be output directly versus meta-data produced by the compression algorithm that requires further processing to recover the original data).
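A minimal sketch of the header search described above, assuming the file contents are already in memory as bytes. The function name `find_xpress_headers` is hypothetical, and the signature bytes are taken from the description above rather than from any official format definition.

```python
def find_xpress_headers(buf: bytes, signature: bytes = b"\x81\x81xpress") -> list:
    """Return every offset in buf at which the signature bytes occur."""
    offsets = []
    start = 0
    while True:
        idx = buf.find(signature, start)
        if idx == -1:
            break
        offsets.append(idx)
        # Resume one byte later so that no occurrence is skipped.
        start = idx + 1
    return offsets
```

For a real hiberfil.sys the file would normally be read in chunks or memory-mapped instead of loaded whole; that is left out here to keep the sketch short.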

     

    My problem is understanding how the compressed blocks are actually structured.  The official Microsoft documentation states:

    To distinguish data from metadata in the compressed byte stream, the data stream begins with a 4-byte bitmask that indicates to the decoder whether the next byte to be processed is data (a "0" value in the bit), or if the next byte (or series of bytes) is metadata (a "1" value in the bit).

     

    This I understand and reading it in binary makes sense to me.  What doesn't make sense at present is where the actual data begins, as the document also states:

    When the bitmask has been consumed, the next four bytes in the input stream are another bitmask.

     

    So the first four bytes are a bitmask which indicates how the first 32 bytes of data in the compressed block are to be interpreted by the decoder, but I am not entirely sure where the 32 bytes of data actually start?

     

    Do they start straight away after the four bytes, or are the next four bytes another bitmask followed by another until the actual data starts?
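One reading of the two quoted passages is that the data starts immediately after each 4-byte bitmask: the decoder reads a bitmask, consumes one item (a literal byte or a metadata sequence) per bit, and only after all 32 bits are used does it read the next four bytes as a new bitmask. The sketch below follows that reading. As far as I can tell it agrees with the plain LZ77 decoding steps in Microsoft's published compression documentation, but the function name `xpress_lz77_decode`, the little-endian bitmask, the most-significant-bit-first order, and the length-extension details are assumptions that the quoted text does not state. Whether a hiberfil.sys block lays out its payload this way directly after the header is not confirmed in this thread.

```python
import struct

def xpress_lz77_decode(data: bytes) -> bytes:
    """Decode a plain-LZ77 'xpress' stream (sketch; layout details are assumed)."""
    out = bytearray()
    pos = 0
    flags = 0
    flag_count = 0
    half_byte_pos = None  # index of a length byte whose high nibble is still unused

    while True:
        if flag_count == 0:
            if pos + 4 > len(data):
                break
            # A new 4-byte bitmask; the items it describes follow it directly.
            flags = struct.unpack_from("<I", data, pos)[0]
            pos += 4
            flag_count = 32
        flag_count -= 1

        if (flags >> flag_count) & 1 == 0:
            # Bit is 0: the next input byte is a literal data byte.
            if pos >= len(data):
                break
            out.append(data[pos])
            pos += 1
            continue

        # Bit is 1: metadata follows, or the input is exhausted and we are done.
        if pos >= len(data):
            break
        match = struct.unpack_from("<H", data, pos)[0]
        pos += 2
        length = match & 7
        offset = (match >> 3) + 1

        if length == 7:
            # Longer lengths spill into a nibble shared between two matches.
            if half_byte_pos is None:
                length = data[pos] & 0x0F
                half_byte_pos = pos
                pos += 1
            else:
                length = data[half_byte_pos] >> 4
                half_byte_pos = None
            if length == 15:
                length = data[pos]
                pos += 1
                if length == 255:
                    length = struct.unpack_from("<H", data, pos)[0]
                    pos += 2
                    if length == 0:
                        length = struct.unpack_from("<I", data, pos)[0]
                        pos += 4
                    if length < 22:
                        raise ValueError("invalid extended match length")
                    length -= 22
                length += 15
            length += 7
        length += 3

        if offset > len(out):
            raise ValueError("match offset points before start of output")
        # Copy byte by byte so that overlapping matches repeat correctly.
        for _ in range(length):
            out.append(out[-offset])

    return bytes(out)
```

Under these assumptions, the 30-byte input `3f 00 00 00` followed by the ASCII letters `a` to `z` decodes to the 26-letter alphabet: the bitmask 0x0000003F has 26 zero bits (26 literals) followed by six one bits, and the first one bit meets the end of the input, which ends decoding.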

     

    If anyone could provide some information on how the file is structured I would appreciate it.

     

    Many thanks

     

    Tony

    Monday, December 27, 2010 8:00 PM

Answers

  • Update to forum.  Tony and I worked off-line.  This forum is for software developers who are using the Open Protocol Specification documentation to assist them in developing systems, services, and applications that are interoperable with Windows.

     

    The Open Protocol Specifications can be found at: http://msdn2.microsoft.com/en-us/library/cc203350.aspx.

     

    This issue involved the algorithm used to compress the hibernation file hiberfil.sys, which involves neither on-the-wire protocols nor Windows interoperability.  Accordingly, it is beyond the scope of this forum.


    Bryan S. Burgin Senior Escalation Engineer Microsoft Protocol Open Specifications Team
    Tuesday, February 15, 2011 9:38 PM
    Moderator

All replies

  • Hi,

    Thanks for your question. One of my colleagues will be in touch with you shortly.

    Regards,

    Edgar


    Edgar
    Monday, December 27, 2010 9:17 PM
    Moderator
  • Hi, Tony,

     

    I can help you with this.  Let me do some preliminary research.  Can you send me an e-mail at "dochelp (at) microsoft (dot) com" so we can collaborate off-line?


    Bryan S. Burgin Senior Escalation Engineer Microsoft Protocol Open Specifications Team
    Tuesday, December 28, 2010 4:46 PM
    Moderator
  • Hi,

    Microsoft uses a compression algorithm called "Microsoft Xpress Compression" for storing data in some places, Exchange 2010 for example. I am trying to find out whether Microsoft provides any library for this compression/decompression algorithm. I searched the net but could not find one. All I could find is that the algorithm is a combination of two algorithms, LZ77 + direct2. I also found out how these algorithms work.

    Here are some of the links:

    http://msdn.microsoft.com/en-us/library/ee441458%28v=PROT.13%29.aspx

    http://msdn.microsoft.com/en-us/library/ee441602%28v=PROT.13%29.aspx

    Can you please help me with this?
    Friday, August 12, 2011 11:36 AM
  • This forum is for software developers who are using the Open Specifications documentation to assist them in developing systems, services, and applications that are interoperable with certain Microsoft products.  The Open Specifications can be found at: http://msdn2.microsoft.com/en-us/library/cc203350.aspx.

     

    While you are citing an Open Specification document, [MS-DRSR], are you actually implementing that protocol specification?  Or are you working on something non-Protocol related and believe that the Compression Algorithm for your project is the same as that discussed in [MS-DRSR]?  You also cite Exchange 2010.  Are you working on an Exchange 2010 issue that is discussed on the Exchange Server Protocols page at http://msdn.microsoft.com/en-us/library/cc307725(v=EXCHG.80).aspx?


    Bryan S. Burgin Senior Escalation Engineer Microsoft Protocol Open Specifications Team
    Friday, August 12, 2011 8:32 PM
    Moderator