Could you point me to the official document that describes the file header of a Microsoft Office XML document?

  • Question

  • When a file is saved in docx, xlsx, or pptx format in Microsoft Office, the file begins with the bytes 50 4B 03 04 14 00 06 00 08 00.

    Is there an official document stating that Microsoft Office uses this file header?

    According to the .ZIP file format specification and ECMA-376, I read it as follows:

    -Local file header signature  4 bytes : 0x04034b50 (stored little-endian as 50 4B 03 04)
    -Version needed to extract   2 bytes : 0x0014 = Ver.2.0
    -General purpose bit flag      2 bytes : 0x0006 = Super Fast compression option was used.
    -Compression method           2 bytes : 0x0008 = The file is Deflated.

    Is this correct?
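
The reading above can be checked mechanically. The following is a minimal Python sketch, not an official tool: it assumes only that the four fields are little-endian, as the .ZIP specification defines them, and the function name `parse_local_header_prefix` is made up for illustration.

```python
import struct


def parse_local_header_prefix(data: bytes) -> dict:
    """Decode the first 10 bytes of a ZIP local file header."""
    if len(data) < 10:
        raise ValueError("need at least 10 bytes")
    # Little-endian: one 4-byte signature followed by three 2-byte fields.
    signature, version, flags, method = struct.unpack("<IHHH", data[:10])
    return {
        "signature": signature,       # 0x04034b50 for a local file header
        "version_needed": version,    # 0x0014 means version 2.0
        "flags": flags,               # general purpose bit flag
        "method": method,             # 8 means Deflate, 0 means Store
    }


# The ten bytes quoted in the question.
fields = parse_local_header_prefix(bytes.fromhex("504B0304140006000800"))
for name, value in fields.items():
    print(f"{name}: 0x{value:04x}")
```

To inspect a real document, read the first ten bytes of the file in binary mode and pass them to the same function.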

    ref.

    http://www.ecma-international.org/activities/Office%20Open%20XML%20Formats/Draft%20ECMA-376%203rd%20edition,%20March%202011/Office%20Open%20XML%20Part%202%20-%20Open%20Packaging%20Conventions.pdf


    • Moved by Chenchen Li Friday, August 4, 2017 2:37 AM Office XML file format
    Thursday, August 3, 2017 7:09 AM

All replies

  • OK, I found where [MS-OI29500] refers to the "Compression method".

    ------------------------------------------------

    2.1.1742 Part 2 Section 11, Core Properties
    a. The standard states that package implementer shall not use any compression algorithm other than DEFLATE.
    Office uses the STORE algorithm, in addition to DEFLATE.

    ------------------------------------------------
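
To see which compression method each part of a package uses, the ZIP directory can be listed. The sketch below is only an illustration: it builds a small in-memory archive with Python's standard `zipfile` module instead of reading an actual Office file, and the entry names are invented.

```python
import io
import zipfile

# Build a tiny archive holding one stored and one deflated entry.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("stored.bin", b"x" * 64, compress_type=zipfile.ZIP_STORED)
    zf.writestr("deflated.xml", b"<a/>" * 64, compress_type=zipfile.ZIP_DEFLATED)

# Read it back and collect the compression method recorded for each entry.
with zipfile.ZipFile(buf) as zf:
    methods = {info.filename: info.compress_type for info in zf.infolist()}

for name, method in methods.items():
    label = {zipfile.ZIP_STORED: "STORE", zipfile.ZIP_DEFLATED: "DEFLATE"}.get(method, "other")
    print(f"{name}: method {method} ({label})")
```

Opening a real docx, xlsx, or pptx file with `zipfile.ZipFile(path)` and running the same loop should show which parts, if any, are stored without compression.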

    I am still looking for the spec covering "Version needed to extract" and "General purpose bit flag".

    Thursday, August 3, 2017 10:53 AM
  • Hello Ken,

    This forum is for development issues that arise when using the Open XML SDK to manipulate Office documents. Please see "Welcome to the Open XML SDK 2.5 for Office". Based on your description, your question about the Office document format is out of scope here. The "Office XML, ODF, and Binary File Formats" forum is for discussing issues related to the Office XML file format, so I will move this thread to that forum.

    Regards,

    Celeste


    MSDN Community Support

    Friday, August 4, 2017 2:36 AM
  • Hi Ken,

    The purpose of this forum is to support Microsoft's Open Specifications protocol documents.  Your issue touched on a topic that is covered under that umbrella.  Accordingly, an engineer from the protocols team will contact you soon.

     


    Bryan S. Burgin
    Senior Escalation Engineer
    Microsoft Protocol Open Specifications Team

    Friday, August 4, 2017 3:47 PM
    Moderator
  • Hi Ken, 

    I see that you've found our implementation notes on ISO/IEC 29500 (ECMA-376 is equivalent) in [MS-OI29500]. Specifically, you found the notes on ISO/IEC 29500-2, Open Packaging Conventions. In Part 2 of the standard, section 10.2 "Mapping to a ZIP Archive" and "Annex C. (normative) ZIP Appnote.txt Clarifications" give some detail and clarifications on the header fields. It looks to me as though you have correctly interpreted the header fields so far. Do you have additional questions about how these are used by Office applications, or about implementation in general?

    Best regards,
    Tom Jebo 
    Sr Escalation Engineer
    Microsoft Open Specifications Support



    Friday, August 4, 2017 6:55 PM
    Moderator
  • Hi Tom,

    Thank you for the prompt response.

    I think I found the last piece in ISO/IEC 29500-2.

    from Table C–3. Support for Version Needed to Extract field
     -> Bytes 5-6 can be 0x0A (Version 1.0), 0x14 (Version 2.0), or 0x2D (Version 4.5)

    from Table C–5. Support for modes/structures defined by general purpose bit flags
     -> Byte 7-8 can be:
    0000000000001110=0x000E
    (The fields crc-32, compressed size and uncompressed size are set to zero in the local header. + Super Fast compression option was used.)

    *snip*

    0000000000000110=0x0006
    (Super Fast compression option was used.)
    0000000000000100=0x0004
    (Fast compression option was used.)
    0000000000000010=0x0002
    (Maximum compression option was used.)
    0000000000000000=0x0000
    (Normal compression option was used.)

    from Table C–4. Support for Compression Method field
     -> Bytes 9-10 can be 0 (no compression) or 8 (the file is Deflated).
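
The values listed above can be turned into a small checker. This is a rough Python sketch based only on the values quoted in this post: the allowed version and method sets are taken from the summaries of Tables C-3 and C-4 above, the flag decoding covers only bits 1 to 3 as described for Table C-5 (the snipped rows are not reconstructed), and the name `describe_header` is hypothetical.

```python
import struct

ALLOWED_VERSIONS = {0x0A, 0x14, 0x2D}   # versions 1.0, 2.0, 4.5 as listed above
ALLOWED_METHODS = {0, 8}                # 0 = no compression, 8 = Deflate
# Bits 2 and 1 of the general purpose bit flag, meaningful for Deflate only.
DEFLATE_OPTIONS = {0b00: "Normal", 0b01: "Maximum", 0b10: "Fast", 0b11: "Super Fast"}


def describe_header(header: bytes) -> dict:
    """Summarize the first 10 bytes of a ZIP local file header."""
    signature, version, flags, method = struct.unpack("<IHHH", header[:10])
    return {
        "is_local_header": signature == 0x04034B50,
        "version_ok": version in ALLOWED_VERSIONS,
        "method_ok": method in ALLOWED_METHODS,
        # Bit 3: crc-32 and sizes are zero in the local header.
        "sizes_deferred": bool(flags & 0x0008),
        "deflate_option": DEFLATE_OPTIONS[(flags >> 1) & 0b11] if method == 8 else None,
    }


office_style = describe_header(bytes.fromhex("504B0304140006000800"))
minimal_style = describe_header(bytes.fromhex("504B03040A0000000000"))
print(office_style)
print(minimal_style)
```

Both example headers pass the version and method checks, which illustrates that the leading bytes written by Office are only one of several permitted combinations.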

    This means that the magic number of an Office Open XML document is not necessarily 50 4B 03 04 14 00 06 00 08 00. It can be something like 50 4B 03 04 0A 00 00 00 00 00. The sequence 50 4B 03 04 14 00 06 00 08 00 is just one instance permitted by the ISO/IEC 29500 standard, the one currently used by Microsoft Office.

    Am I correct?

    Regards,


    • Edited by KenTsuchiya Tuesday, August 15, 2017 10:56 AM Typo
    Tuesday, August 8, 2017 11:47 AM
  • Hi Ken,

    >>This means that the magic number of an Office Open XML document is not necessarily 50 4B 03 04 14 00 06 00 08 00. It can be something like 50 4B 03 04 0A 00 00 00 00 00. The sequence 50 4B 03 04 14 00 06 00 08 00 is just one instance permitted by the ISO/IEC 29500 standard, the one currently used by Microsoft Office.

    Yes, that's correct.

    Tom

    Tuesday, August 8, 2017 6:01 PM
    Moderator