Determine DOC or DOT From File Binary Info

  • Question

  • Ultimately this will be done in VB/ASP.

    I need to be able to distinguish between a Word 2003 document file and a Word 2003 template file by looking at the file's binary content and NOT by the file extension.

    The page linked below seems to have accurate information for identifying Office 2003 files and, through a secondary signature, Word files. It shows that an Office file has the byte signature D0 CF 11 E0 A1 B1 1A E1, and that at offset 512 (0x200) the byte sequence EC A5 C1 00 indicates a Word document. However, both DOC and DOT files contain this secondary signature.
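    As a rough illustration of the two signature checks described above, here is a minimal Python sketch (Python is used only for illustration; the same byte comparisons carry over to VB). The function name and the synthetic test buffer are hypothetical. It also assumes, as the signature table does, that the Word data begins immediately after the 512-byte compound file header, which the container format does not guarantee.

```python
# Signatures quoted in the post above.
OLE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")   # start of an Office 97-2003 file
WORD_SUBHEADER = bytes.fromhex("ECA5C100")          # secondary signature for Word
WORD_SUBHEADER_OFFSET = 0x200                       # 512; assumes Word data starts here


def looks_like_word_binary(data: bytes) -> bool:
    """Return True if both signatures are present (hypothetical helper).

    This cannot tell a DOC from a DOT: both carry the same two signatures.
    """
    has_ole = data[:8] == OLE_SIGNATURE
    start = WORD_SUBHEADER_OFFSET
    has_word = data[start:start + 4] == WORD_SUBHEADER
    return has_ole and has_word


# Synthetic buffer for demonstration only: 8 signature bytes, padding up to
# offset 512, then the Word secondary signature.
sample = OLE_SIGNATURE + bytes(504) + WORD_SUBHEADER
print(looks_like_word_binary(sample))
```

    For a real file, the bytes would come from something like `open(path, "rb").read(1024)`.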

    http://www.garykessler.net/library/file_sigs.html

    I've noticed that a few bytes downstream, at offset 522 (0x20A), all the template files I've checked have an F1. DOC files have an F0, unless they contain a figure, in which case it's an F8.

    I'm GUESSING (hoping!) that this area of the file is the key as to whether it's a template or a document, but I've not found any place to get solid information about this.

    Is there anyplace I can go to get this info? Incidentally, I'll need to do the same thing with DOCX and DOTM files.

    Any help would be greatly appreciated.

    Many thanks,
    Ken

    • Moved by Chenchen Li Tuesday, September 20, 2016 2:07 AM Binary File Formats
    Monday, September 19, 2016 2:25 AM

Answers

  • Hi Ken, you are correct. The F1 value from the file is what indicates that it's a document template. However, it's actually only a single bit from that byte that is the flag that needs to be set. The structure that represents the data you are looking at is the FibBase structure, which is described in MS-DOC section 2.5.2. The A flag, or fDot, will be set if the document is a template.

     

    MS-DOC 2.5.2 FibBase

    A - fDot (1 bit): Specifies whether this is a document template.

    The byte in question, 0xF1, stores the A, B, C, D, and E properties. When reading the bit diagrams in the specification, it's important to remember that they list the least significant bit first, which is the reverse of how we normally write a binary number. In this case, the byte 0xF1 in binary is 1111 0001, but to compare it with the diagram we need to reverse the order, so it reads 1000 1111. So, the values of the properties are the following:

     

    Property   A   B   C   D   E
    Value      1   0   0   0   1111

     

    Taking it one step further, you mentioned that when the document has an image the value is 0xF8 (1111 0000 for a plain document becomes 1111 1000). That means the D bit, which is the fHasPic property, is set.
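    To make the bit layout concrete, here is a minimal Python sketch that decodes the flag byte as described above. The function name and dictionary keys are hypothetical; only fDot (A) and fHasPic (D) are named in this post, so B, C, and E are left as letters. The offset 0x20A is the one observed in the question and assumes the FibBase starts at offset 512.

```python
FLAG_BYTE_OFFSET = 0x20A  # 522, as observed in the question (assumes FibBase at 512)


def decode_fib_flag_byte(flag_byte: int) -> dict:
    """Split the flag byte into the A..E properties (hypothetical helper).

    The diagram lists the least significant bit first, so A is bit 0,
    D is bit 3, and E occupies the upper four bits.
    """
    return {
        "A_fDot": bool(flag_byte & 0x01),     # template flag
        "B": bool(flag_byte & 0x02),
        "C": bool(flag_byte & 0x04),
        "D_fHasPic": bool(flag_byte & 0x08),  # document contains a picture
        "E": flag_byte >> 4,                  # 4-bit value
    }


# The three values mentioned in this thread.
for value in (0xF1, 0xF0, 0xF8):
    print(hex(value), decode_fib_flag_byte(value))
```

    Testing the single bit (`flag_byte & 0x01`) rather than comparing the whole byte to 0xF1 matters here, since a template that also contains a picture would presumably carry 0xF9.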

     

    Please let me know if you have any other questions. Thank you.


    Josh Curry (jcurry) | Escalation Engineer | Open Specifications Support Team

    Tuesday, September 20, 2016 9:22 PM
    Moderator

All replies

  • Hi Ken,

    AFAIK there is no binary data distinction between a document and a template with the Word 97-2003 file format. Although it's not good practice to do so, simply changing a .doc file's extension to .dot is enough for Word to treat it as a template. I suspect a good many 'templates' have been created that way.

    With the Word 2007 and later XML format, though, the story is different: changing a .docx or .docm file's extension to .dotx or .dotm won't work, as Word will baulk at opening such files. Evidently there is a meaningful binary difference between such files, though I've not seen it documented, and accessing it may also require decompressing the file.
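    For what it's worth, one place such a difference can be looked for is the `[Content_Types].xml` part inside the ZIP package, where the main document part is declared with a different content type for documents and templates. The Python sketch below is an illustration under that assumption, not something confirmed in this thread; the function name and the in-memory test package are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the [Content_Types].xml part in an Open Packaging Conventions file.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"

# Main-part content types, mapped to the file kind they indicate.
MAIN_PART_TYPES = {
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml": "docx",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml": "dotx",
    "application/vnd.ms-word.document.macroEnabled.main+xml": "docm",
    "application/vnd.ms-word.template.macroEnabledTemplate.main+xml": "dotm",
}


def classify_ooxml_word(data: bytes):
    """Return 'docx', 'dotx', 'docm', 'dotm', or None (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        root = ET.fromstring(package.read("[Content_Types].xml"))
    for override in root.iter(CT_NS + "Override"):
        kind = MAIN_PART_TYPES.get(override.get("ContentType"))
        if kind is not None:
            return kind
    return None


def make_test_package(content_type: str) -> bytes:
    """Build a tiny ZIP holding only [Content_Types].xml (demonstration only)."""
    xml = (
        '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        f'<Override PartName="/word/document.xml" ContentType="{content_type}"/>'
        "</Types>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", xml)
    return buffer.getvalue()


for content_type in MAIN_PART_TYPES:
    print(classify_ooxml_word(make_test_package(content_type)))
```

    This reads the package contents rather than the extension, so a renamed file would still be classified by what it declares itself to be.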


    Cheers
    Paul Edstein
    [MS MVP - Word]

    Monday, September 19, 2016 6:14 AM
  • Hi,

    Thanks for posting here.

    This forum (Word for Developers) is for developer discussions and questions involving Microsoft Word, such as development issues related to the Word object model.

    Since your issue is about the binary file format, I will move this thread to the following forum:

    https://social.msdn.microsoft.com/Forums/office/en-US/home?forum=os_binaryfile

    Sorry for any inconvenience and have a nice day! 

    Regards,

    Celeste

    Tuesday, September 20, 2016 2:06 AM
  • Hi Ken,

    Thank you for contacting Microsoft Open Protocols Support. A member from the open protocols support team will respond here to the post.

    Thanks,

    Nathan Manis

    Open Protocols Support

    Tuesday, September 20, 2016 3:59 AM
    Moderator
  • Hi Ken, I am the engineer who will be working with you on this issue. I am currently researching the problem and will provide you with an update soon. Thank you for your patience.

    Josh Curry (jcurry) | Escalation Engineer | Open Specifications Support Team

    Tuesday, September 20, 2016 4:29 PM
    Moderator