How do I know what type of Office 97-2003 document I have?

  • Question

  • We are often sent files by 3rd parties who sometimes, unfortunately, think that converting a document simply means changing its file extension. Luckily, at this point we only need to detect Microsoft Office documents.

    DETERMINING THE DOCUMENT TYPE:
    I read the first 8 bytes to determine whether we have an Office 97-2003 document or an OOXML document.

    OOXML          = 50 4B 03 04 14 00 06 00
    Office 97-2003 = D0 CF 11 E0 A1 B1 1A E1
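
As a minimal sketch of this first check (Python, chosen for illustration; `container_kind` and `sniff_file` are hypothetical names), the code below matches only the parts of each signature that are actually fixed: the 8-byte Compound File Binary (OLE2) signature, and the 4-byte ZIP local-file-header signature `50 4B 03 04`. The next four bytes of a ZIP header are the "version needed" and "general-purpose flags" fields, which Office usually, but not necessarily always, writes as `14 00 06 00`; matching them can reject valid packages written by other tools. A `zip` result only means "ZIP container", so the package still has to be opened to confirm it is OOXML.

```python
# Fixed leading bytes of the two container formats.
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Office 97-2003 (OLE2 / CFB)
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP local file header (OOXML)


def container_kind(head: bytes) -> str:
    """Classify the first bytes of a file as 'cfb', 'zip' or 'unknown'."""
    if head.startswith(CFB_MAGIC):
        return "cfb"
    # Only the 4-byte ZIP signature is checked; bytes 4-7 vary by writer.
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return container_kind(f.read(8))
```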


    DETERMINING THE ASSOCIATED APP: (THIS IS WHAT I WANT)
    OOXML: This is a breeze since it's all XML anyway. I just use my favorite ZIP utility and XML reader to read the contents of the package. I don't even need .NET to be present.
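
One way to do the OOXML lookup with nothing more than a ZIP and XML reader is to follow the package's main-document relationship instead of guessing from part names. The sketch below (Python standard library; `ooxml_app` and the folder-to-app table are illustrative assumptions) reads `_rels/.rels`, finds the relationship whose type ends in `/officeDocument` (true for both the transitional and strict namespaces), and maps the top-level folder of its target (`word`, `xl`, `ppt`) to an application. Those folder names are what Office normally writes, but they are a convention rather than a requirement, so treat the table as a heuristic.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the OPC relationships part (_rels/.rels).
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"

# Assumed mapping from the main part's top-level folder to the application.
FOLDER_TO_APP = {"word": "Word", "xl": "Excel", "ppt": "PowerPoint"}


def ooxml_app(data: bytes) -> str:
    """Return 'Word', 'Excel', 'PowerPoint' or 'unknown' for an OOXML package."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # Raises KeyError if the package has no root relationships part.
        root = ET.fromstring(zf.read("_rels/.rels"))
    for rel in root.iter(f"{REL_NS}Relationship"):
        # The main document is the target of the officeDocument relationship.
        if rel.get("Type", "").endswith("/officeDocument"):
            target = rel.get("Target", "").lstrip("/")
            return FOLDER_TO_APP.get(target.split("/", 1)[0], "unknown")
    return "unknown"
```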

    OFFICE 97-2003: Here is where it gets a little more difficult. I hear that there are subheaders beginning at byte offset 512 (http://www.garykessler.net/library/file_sigs.html). However, the results I get aren't always consistent with the ones listed on that site.

    Apart from using the structured storage APIs, is there a reliable way to determine what type of Office 97-2003 document you have by reading bytes?


    NOTE: The files will reside on a server without MS Office installed.
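
A byte-level alternative that does not rely on the offset-512 "subheaders" is to parse the compound file's directory and look at the names of the top-level streams, which is roughly what the structured storage APIs do internally. In a version 3 file, offset 512 is simply the start of the first sector after the header, and that sector can hold FAT, directory, or stream data depending on how the writer laid out the file (in version 4 files the header is padded to 4096 bytes). That would plausibly explain the inconsistent results. The sketch below is a hedged, pure-Python reading of the layout in Microsoft's [MS-CFB] specification. It assumes the whole file fits in memory, does not validate every header field, and uses an assumed stream-name table (`WordDocument` for Word, `Workbook` or `Book` for Excel, `PowerPoint Document` for PowerPoint); `top_level_stream_names` and `office_app` are hypothetical names. Other CFB-based formats, such as Outlook `.msg` files, come back as `unknown`.

```python
import struct

# Sentinels from the [MS-CFB] specification.
MAXREGSECT = 0xFFFFFFFA  # largest regular sector ID
NOSTREAM = 0xFFFFFFFF    # "no sibling/child" directory entry ID
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Assumed mapping from top-level stream name to application.
STREAM_TO_APP = {
    "WordDocument": "Word",
    "Workbook": "Excel",  # BIFF8, Excel 97-2003
    "Book": "Excel",      # BIFF5, Excel 5.0/95
    "PowerPoint Document": "PowerPoint",
}


def top_level_stream_names(data: bytes) -> set:
    """Names of the entries directly under the root storage of a CFB file."""
    if not data.startswith(CFB_MAGIC):
        raise ValueError("not a Compound File Binary (OLE2) file")
    sector_size = 1 << struct.unpack_from("<H", data, 0x1E)[0]
    ids_per_sector = sector_size // 4
    n_fat, first_dir = struct.unpack_from("<II", data, 0x2C)
    first_difat, n_difat = struct.unpack_from("<II", data, 0x44)

    def sector(sid: int) -> bytes:
        # The header occupies one sector, so sector N starts at (N + 1) * size.
        start = (sid + 1) * sector_size
        return data[start:start + sector_size]

    # FAT sector IDs: up to 109 in the header, the rest in the DIFAT chain.
    fat_sids = list(struct.unpack_from("<109I", data, 0x4C))
    sid = first_difat
    for _ in range(n_difat):
        if sid > MAXREGSECT:
            break
        ids = struct.unpack_from(f"<{ids_per_sector}I", sector(sid))
        fat_sids.extend(ids[:-1])  # last slot links to the next DIFAT sector
        sid = ids[-1]

    # Concatenate the FAT sectors into one table of "next sector" links.
    fat = []
    for fs in fat_sids[:n_fat]:
        fat.extend(struct.unpack_from(f"<{ids_per_sector}I", sector(fs)))

    # Follow the directory's sector chain; the seen-set guards against loops.
    directory = bytearray()
    sid, seen = first_dir, set()
    while sid <= MAXREGSECT and sid not in seen:
        seen.add(sid)
        directory += sector(sid)
        sid = fat[sid]  # a corrupt chain may raise IndexError here

    n_entries = len(directory) // 128  # each directory entry is 128 bytes

    def entry(i: int):
        raw = directory[i * 128:(i + 1) * 128]
        name_len = struct.unpack_from("<H", raw, 0x40)[0]  # bytes, incl. NUL
        name = raw[:max(0, min(name_len, 64) - 2)].decode("utf-16-le", "replace")
        left, right, child = struct.unpack_from("<III", raw, 0x44)
        return name, left, right, child

    # The root's children form a tree linked through left/right sibling IDs.
    names, stack, visited = set(), [entry(0)[3]], set()
    while stack:
        i = stack.pop()
        if i == NOSTREAM or i >= n_entries or i in visited:
            continue
        visited.add(i)
        name, left, right, _ = entry(i)
        names.add(name)
        stack.extend((left, right))
    return names


def office_app(data: bytes) -> str:
    """Return 'Word', 'Excel', 'PowerPoint' or 'unknown'."""
    names = top_level_stream_names(data)
    for stream, app in STREAM_TO_APP.items():
        if stream in names:
            return app
    return "unknown"
```

Usage would be along the lines of `office_app(open(path, "rb").read())`. Walking only the root's children, rather than every directory entry, is meant to keep embedded objects (for example an Excel sheet stored under a Word document's `ObjectPool` storage) from being mistaken for the main document.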

    Wednesday, October 17, 2012 3:27 PM

Answers

  • It was an interesting thread, but ultimately didn't do the trick.  

    I will have to just use the structured storage APIs for now.  We were trying to make the app as lightweight and reliable as possible.

    Thanks for looking, though. It's much appreciated. No one wanted to touch this one with a ten-foot pole, it seems :)

    Kind Regards

    • Marked as answer by TSRACT Friday, November 2, 2012 2:49 AM
    Friday, November 2, 2012 2:48 AM
