How to determine the real type of an Office (Office 2003 or Office 2007) file

  • Question

  •     Hello, everyone.
        I want to determine the type of Office files (doc, ppt, xls, pptx, docx, xlsx), but I cannot rely on the file's extension, because the extension may have been changed. For example,
    we can rename a .pptx file to .ppt. So I must find the real Office file type. Do the binary files have any identifier?
    Any ideas? Thanks for your answers.

    Sunday, January 6, 2013 1:59 AM

Answers

  • Hi, you will need two different processing methods to discover the actual file type.

     

    OOXML:

    For files that are .docx, .xlsx, and .pptx, you treat the file as a zip package. When examining the contents of the package, one of the first things you'll notice is a folder named "word", "xl", or "ppt" respectively. You can also examine the contents of the docProps\app.xml file. That file contains an <Application> element whose value is "Microsoft Office Word", "Microsoft Excel", or "Microsoft Office PowerPoint" respectively.
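
The OOXML check described above could be sketched roughly as follows in Python. This is only an illustration under some assumptions: the function names (`ooxml_type_from_names`, `ooxml_application`, `detect_ooxml`) are made up for this example, the folder-to-type mapping simply follows the folder names mentioned above, and inside the zip the app.xml entry is addressed with a forward slash (`docProps/app.xml`). A package that lacks these parts would need extra handling.

```python
import zipfile
import xml.etree.ElementTree as ET

# Top-level package folder -> file type, following the folder names given above.
FOLDER_TO_TYPE = {"word": "docx", "xl": "xlsx", "ppt": "pptx"}


def ooxml_type_from_names(names):
    """Return 'docx', 'xlsx', 'pptx', or None from a list of zip entry names."""
    for name in names:
        # Take the first path component of each entry.
        top = name.replace("\\", "/").split("/", 1)[0]
        if top in FOLDER_TO_TYPE:
            return FOLDER_TO_TYPE[top]
    return None


def ooxml_application(app_xml_bytes):
    """Return the text of the <Application> element in app.xml, or None."""
    root = ET.fromstring(app_xml_bytes)
    for elem in root.iter():
        # Compare local names so the check does not depend on the namespace URI.
        if elem.tag.rsplit("}", 1)[-1] == "Application":
            return elem.text
    return None


def detect_ooxml(path):
    """Return (file_type, application) for a zip-based file, else (None, None)."""
    if not zipfile.is_zipfile(path):
        return None, None
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
        kind = ooxml_type_from_names(names)
        app = None
        if "docProps/app.xml" in names:
            app = ooxml_application(zf.read("docProps/app.xml"))
    return kind, app
```

Calling `detect_ooxml("renamed.ppt")` (a hypothetical file name) would return the folder-based type together with the `<Application>` text, so the two signals can be cross-checked against each other.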

     

    For files that are .doc, .xls, and .ppt, the method is much more complex. Please take a look at the following article and let me know if you have any additional questions.

     

    Determining Office Binary File Format Types

    http://blogs.msdn.com/b/openspecification/archive/2013/01/16/determining-office-binary-file-format-types.aspx
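
As a first step before either method, it can help to decide which of the two containers a file actually is, regardless of its extension. The sketch below is not the method from the linked article; it only checks the leading bytes, assuming the usual 8-byte compound file signature for .doc, .xls, and .ppt and the `PK\x03\x04` local header that a typical zip package starts with. The names `container_kind` and `sniff_file` are hypothetical. Telling .doc, .xls, and .ppt apart still requires looking inside the compound file, which is what the article covers.

```python
# First 8 bytes of an OLE compound file (container for .doc, .xls, .ppt).
OLE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header signature found at the start of a typical zip package.
ZIP_SIGNATURE = b"PK\x03\x04"


def container_kind(header):
    """Classify leading bytes as 'ole', 'zip', or 'unknown'."""
    if header.startswith(OLE_SIGNATURE):
        return "ole"
    if header.startswith(ZIP_SIGNATURE):
        return "zip"
    return "unknown"


def sniff_file(path):
    """Read the first 8 bytes of a file and classify its container."""
    with open(path, "rb") as f:
        return container_kind(f.read(8))
```

A result of "zip" suggests continuing with the OOXML approach, while "ole" suggests continuing with the binary format approach from the article.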


    Josh Curry (jcurry) | Escalation Engineer | Open Specifications Support Team

    Monday, January 21, 2013 7:34 PM
    Moderator

All replies

  • Hi How to judege,
     
    Thank you for your question. A member of the Protocol Documentation support team will respond to you soon.
     
    Regards,
    Vilmos Foltenyi - MSFT
    Sunday, January 6, 2013 4:50 AM
  • Thank you, best wishes to you.
    Sunday, January 6, 2013 6:28 AM
  • Hi, I am the engineer who will be working with you on this issue. I am currently researching the problem and will provide you with an update soon. Thank you for your patience.

    Josh Curry (jcurry) | Escalation Engineer | Open Specifications Support Team

    Monday, January 7, 2013 2:39 PM
    Moderator