Export macro from Office file

  • Monday, June 29, 2009 10:08 PM, Lealdo Delucci
     
    Hi, I am trying to write a program that will recognize whether or not an Office document (.doc, .xls, etc.) has a macro in it and, if it does, export that macro to a separate document. I need to do this in a platform-independent way, which means I cannot use VB. (I am developing in Python.)

    What I have found is a way to locate the macro within the file using an OLE model, but the text of the macro appears jumbled. Heavily jumbled. I was wondering if there is some straightforward way to get this information out of the files.

    Thank you,
    Lealdo

Answers

  • Friday, July 17, 2009 5:34 PM, Art Leonard - MSFT
     Answer

    Hi, Lealdo.

    The answer you are looking for is the run-length compression detailed in section 2.4 of [MS-OVBA] (http://msdn.microsoft.com/en-us/library/dd923471.aspx). It is not a standard "zip" format. This compression scheme is used for all of the Module streams in the project.
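
    As a rough illustration of that scheme, here is a minimal Python sketch of the decompression side as I read section 2.4.1 of [MS-OVBA]. The function name decompress_container is hypothetical, error handling is kept to a minimum, and it should be checked against the specification before being relied on.

```python
import struct


def decompress_container(data: bytes) -> bytes:
    """Decompress an MS-OVBA CompressedContainer (sketch, not validated against real files)."""
    if not data or data[0] != 0x01:
        raise ValueError("missing CompressedContainer signature byte 0x01")
    out = bytearray()
    pos = 1
    while pos + 2 <= len(data):
        # Chunk header: bits 0-11 = size - 3, bits 12-14 = signature, bit 15 = compressed flag.
        (header,) = struct.unpack_from("<H", data, pos)
        chunk_end = min(len(data), pos + (header & 0x0FFF) + 3)
        compressed = bool(header & 0x8000)
        pos += 2
        if not compressed:
            # A raw chunk carries 4096 bytes of uncompressed data.
            out += data[pos:pos + 4096]
            pos += 4096
            continue
        chunk_start = len(out)
        while pos < chunk_end:
            flags = data[pos]  # one flag bit per token, least significant bit first
            pos += 1
            for bit in range(8):
                if pos >= chunk_end:
                    break
                if not (flags >> bit) & 1:
                    out.append(data[pos])  # literal token: copy one byte
                    pos += 1
                    continue
                (token,) = struct.unpack_from("<H", data, pos)
                pos += 2
                # The offset/length split depends on how much of this chunk is already decoded.
                diff = len(out) - chunk_start
                bit_count = max((diff - 1).bit_length(), 4)
                length = (token & (0xFFFF >> bit_count)) + 3
                offset = (token >> (16 - bit_count)) + 1
                for _ in range(length):
                    out.append(out[-offset])  # byte-wise copy so overlapping runs repeat
    return bytes(out)
```

    Copying one byte at a time is deliberate: a copy token may refer to bytes that it is itself producing, which is how a short run expands into a long repetition.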

    Please note that the compressed part does not necessarily start at the beginning of the stream. You will need to read the dir stream to get the module offset (http://msdn.microsoft.com/en-us/library/dd922360.aspx) for the particular stream before you can decompress it. The offset tells you where to start reading.
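
    A possible way to pull those offsets out, sketched in Python. It assumes the dir stream has already been decompressed (it uses the same compression, starting at byte 0) and walks it as a flat sequence of id/size/payload records. The record ids are taken from my reading of [MS-OVBA]; the function name module_offsets and the latin-1 decoding of stream names are assumptions made for illustration, since the real code page is given elsewhere in the dir stream.

```python
import struct

# Record ids as I understand them from [MS-OVBA] section 2.3.4.2.
PROJECTVERSION = 0x0009
MODULESTREAMNAME = 0x001A
MODULEOFFSET = 0x0031


def module_offsets(dir_data: bytes) -> dict:
    """Map each module stream name to its text offset (sketch, assumptions noted above)."""
    offsets = {}
    stream_name = None
    pos = 0
    while pos + 6 <= len(dir_data):
        rec_id, size = struct.unpack_from("<HI", dir_data, pos)
        pos += 6
        if rec_id == PROJECTVERSION:
            # The size field of this record is reserved; 4 + 2 version bytes follow.
            size = 6
        payload = dir_data[pos:pos + size]
        pos += size
        if rec_id == MODULESTREAMNAME:
            # Assumption: latin-1 stands in for the project's actual code page.
            stream_name = payload.decode("latin-1")
        elif rec_id == MODULEOFFSET and stream_name is not None:
            offsets[stream_name] = struct.unpack("<I", payload)[0]
            stream_name = None
    return offsets
```

    With the offset in hand, the source of a module would be the decompression of that module's stream from the offset onward; everything before the offset is the performance cache and can be skipped.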

    Also note that to get a complete representation of the VBA project, you will need most of the information in the documented streams (with the exception of the performance caches) and any forms (along with the VBFrame stream).

    If you're just trying to get the code, then you only need the modules; however, you could not reconstitute the project with just that information.

    Please let me know if you have more questions. Thanks!
    VSTO Development

All Replies

  • Tuesday, June 30, 2009 3:23 PM, Hongwei Sun-MSFT (MSFT, Moderator)
     
    Lealdo,

    Thanks for your question. One of our team members will work on your question and get back to you soon.

    Thanks!
    Hongwei Sun -MSFT
  • Wednesday, July 01, 2009 3:45 PM, Dominic Salemno MSFT (MSFT, Moderator)

    Lealdo,

    If you review Section 2.1.9 of MS-DOC (http://msdn.microsoft.com/en-us/library/cc313153.aspx), you will see that any macros that exist are stored in the "Project Root Storage" as defined in MS-OVBA (http://msdn.microsoft.com/en-us/library/cc313094.aspx).

    Parsing the binary file formats is not a trivial task. All Office binary file formats are built on what is commonly referred to as OLE Structured Storage, or the Compound File Binary File Format (MS-CFB: http://msdn.microsoft.com/en-us/library/cc546605.aspx). It mimics a FAT filesystem with sectors, so fragmentation of an Office binary file is not only possible but very likely. One therefore has to parse the underlying CFB header and structures before the DOC or XLS format can be parsed. Reading the file byte-by-byte is not feasible unless the file has been defragmented.
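
    To make the FAT analogy concrete, here is a small Python sketch of the two pieces involved: reading the fixed fields of the CFB header, and following a sector chain through an already-loaded FAT. The field offsets follow my reading of MS-CFB; the function names and the returned dictionary keys are hypothetical, and loading the FAT itself (via the DIFAT) is left out.

```python
import struct

CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
ENDOFCHAIN = 0xFFFFFFFE


def parse_cfb_header(header: bytes) -> dict:
    """Read the fixed-size fields at the start of a compound file (sketch)."""
    if len(header) < 76 or header[:8] != CFB_SIGNATURE:
        raise ValueError("not a Compound File Binary file")
    _minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from("<5H", header, 24)
    if byte_order != 0xFFFE:
        raise ValueError("unexpected byte-order mark")
    (_num_dir, num_fat, first_dir, _txn, mini_cutoff,
     first_minifat, _num_minifat, first_difat, _num_difat) = struct.unpack_from("<9I", header, 40)
    return {
        "major_version": major,
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_shift,
        "num_fat_sectors": num_fat,
        "first_dir_sector": first_dir,
        "mini_stream_cutoff": mini_cutoff,
        "first_minifat_sector": first_minifat,
        "first_difat_sector": first_difat,
    }


def sector_offset(sector: int, sector_size: int) -> int:
    """File offset of a sector; sector 0 starts right after the header sector."""
    return (sector + 1) * sector_size


def sector_chain(fat: list, start: int) -> list:
    """Follow FAT links from a starting sector until ENDOFCHAIN."""
    chain, seen = [], set()
    sector = start
    while sector != ENDOFCHAIN:
        if sector in seen or sector >= len(fat):
            raise ValueError("corrupt or cyclic sector chain")
        seen.add(sector)
        chain.append(sector)
        sector = fat[sector]
    return chain
```

    A stream whose chain comes back as, say, sectors 0, 3 and 5 is stored in three non-adjacent pieces of the file, which is why a straight byte-by-byte read of the document does not yield the stream contents.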

    Dominic Salemno
    Senior Support Escalation Engineer

  • Thursday, July 02, 2009 6:32 PM, Lealdo Delucci
     
    Thank you. I already have an implementation of an OLE reader. The problem I am having is that once I have the OLE stream pertaining to the macro I am looking for, the macro appears to be partially compressed. I was wondering if you could shed some light on this.

    For example, the following macro code:
    Sub default()
       Dim sMyDir As String
       Dim sDocName As String
       ' The path to obtain the files.
       sMyDir = "C:\Documents and Settings\ldelucci\My Documents\"
       sDocName = Dir(sMyDir & "*.DOC")
       While sDocName <> ""
          ' Print the file.
          Application.PrintOut FileName:=sMyDir & sDocName
          ' Get next file name.
          sDocName = Dir()
       Wend
    End Sub


    Compiles into a bunch of text that includes the following:
    Sub def ault()
        Dim s MyDir As$ S ~ng0DocŒ4' The  path to  obtain t $files.G>S C:\ Fu ments an d Settin gs\ldelucci\M´y \ ¡ ’s ¼ –(I& "* .DOC"·Wh… k *<> "=  ' PrintIƒH Appli€cation.`Out F $7:=†4+‚d(Get@ next q ®n ­*†A= ¦r«€Wend
    EvÐub


    This looks like it could possibly be useful, but it isn't directly usable as it stands.

    Thank you,
    Lealdo
  • Tuesday, July 07, 2009 4:49 PM, Dominic Salemno MSFT (MSFT, Moderator)
     
    Lealdo,

    I have received your information and am currently performing an investigation into this matter. I will update you as things progress.

    Dominic Salemno
    Senior Support Escalation Engineer
  • Wednesday, July 08, 2009 4:25 PM, Dominic Salemno MSFT (MSFT, Moderator)
     
    Lealdo,

    Could you send me a sample file containing a macro, in which you see this happening, to dochelp@microsoft.com?

    Dominic Salemno
    Senior Support Escalation Engineer
  • Thursday, July 09, 2009 8:34 PM, Lealdo Delucci
     
    I have sent you the example file I used above.

    Thank you,
    Lealdo
  • Friday, July 10, 2009 5:15 PM, Dominic Salemno MSFT (MSFT, Moderator)
    Lealdo,

    I have received this file and I am investigating this matter.

    Thank you.

    Dominic Salemno
    Senior Support Escalation Engineer

  • Thursday, July 16, 2009 2:23 PM, Lealdo Delucci
     
    Thank you,

    I found the answer I was looking for. The data is compressed within the OLE stream using a zip compression similar to LZ compression, referred to as MSZip. The libgsf project contains a module to do almost exactly this task, which can be converted into other languages and relies on no libraries. libgsf is open source.

    Thank you again,
    Lealdo