Export macro from office file
- Hi, I am trying to write a program that will recognize whether or not an Office document (.doc, .xls, etc.) has a macro in it and, if it does, export that macro to a separate document. I need to do this in a platform-independent way, which means I cannot use VB. (I am developing in Python.)
What I have found is a way to locate the macro within the file using an OLE model, but the text of the macro appears heavily jumbled. I was wondering if there is some straightforward way to get this information out of the files.
Thank you,
Lealdo
Answers
Hi, Lealdo.
The answer you are looking for is the run-length compression detailed in section 2.4 of [MS-OVBA] (http://msdn.microsoft.com/en-us/library/dd923471.aspx). It is not a standard "zip" format. This compression scheme is used for all of the module streams in the project.
Please note that the compressed part does not necessarily start at the beginning of the stream. You will need to read the dir stream to get the module offset (http://msdn.microsoft.com/en-us/library/dd922360.aspx) for the particular stream before you can decompress it. The offset tells you where in the module stream to start reading.
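For reference, the decompression step can be sketched in Python roughly as follows. This is a minimal sketch based on my reading of the container layout in section 2.4 of [MS-OVBA] (a 0x01 signature byte followed by chunks, each with a 2-byte header), not a verified implementation. The function name decompress_container and its offset parameter are made up for illustration; offset stands in for the module offset read from the dir stream.

```python
import struct


def decompress_container(data, offset=0):
    """Decompress an MS-OVBA compressed container that starts at data[offset].

    `offset` is a placeholder for the module offset taken from the dir stream.
    """
    data = bytes(data)
    if offset >= len(data) or data[offset] != 0x01:
        raise ValueError("missing container signature byte 0x01")
    out = bytearray()
    pos = offset + 1
    while pos + 2 <= len(data):
        # Chunk header: 12-bit size (minus 3), 3-bit signature, 1-bit flag.
        (header,) = struct.unpack_from("<H", data, pos)
        chunk_size = (header & 0x0FFF) + 3
        if (header >> 12) & 0x07 != 0b011:
            raise ValueError("bad chunk signature")
        is_compressed = header >> 15
        chunk_end = min(len(data), pos + chunk_size)
        pos += 2
        if not is_compressed:
            # A raw chunk carries 4096 bytes of uncompressed data.
            out += data[pos:pos + 4096]
            pos += 4096
            continue
        chunk_start = len(out)
        while pos < chunk_end:
            flags = data[pos]
            pos += 1
            for bit in range(8):
                if pos >= chunk_end:
                    break
                if not (flags >> bit) & 1:
                    # Flag bit 0: the next byte is a literal.
                    out.append(data[pos])
                    pos += 1
                    continue
                # Flag bit 1: a 2-byte copy token (offset bits, then length bits).
                if pos + 2 > chunk_end:
                    raise ValueError("truncated copy token")
                (token,) = struct.unpack_from("<H", data, pos)
                pos += 2
                # The offset/length split depends on how much of the
                # current chunk has been decompressed so far.
                produced = len(out) - chunk_start
                bit_count = max((produced - 1).bit_length(), 4)
                length = (token & (0xFFFF >> bit_count)) + 3
                back = (token >> (16 - bit_count)) + 1
                if back > produced:
                    raise ValueError("copy token points before chunk start")
                src = len(out) - back
                for i in range(length):
                    # Byte-by-byte so overlapping copies repeat correctly.
                    out.append(out[src + i])
    return bytes(out)


# A hand-built container holding only literal bytes.
sample = bytes.fromhex(
    "0119b0" "006162636465666768" "00696a6b6c6d6e6f70" "007172737475762e"
)
print(decompress_container(sample))  # b'abcdefghijklmnopqrstuv.'
```

The copy is done one byte at a time on purpose: a copy token may refer to bytes that the same token is still producing, which is how long runs of a repeated character get encoded.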
Also note that to get a complete representation of the VBA project, you will need most of the information in the documented streams (with the exception of the performance caches) and any forms (along with the VBAFrame stream).
If you're just trying to get the code, then you only need the modules; however, you could not reconstitute the project with just that information.
Please let me know if you have more questions. Thanks!
VSTO Development
- Proposed As Answer by Art Leonard - MSFT, Friday, July 17, 2009 5:34 PM
- Marked As Answer by Chris Mullaney MSFT, Owner, Wednesday, July 29, 2009 8:15 PM
All Replies
- Lealdo,
Thanks for your question. One of our team members will work on your question and get back to you soon.
Thanks!
Hongwei Sun -MSFT
- Lealdo,
Upon reviewing MS-DOC (http://msdn.microsoft.com/en-us/library/cc313153.aspx) at Section 2.1.9, you will see that if any macros exist, they will be stored in the "Project Root Storage" as defined in MS-OVBA (http://msdn.microsoft.com/en-us/library/cc313094.aspx).
Parsing the binary file formats is not a trivial task. All Office binary file formats are stored in what is commonly referred to as OLE Structured Storage, or the Compound File Binary File Format (MS-CFB: http://msdn.microsoft.com/en-us/library/cc546605.aspx). This container mimics a FAT filesystem with sectors. Hence, fragmentation of an Office binary file is not only possible, but very likely. Therefore, one has to parse the underlying CFB header and structures before the DOC or XLS format can be parsed. Reading the file byte by byte would not be feasible unless the file were defragmented first.
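To illustrate the FAT-like layout, here is a minimal Python sketch of reading the fixed part of the CFB header and following a sector chain. The field offsets reflect my understanding of the 512-byte header in MS-CFB; the names read_cfb_header, sector_offset and follow_chain are hypothetical helpers, not part of any library, and a real reader would also need to load the FAT and directory sectors.

```python
import struct

CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
END_OF_CHAIN = 0xFFFFFFFE


def read_cfb_header(data):
    """Parse the fixed fields of the 512-byte compound file header."""
    if len(data) < 512 or data[:8] != CFB_SIGNATURE:
        raise ValueError("not a compound file")
    # Offset 24: minor version, major version, byte order, two sector shifts.
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", data, 24
    )
    if byte_order != 0xFFFE:
        raise ValueError("unexpected byte order mark")
    # Offset 40: nine 32-bit counters and sector locations.
    fields = struct.unpack_from("<9I", data, 40)
    return {
        "major_version": major,
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_shift,
        "num_fat_sectors": fields[1],
        "first_dir_sector": fields[2],
        "mini_stream_cutoff": fields[4],
        "first_minifat_sector": fields[5],
        "first_difat_sector": fields[7],
        # Offset 76: locations of the first 109 FAT sectors.
        "difat": struct.unpack_from("<109I", data, 76),
    }


def sector_offset(sector, sector_size):
    """File offset of a sector; sector 0 follows the header sector."""
    return (sector + 1) * sector_size


def follow_chain(fat, start):
    """Return the sector numbers of one stream, in reading order."""
    chain, seen = [], set()
    sector = start
    while sector != END_OF_CHAIN:
        if sector in seen or sector >= len(fat):
            raise ValueError("corrupt FAT chain")
        seen.add(sector)
        chain.append(sector)
        sector = fat[sector]
    return chain
```

The chain returned by follow_chain is what makes fragmentation visible: consecutive parts of a stream can sit in sectors that are far apart and out of order in the file, so the stream has to be reassembled sector by sector.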
Dominic Salemno
Senior Support Escalation Engineer
- Proposed As Answer by Dominic Salemno MSFT, Moderator, Wednesday, July 01, 2009 3:45 PM
- Unproposed As Answer by Lealdo Delucci, Thursday, July 02, 2009 6:32 PM
- Thank you. I already have an implementation of an OLE reader. The problem is that once I have the OLE stream for the macro I am looking for, the macro appears to be partially compressed. I was wondering if you could shed some light on this.
For example, the following macro code:
Sub default()
    Dim sMyDir As String
    Dim sDocName As String
    ' The path to obtain the files.
    sMyDir = "C:\Documents and Settings\ldelucci\My Documents\"
    sDocName = Dir(sMyDir & "*.DOC")
    While sDocName <> ""
        ' Print the file.
        Application.PrintOut FileName:=sMyDir & sDocName
        ' Get next file name.
        sDocName = Dir()
    Wend
End Sub
Compiles into a bunch of text that includes the following:
Sub def ault()
Dim s MyDir As$ S ~ng0DocŒ4' The path to obtain t $files.G>S C:\ Fu ments an d Settin gs\ldelucci\M´y \ ¡ ’s ¼ –(I& "* .DOC"·Wh… k *<> "= ' PrintIƒH Appli€cation.`Out F $7:=†4+‚d(Get@ next q ®n *†A= ¦r«€Wend
EvÐub
This looks like it could possibly be useful, but it isn't directly usable at all.
Thank you,
Lealdo
- Lealdo,
I have received your information and am currently performing an investigation into this matter. I will update you as things progress.
Dominic Salemno
Senior Support Escalation Engineer
- Lealdo,
Could you send me a sample file containing a macro, in which you see this happening, to dochelp@microsoft.com?
Dominic Salemno
Senior Support Escalation Engineer
- I have sent you the example file I used above.
Thank you,
Lealdo
- Lealdo,
I have received this file and I am investigating this matter.
Thank you.
Dominic Salemno
Senior Support Escalation Engineer
- Thank you,
I found the answer I was looking for. The data is compressed within the OLE stream using a zip compression similar to LZ compression, referred to as MSZip. The libgsf project contains a module to do almost exactly this task, which can be converted into other languages and relies on no libraries. libgsf is open source.
Thank you again,
Lealdo