locked
how to read references in excel workbook binary xls file without using excel to open it RRS feed

  • Question

  • hi,

    If I open Excel VBE window, Clicks the Tools menu in the main menu in excel, then click References in the Tools menu, A References window will show up and show me what references are for the current workbook.

    I would like to do something similar, but getting the reference list and full paths by reading the excel binary workbook without using excel to open the workbook and read from excel.

    The reason I wanted to do this is that I have a few xls workbooks, which reference multiple .xla's and dll's. These references could have multiple copies in the computer, When I launch a xls workbook, excel has loaded references from different locations. I am just curious to see what's the excel searching order for xla references.

    If I can read these info from the binary workbook, I would also want to write a new path of a reference to the xls so that it would load references from the location I specified.

    NR1234

    Friday, May 11, 2012 8:45 PM

All replies

  • I don't think that is possible with a .xls file. You'd need to open the workbook in Excel to get at its VBProject.References collection.

    Regards, Hans Vogelaar

    Friday, May 11, 2012 8:51 PM
  • This is not for the faint hearted!
     
     
    Workbooks that have VBA projects have a structure in the file, which can be seen pictorially here: http://msdn.microsoft.com/en-us/library/dd904787(v=office.12)
     
    The references are held in the dir stream, documented here: http://msdn.microsoft.com/en-us/library/dd906362(v=office.12) and in subordinate sections
     

    Enjoy,
    Tony
    www.WordArticles.com
    Monday, May 14, 2012 9:21 AM
  • Thanks Tony.

    This seems to be what I am looking for. Is there any code example for doing this so I don't have to start from scratch? I am using excel 2003 by the way.


    NR1234

    Monday, May 14, 2012 9:48 PM
  • Note that Excel 2010 files are a totally different format, so all your work will have a limited life span.

    Rod Gill

    The one and only Project VBA Book

    Rod Gill Project Management

    Tuesday, May 15, 2012 4:29 AM
  • This should probably be a web page, not a post, but here goes ...
     
    I don't have code but you have limited requirements, and office files do not exploit the full capacity of compound files, so, the simple version, ...
     
    Compound files are, logically, made up of 'Storages' (analagous to folders) and 'Streams' (analagous to files). You should only be interested in Streams.
     
    Compound files are, physically, made up of 512-byte sectors, and 64-byte short sectors. First, the (512-byte) sectors:
     
    The physical arrangement of the sectors is arbitrary but they are logically arranged in chains and for each sector there is a 'pointer' (actually a next sector number) to the one, if any, that logically follows it. These 'pointers', themselves, are stored as a 'sector allocation table', itself in a chain of sectors in the file, but the arrangement of the chaining is slightly different - there is an array of pointers to consecutive sectors (the allocation table), and a 'master array' of pointers to the sectors of the allocation table.
     
    The master sector allocation table begins at offset 76 (0x4c) in the file, and continues to offset 0x200 (512), padded, if not fully used, with binary ones (0xFF). There is a mechanism for dealing with an overflow, but, unless you have very large workbooks, you won't need to worry about it. The master sector allocation table is a list of sector numbers of the sectors which contain the actual sector allocations.
     
    'Pointers' to sectors are, as noted above, actually sequential sector numbers. Sector 0 begins at offset 512 (0x200) within the file, sector 1 at offset 0x400, etc; sector n, thus, starts at 512*(n+1).
     
    Opening an arbitrary Word document (Excel workbooks are the same in this respect), I look at offset 76 and find "2d 00 00 00". In little endian format this is number 0x0000002d, or 45 in decimal. This sector starts (in my example) at offset 512*(45+1) = 23552 = 0x5c00. At this address you will find 'next sector numbers' for sector numbers 0 through 127. At offset 80, I find "6c 00 00 00" (sector number 108); at offset 512*(108+1) (=0xDA00) I then find 'next sector numbers' for sectors 128 through 255. And so on.
     
    Now you can follow sectors, you can start looking at the contents! At offset 48 (0x30) within the file is the first sector number of the file directory. In my arbitrary Word document, I find "2e 00 00 00" (46 in decimal). Sector 46 is at offset 512*(46+1) = 24064 = 0x5e00. This happens to be immediately after the first sector of the sector allocation table but that is just coincidence.
     
    Directory entries are 128 bytes each, so four fit in each sector. When I look in the sector allocation table for sector 46 (that is at offset 4*46 within said table) I happen to find "34 00 00 00" (sector 52, at offset 512*(52+1)), and in that sector I find the next 4 directory entries. And so on.
     
    Next: small sectors. Any Streams shorter than 4096 bytes (which the "dir" stream will usually be) are stored in small sectors. All the small sectors together are stored as a single stream in ordinary sectors. Small sectors within the Small Sector Stream are organised in exactly the same way as normal sectors, that is there is an allocation table giving the next small sector number for each small sector. This allocation table is contained within normal sectors, the first of which is referenced by the (normal) sector number at offset 60 (0x3c) in the file. If there are more than 128 small sectors in total then the small sector allocation table will extend into further normal sectors, chained as descibed above.
     
    Now back to Directory entries!
     
    Directory entries begin with 64 bytes of a null-terminated UCS-2 format, stream (or storage) name, a maximum, thus, of a 31-character name, followed immediately by a word giving the number of bytes actually occupied by the name and the null terminator.
     
    The first directory entry is for a Stream called "Root Entry" and this represents the Short Sector Stream. The name is followed by some bits and bytes you shouldn't be interested in, and at offset 116 (0x74) within the entry is the sector number of the first sector of the stream. The next 4 bytes give the total length of the stream, so you know how much data there is. You will need to remember these to use later if, as I expect, your "dir" stream is in the short sectors.
     
    Having saved the details of the "Root Entry" you must now look forward through the directory to find the entry for the "dir" stream, possibly several sectors forward. When you find it you can check the length (at offset 120 (0x78) in the entry) to find out whether it is less than 4096 and, assuming it is, you can grab the sector number from offset 116 (0x74), which will be the number of a small sector number.
     
    You know the (normal) sector where the small sector stream begins (from the Root Entry directory entry). Small sector 24 (0x18), which is what I find in my sample document, is at offset 24*64 = 1536 (0x600) within the small sector stream; this will be at the start of the fourth normal sector of the small sector stream, which, in my sample document is at offset 0x30000 within the file - yours will be somewhere else, no doubt! You can read just 64 bytes from here (the length of a single small sector) and must then go to the small sector allocation table to find the number of the small sector from which to read the next 64 bytes, etc.
     
    Now the fun really begins. The "dir" stream is compressed and must be uncompressed before you go any further. This is described at http://msdn.microsoft.com/en-us/library/dd923471(v=office.12) and I don't have time to go into more detail at the moment. If you get this far and need more help - or if I have not been clear (which is very likely as this stuff is confusing!) - come back for more details.
     
     

    Enjoy,
    Tony
    www.WordArticles.com
    • Proposed as answer by Quang Yên Monday, November 5, 2012 4:48 AM
    Tuesday, May 15, 2012 3:48 PM
  • Note that Excel 2010 files are a totally different format

    The VBA projects are still stored as compound binaries within the XML packages.
    Enjoy,
    Tony
     

    Enjoy,
    Tony
    www.WordArticles.com
    Thursday, May 17, 2012 10:37 AM
  • Sorry - I accidentally sent that before it was ready, but the message is there.
     

    Enjoy,
    Tony
    www.WordArticles.com
    Thursday, May 17, 2012 10:38 AM
  • Hi Tony,

    Thank you so much for the detail information. However, I feel very bad that I might not be able to try this out since I will have to re-evaluate my original plan and see if the implementation can be fitted into my current project schedule. If not, I will have to look for some other alternatives to solve my problem. I hope the information you provided here will help other people who look for the same solution. If I've ever decided to use this way, I will probably be back here to ask more questions and hopefully you or someone else could be here for help at that time.

    Thanks again Tony for the information. I appreciate it very much.


    NR1234

    Thursday, May 17, 2012 5:01 PM
  • Just for information ...
     
    I did say this was better suited to a web page, so I have written one, and provided some basic code. Currently it only addresses the Compound Binary File structure but it is my intent to write another page detailing the breakdown of some of the streams, particularly the VBA ones. I am still working on it and, if you do decide to go down this route, take a look and post any questions you have.
     
     
     

    Enjoy,
    Tony
    www.WordArticles.com
    Tuesday, May 29, 2012 5:53 PM