Reading Visio VSD Files and Publisher PUB Files without Office

    Question

  • Hello!

    I need to know how to parse Visio VSD and Publisher PUB files.

    I know that those files are structured storage files.

    I am particularly interested in the Contents stream for PUB files and the VisioDocument stream for VSD files.

    I need to extract and/or insert digital signatures for VBA code in those streams. Does anyone know how those streams are structured?

    As for the Contents stream in Publisher files (most recent version), I found out that this stream is built from sequential blocks, but I don't know the meaning of each block.

    Any help would be appreciated.

    Regards
    Alex

    Friday, October 07, 2016 8:08 AM

All replies

  • Hi,

    According to "VSDX: the new Visio file format", the primary Visio Drawing (VSD) file format was binary, which made it difficult for third parties to access and extract data for use outside of Visio.

    I suggest you convert the file to .vsdx, then use the Open Packaging Conventions and XML to parse it. See "How to: Manipulate the Visio file format programmatically", which demonstrates how to read, select, change, and add parts of Visio packages.
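
    As a minimal sketch of that approach (an illustration only, not code from the linked article): a .vsdx file is a ZIP-based OPC package, so its part list can be read with an ordinary ZIP reader and an XML parser. The function name list_package_parts is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespace of the OPC content-types part ([Content_Types].xml).
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def list_package_parts(package: bytes) -> dict:
    """Return {part_name: content_type} for the Override entries of an OPC package.

    Default entries (content types assigned by file extension) are ignored here.
    """
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read("[Content_Types].xml"))
    return {
        el.get("PartName"): el.get("ContentType")
        for el in root.iter(CT_NS + "Override")
    }
```

    For a converted drawing, the result would be expected to include a part such as /visio/document.xml. Each part is itself XML and can be read from the same ZIP archive in the same way.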

    Also, could you please share with us what the sequential blocks in the .pub file are?

    Thanks for your understanding.

    Regards,

    Celeste

    Monday, October 10, 2016 7:15 AM
  • Hi Celeste!

    Parsing VSDM files is a no-brainer and already implemented. The binary format is the hard one. ;-)

    I've read some LibreOffice code.

    This code first identifies the version of the PUB file (the first 4 bytes of the Contents stream).

    It then starts parsing the Contents stream at position 0x1a, where there is a 4-byte offset that it calls traileroffset.

    I found out that the next 4 bytes (starting at 0x1e) contain the offset to something I call an outer block.

    Every outer block consists of the block length and the block data (block length minus 4 bytes long).

    So you can read those blocks until you reach the trailer-block defined by the traileroffset.
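
    The block walk described above can be sketched roughly as follows. This is my own illustration, not the LibreOffice code. It assumes that all integers are unsigned 32-bit little-endian values and that the block length counts its own 4-byte length field; the name walk_outer_blocks is hypothetical.

```python
import struct

TRAILER_OFFSET_POS = 0x1A  # position of the 4-byte trailer offset (from the post)
FIRST_BLOCK_POS = 0x1E     # position of the 4-byte offset of the first outer block


def walk_outer_blocks(contents: bytes):
    """Yield (offset, length, data) for every outer block before the trailer block."""
    # Assumption: integers are unsigned 32-bit little-endian.
    (trailer_offset,) = struct.unpack_from("<I", contents, TRAILER_OFFSET_POS)
    (pos,) = struct.unpack_from("<I", contents, FIRST_BLOCK_POS)
    while pos < trailer_offset:
        (length,) = struct.unpack_from("<I", contents, pos)
        # The length includes its own 4 bytes, so anything below 4 is invalid.
        if length < 4 or pos + length > len(contents):
            raise ValueError(f"implausible block length {length} at offset {pos:#x}")
        yield pos, length, contents[pos + 4 : pos + length]
        pos += length
```

    The trailer block itself starts at the trailer offset and is not yielded by this loop.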

    Some of those outer-blocks can contain inner blocks (the trailer-block does and so does the block containing the digital signature).

    An inner block consists of a 1-byte block ID and a 1-byte block type. The block type defines whether the block has a fixed or a variable length. If the inner block has a variable length, the next 4 bytes are the length, followed by the data (length minus 4 bytes).

    So now I can determine whether there is a digital VBA signature in the Contents stream by searching for an outer block that has at least one inner block with the ID 2, the type 0x80, and a DigSigBlob structure as described in MS-OSHARED.
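
    A rough sketch of that search, again only an illustration under assumptions: integers are little-endian, inner blocks start directly at the beginning of the outer block's data, and 0x80 is the only type treated as variable-length. The payload sizes of the fixed-length types are not listed here, so the caller has to supply them in fixed_sizes. All function and parameter names are hypothetical.

```python
import struct

SIGNATURE_BLOCK_ID = 0x02    # inner-block ID reported above
SIGNATURE_BLOCK_TYPE = 0x80  # inner-block type reported above


def parse_inner_blocks(data: bytes, fixed_sizes: dict,
                       variable_types=frozenset({0x80})):
    """Return a list of (block_id, block_type, payload) tuples.

    fixed_sizes maps a fixed-length type byte to its payload size in bytes.
    Parsing stops at the first unknown type or truncated block.
    """
    blocks = []
    pos = 0
    while pos + 2 <= len(data):
        block_id, block_type = data[pos], data[pos + 1]
        pos += 2
        if block_type in variable_types:
            if pos + 4 > len(data):
                break
            (length,) = struct.unpack_from("<I", data, pos)
            # The length includes its own 4 bytes.
            if length < 4 or pos + length > len(data):
                break
            payload = data[pos + 4 : pos + length]
            pos += length
        elif block_type in fixed_sizes:
            size = fixed_sizes[block_type]
            if pos + size > len(data):
                break
            payload = data[pos : pos + size]
            pos += size
        else:
            break  # unknown type: the size cannot be determined
        blocks.append((block_id, block_type, payload))
    return blocks


def has_signature_candidate(outer_data: bytes, fixed_sizes: dict) -> bool:
    """True if an inner block with ID 2 and type 0x80 is present."""
    return any(
        block_id == SIGNATURE_BLOCK_ID and block_type == SIGNATURE_BLOCK_TYPE
        for block_id, block_type, _ in parse_inner_blocks(outer_data, fixed_sizes)
    )
```

    This only finds a candidate block. Checking that its payload really is a DigSigBlob structure is left out of the sketch.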

    Some of the first bytes of the content stream also contain the total length of the stream.

    I tried to remove the signature by cutting the outer block that contains it out of the stream and adjusting the offset at 0x1a and the stored stream lengths, but that just led to a corrupt PUB file.

    Regards

    Alex

    Wednesday, October 12, 2016 2:30 PM
  • Hi,

    Thanks for sharing the details about the Contents stream of .pub files.

    Since your issue is about a binary file format, I will move this thread to the following forum:

    https://social.msdn.microsoft.com/Forums/office/en-US/home?forum=os_binaryfile

    I think you will get better suggestions there.

    Sorry for any inconvenience and have a nice day! 

    Regards,

    Celeste

    Thursday, October 13, 2016 2:20 PM
  • Hello Celeste: 

    The Visio and Publisher formats are not covered by the Open Specifications supported by this very forum. Alex is already aware of this, as we suggested earlier in this thread that he try the Visio forum for further help: https://social.msdn.microsoft.com/Forums/en-US/98df5ba7-244a-42ff-acb3-be1b70d44ea5/vba-project-signature-sig-vs-sigagile?forum=os_binaryfile

    Thanks

      

    Tarun Chopra | Escalation Engineer | Open Specifications Support Team

    Thursday, October 13, 2016 6:29 PM
  • Hi,

    In fact, the Microsoft Office for Developers forum is meant for discussing issues with the Office object model and the Open XML SDK.

    Sorry that we could not give better suggestions about the issues with the Open Packaging Conventions API and binary file formats.

    Thanks for your understanding.

    Regards,

    Celeste

    Friday, October 14, 2016 7:06 AM