
Microsoft Extensions to TIFF?

  • Friday, October 10, 2008 9:45 AM Brad Hards
     
    This is a wishlist for a new spec, rather than a query on the existing specs.

    Microsoft products sometimes put extra tags into TIFF documents, and of course older Microsoft Office versions produced MDI (which I think of as a TIFF variant) using the Microsoft Office Document Imaging Writer virtual printer.

    Some initial investigation indicates three new kinds of image compression:
     - MODI_BLC  34718
     - MODI_PTC  34720
     - MODI_VECTOR   34719

    MODI_VECTOR appears to basically be Enhanced Metafile.

    MODI_BLC and MODI_PTC are not understood.
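
    As a minimal sketch, the three codes listed above can be kept in a lookup table so that a reader can at least name the compression it cannot decode. The function name `compression_name` is hypothetical; only the numbers and names come from the list above.

```python
# Compression codes observed in MDI files, as listed above.
MODI_COMPRESSION = {
    34718: "MODI_BLC",
    34719: "MODI_VECTOR",
    34720: "MODI_PTC",
}


def compression_name(code: int) -> str:
    """Return the MODI name for a TIFF Compression value, or a fallback."""
    return MODI_COMPRESSION.get(code, f"unknown ({code})")
```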

    In addition to the compression methods, there are unknown tags (fields). These unknown properties appear to occur in both TIFF and MDI files.

    37679 - appears on every page, and looks like the text version of the document contents. The contents are 0x01 0x00, followed by a 4-byte length (a long) which is 6 bytes less than the actual length of this field (i.e. it is the remaining length), followed by the text version in UTF-8. Each phrase is delimited by a space followed by a newline (0x20 0x0a, aka ' \n'). The end is 0x0d 0x00.
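
    The layout described for tag 37679 could be parsed roughly as follows. This is a sketch based only on the observations above, not on any specification. It assumes the 4-byte length is little-endian (the byte order is not stated above, so it is a parameter), and the name `parse_modi_text` is hypothetical.

```python
import struct


def parse_modi_text(payload: bytes, byteorder: str = "<") -> list[str]:
    """Split the raw value of tag 37679 into phrases.

    Assumed layout: 0x01 0x00, a 4-byte length equal to the remaining
    size, UTF-8 text with ' \\n' between phrases, then 0x0d 0x00.
    """
    if len(payload) < 6 or payload[:2] != b"\x01\x00":
        raise ValueError("payload does not start with 0x01 0x00")
    (remaining,) = struct.unpack(byteorder + "I", payload[2:6])
    if remaining != len(payload) - 6:
        raise ValueError("length field does not match payload size")
    body = payload[6:6 + remaining]
    # Drop the observed 0x0d 0x00 terminator if it is present.
    if body.endswith(b"\x0d\x00"):
        body = body[:-2]
    text = body.decode("utf-8", errors="replace")
    # Phrases are separated by a space followed by a newline.
    return [phrase for phrase in text.split(" \n") if phrase]
```

    If the length check fails on a real file, trying `byteorder=">"` would be the first thing to test.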

    37680 - only appears to occur on the first page, always appears to be length 4096, always starts with 0xd0 0xcf 0x11 0xe0 0xa1 0xb1 0x1a 0xe1, then a string of zeros, and then varies. Perhaps some kind of metadata dictionary? It is located at the end of the file, and there are 16-bit wide characters that look like "Root Entry", "CONTENTS" (sometimes more than once, even if only one page), "prop2" (sometimes more than once), "prop3" (sometimes more than once), "DICT", "Summary Information", "Owner" and some names. There might be some random stuff / fill in there too. There also appears to be a consistent bit of stuff, "AuvsxjatP0udlw1Aaq5eubr5h" (this might not be ASCII though - there is always a 0x05 0x00 on the front of it).
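
    The eight leading bytes quoted for tag 37680 are the same as the signature of an OLE2 Compound File Binary (structured storage) container, and "Root Entry" is the usual name of the root storage in such files. Whether the tag really holds a complete compound file is an assumption. A small sketch for checking the signature and listing the 16-bit strings mentioned above (function names are hypothetical):

```python
import re

# Signature bytes quoted above; also the Compound File Binary signature.
CFB_SIGNATURE = bytes.fromhex("d0cf11e0a1b11ae1")


def looks_like_compound_file(payload: bytes) -> bool:
    """True if the tag value starts with the 8-byte signature."""
    return payload[:8] == CFB_SIGNATURE


def utf16_strings(payload: bytes, min_chars: int = 4) -> list[str]:
    """Find runs of printable ASCII stored as UTF-16LE code units."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_chars)
    return [m.group().decode("utf-16-le") for m in pattern.finditer(payload)]
```

    Running `utf16_strings` over the raw tag value should surface names such as "Root Entry" and "CONTENTS" if they are stored as described.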

    37681 - appears on every page, always starts with 0x02 0x00 (+ 0x00, 0x00?), then varies. Possibly the thumbnail image?

    Would it be possible to get some clarification / confirmation on the compression methods and unknown tags (including any additional tags not yet found)? A spec would be ideal, but given that MDI isn't so common and the preference to move to XPS, perhaps just some notes here?


Answers

  • Friday, February 27, 2009 2:50 PM Mark Miller_DSC (MSFT, Moderator)
     Answer

    Hi Brad,

    I apologize for the delay.

     

    I can now confirm the Product Group will document the TIFF tags, and the expected timeframe for the documentation is the end of August.

     

    Regards,

    Mark Miller

    Escalation Engineer

    US-CSS DSC PROTOCOL TEAM

All Replies

  • Friday, October 10, 2008 2:01 PM Sebastian Canevari (MSFT, Moderator)
     
    Hi Brad,

    Thanks for your post.

    We'll let you know as soon as we have news or questions.

    Regards,

    SEBASTIAN CANEVARI - MSFT SEE Protocol Documentation Team
  • Thursday, October 30, 2008 3:47 PM Brad Hards
     
    Sebastian,

    Is there any news, or even a timeframe for this?

    Brad
  • Thursday, November 20, 2008 8:45 AM Steve Smegner
     

    Brad,
    Could you describe your use case scenario?

    Steve Smegner

    Application Development Consulting Group

  • Thursday, November 20, 2008 10:24 AM Brad Hards
     
    Hi Steve,

    The idea is to provide better support for TIFF and MDI files produced by Microsoft Office Document Imaging on other platforms. I'm particularly interested in Okular.

    Right now, we have TIFF support, and the various pages display fine. That is done using libtiff (http://www.remotesensing.org/libtiff/). I'd like to provide the users with whatever support we can (just as for the .snp case).

    There are essentially two aspects to this:

    1. Support for the Microsoft-unique TIFF tags (37679, 37680 and 37681 are the ones I know of). I do have initial support for the text extraction part (just implemented - see http://websvn.kde.org/?view=rev&revision=886464 for the actual code changes), but not for the other two tags.

    2. Display of MDI files in the same way we currently display TIFF files. That requires knowledge about the three MDI-specific codecs (per my original request).
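
    For the first aspect, a quick way to see which pages of a TIFF file carry these tags is to walk the IFD chain directly. The sketch below is not the libtiff-based code referred to above; it is a standalone illustration that reads only the standard TIFF header and 12-byte directory entries, and the name `find_tags` is hypothetical.

```python
import struct

# The three private tags discussed in this thread.
MODI_TAGS = {37679, 37680, 37681}


def find_tags(data: bytes, wanted=MODI_TAGS):
    """Return (page_index, tag, field_type, count) for each wanted tag."""
    if data[:2] == b"II":
        bo = "<"  # little-endian file
    elif data[:2] == b"MM":
        bo = ">"  # big-endian file
    else:
        raise ValueError("missing TIFF byte-order mark")
    magic, offset = struct.unpack(bo + "HI", data[2:8])
    if magic != 42:
        raise ValueError("missing TIFF magic number 42")

    found = []
    page = 0
    seen = set()  # guards against a looping IFD chain
    while offset and offset not in seen:
        seen.add(offset)
        (n_entries,) = struct.unpack_from(bo + "H", data, offset)
        for i in range(n_entries):
            # Each entry: tag, field type, value count, value or offset.
            tag, ftype, count, _value = struct.unpack_from(
                bo + "HHI4s", data, offset + 2 + 12 * i
            )
            if tag in wanted:
                found.append((page, tag, ftype, count))
        # The offset of the next IFD follows the entries; 0 ends the chain.
        (offset,) = struct.unpack_from(bo + "I", data, offset + 2 + 12 * n_entries)
        page += 1
    return found
```

    A typical call would be `find_tags(open(path, "rb").read())`, where `path` is a TIFF file written by Microsoft Office Document Imaging.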

    The bigger concept here is that given that TIFF is an industry standard format, I'd like to see Microsoft document its extensions to that format.
  • Friday, January 09, 2009 4:26 PM Steve Smegner
     
    Greetings Brad,

    I wanted to let you know that we have not forgotten this request. Due to the holidays and the deprecated nature of the MDI formats we are still tracking down the nature of these compression tags. My sources are back from vacation and the holidays and I hope to have an update for you very soon. Thanks for your patience.

    Steve Smegner
    Application Development Consulting Group

  • Friday, January 09, 2009 9:58 PM Brad Hards
     
    Steve,

    Thanks for the continuing work on this, and for the status update.

    Much appreciated.

    Brad
  • Thursday, February 05, 2009 3:43 PM Mark Miller_DSC (MSFT, Moderator)
     Proposed Answer
    Hi Brad,

    I am on the Open Specification Protocols Documentation team, and have taken ownership of this issue.  I have followed this through to conclusion where Steve left off with our Product Group.

    We do not have standalone documentation of the MDI file format and don’t currently have plans to create any since the format is considered obsolete and we no longer recommend using it.  You may want to review this page: http://office.microsoft.com/en-us/help/HP062193601033.aspx.  Saving files in the TIFF format would be the more portable option.

    Regards,
    Mark Miller
    Escalation Engineer
    US-CSS DSC PROTOCOL TEAM
  • Friday, February 06, 2009 12:48 AM Brad Hards
     
    Hi Mark,

    I appreciate that MDI is obsolete; however (as you point out), TIFF is not. My original wishlist concerned both MDI and TIFF, which might have confused things. So let's exclude MDI, and only deal with TIFF files as produced by contemporary Microsoft applications.

    There are private tags (fields) in TIFF files produced by those tools, as noted in my original request:
    37679, 37680, 37681.

    Is documentation of those tags available under the Interoperability Principles? I can understand that they may not be (given that they are explicitly private tags); I'd just prefer not to have to figure them out using a binary editor...

    Brad
  • Monday, August 31, 2009 4:52 PM Phil Harvey
     
    It is now the end of August.

    Please tell me where I can find this documentation.

    Thanks!

  • Wednesday, September 09, 2009 2:22 PM Mark Miller_DSC (MSFT, Moderator)
     

    Hello Phil,

     

    I checked on the status of the documents with our Product Group and they are not yet ready.  I apologize for the delay.  The documentation for the TIFF tags turned out to be much more involved and complex than expected.  The Product Group informs me that the documentation should be ready by the end of the year.

     

    Having said this, if you can provide more specifics on what you are trying to accomplish, or on which TIFF tag details you need, we may be able to assist you in the interim.

     

    Regards,
    Mark Miller
    Escalation Engineer
    US-CSS DSC PROTOCOL TEAM

     

  • Thursday, September 10, 2009 1:54 PM Phil Harvey
     
    Hi Mark,

    Thanks for the offer, but what I really need is the documentation so I can add support for this TIFF information to my metadata extraction utility.  I am particularly interested in the details of tag 37680 (0x9330) if indeed this is a "metadata dictionary".

    - Phil

  • Saturday, September 26, 2009 6:31 AM Brad Hards
     
    Mark,

    Can you advise which private TIFF tags are used in Microsoft products (by number, and if possible, the name of the tag)?

    Can you confirm that 37679 (if present) is always the text version of the page content, per my original post?

    Can you advise whether 37680 is some kind of metadata dictionary? I recognise that the documentation for the tag may not yet be available.

    Can you advise whether 37681 is some kind of thumbnail? I recognise that the documentation for the tag may not yet be available.

    Brad
  • Saturday, September 26, 2009 2:47 PM Mark Miller_DSC (MSFT, Moderator)
     
    Hi Brad,

    I'll research this and respond asap.

    Regards,
    Mark Miller
    Escalation Engineer
    US-CSS DSC PROTOCOL TEAM
  • Friday, October 02, 2009 6:02 PM Mark Miller_DSC (MSFT, Moderator)
     
    Hi Brad,

    The Product Group is addressing your request for details of these TIFF tags and hopefully I will have that information for you soon.
     

    Regards,

    Mark Miller

    Escalation Engineer

    US-CSS DSC PROTOCOL TEAM

  • Friday, October 16, 2009 3:39 AM Dominic Salemno MSFT (MSFT, Moderator)
     
    Brad,

    We are still investigating this inquiry.

    Dominic Salemno
    Senior Support Escalation Engineer
  • Thursday, November 05, 2009 10:23 PM Mark Miller_DSC (MSFT, Moderator)
     
    Hi Brad,

    The product group is still working on your request, and I will respond as soon as they do.

    Regards,
    Mark Miller
    Escalation Engineer
    US-CSS DSC PROTOCOL TEAM
  • Monday, November 16, 2009 7:11 PM Mark Miller_DSC (MSFT, Moderator)
     
    Hi Brad,

    I have information for you regarding your forum post on Saturday, September 26, 2009.

    Can you please send me an Email Address that will allow me to send you files?

    Regards,

    Mark Miller

    Escalation Engineer

    US-CSS DSC PROTOCOL TEAM