Microsoft Extensions to TIFF?
- This is a wishlist for a new spec, rather than a query on the existing specs.
Microsoft products sometimes put extra tags into TIFF documents, and of course older Microsoft Office versions produced MDI (which I think of as a TIFF variant) using the Microsoft Office Document Imaging Writer virtual printer.
Some initial investigation indicates three new kinds of image compression:
- MODI_BLC 34718
- MODI_PTC 34720
- MODI_VECTOR 34719
MODI_VECTOR appears to be essentially Enhanced Metafile (EMF).
MODI_BLC and MODI_PTC are not yet understood.
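To check which of these compression values, and which of the private tags described below, a given file actually uses, here is a minimal Python sketch that walks the IFD chain of a classic (32-bit offset) TIFF file using only the standard `struct` module. It only accepts the standard "II"/"MM" TIFF headers; whether MDI files share that header is not established in this thread. The helper names `read_ifds` and `summarize` are hypothetical.

```python
import struct

# Compression values reported in this thread for MODI/MDI pages.
MODI_COMPRESSION = {34718: "MODI_BLC", 34719: "MODI_VECTOR", 34720: "MODI_PTC"}
# Private tags observed in Microsoft-produced files.
MS_PRIVATE_TAGS = {37679, 37680, 37681}
COMPRESSION_TAG = 259  # standard TIFF Compression tag

# Byte size of one value for each TIFF 6.0 field type (BYTE, ASCII, SHORT, ...).
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1, 7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}


def read_ifds(data: bytes):
    """Return (byte_order, pages); each page maps tag -> raw field bytes."""
    if data[:2] == b"II":
        bo = "<"  # little-endian ("Intel")
    elif data[:2] == b"MM":
        bo = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF file")
    magic, offset = struct.unpack_from(bo + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic (32-bit offset) TIFF file")
    pages, seen = [], set()
    while offset and offset not in seen:  # follow the IFD chain, guarding against loops
        seen.add(offset)
        (count,) = struct.unpack_from(bo + "H", data, offset)
        fields = {}
        for i in range(count):
            entry = offset + 2 + 12 * i
            tag, ftype, n = struct.unpack_from(bo + "HHI", data, entry)
            # Unknown field types are treated as 1 byte per value (an assumption).
            size = TYPE_SIZES.get(ftype, 1) * n
            if size <= 4:
                raw = data[entry + 8:entry + 8 + size]  # value stored inline, left-justified
            else:
                (ptr,) = struct.unpack_from(bo + "I", data, entry + 8)
                raw = data[ptr:ptr + size]
            fields[tag] = raw
        pages.append(fields)
        (offset,) = struct.unpack_from(bo + "I", data, offset + 2 + 12 * count)
    return bo, pages


def summarize(data: bytes):
    """Per page: the compression (named if it is a MODI code) and which private tags appear."""
    bo, pages = read_ifds(data)
    summary = []
    for fields in pages:
        comp = None
        if COMPRESSION_TAG in fields:
            # Compression is normally a SHORT; LONG-typed values are not handled here.
            (comp,) = struct.unpack(bo + "H", fields[COMPRESSION_TAG][:2])
        summary.append({
            "compression": MODI_COMPRESSION.get(comp, comp),
            "private_tags": sorted(MS_PRIVATE_TAGS.intersection(fields)),
        })
    return summary
```

Calling `summarize(open("page.tif", "rb").read())` (file name hypothetical) returns one dictionary per page, which makes it easy to confirm observations such as "37680 only appears on the first page".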
In addition to the compression methods, there are unknown tags (fields). These unknown properties appear to occur in both TIFF and MDI files.
37679 - appears on every page and looks like a text version of the document contents. The contents are 0x01 0x00, followed by a 4-byte (LONG) length that is 6 bytes less than the actual length of this field (i.e. it is the remaining length), followed by the UTF-8 text. Each phrase is delimited by a space followed by a newline (0x20 0x0a, i.e. ' \n'). The field ends with 0x0d 0x00.
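The 37679 layout described above can be sketched in Python as follows. This assumes the 4-byte length is little-endian (as in an "II" TIFF; a big-endian file may differ), that it counts everything after the 6-byte header including the trailing 0x0d 0x00, and that the caller already has the raw field bytes; `parse_text_tag` is a hypothetical name.

```python
import struct


def parse_text_tag(raw: bytes) -> list[str]:
    """Split the text stored in private tag 37679 into phrases.

    Assumed layout: 0x01 0x00, a little-endian 4-byte remaining length,
    UTF-8 text with ' \\n' between phrases, then a 0x0d 0x00 terminator.
    """
    if raw[:2] != b"\x01\x00":
        raise ValueError("unexpected header")
    (remaining,) = struct.unpack_from("<I", raw, 2)
    body = raw[6:6 + remaining]
    if len(body) != remaining:
        raise ValueError("field is shorter than its declared length")
    if body.endswith(b"\x0d\x00"):
        body = body[:-2]  # drop the terminator
    text = body.decode("utf-8")
    # Phrases are separated by a space followed by a newline; skip empty pieces.
    return [phrase for phrase in text.split(" \n") if phrase]
```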
37680 - only appears to occur on the first page, always appears to be 4096 bytes long, and always starts with 0xd0 0xcf 0x11 0xe0 0xa1 0xb1 0x1a 0xe1, then a run of zeros, and then varies. Perhaps some kind of metadata dictionary? It is located at the end of the file, and there are 16-bit wide characters that look like "Root Entry", "CONTENTS" (sometimes more than once, even if there is only one page), "prop2" (sometimes more than once), "prop3" (sometimes more than once), "DICT", "Summary Information", "Owner" and some names. There might be some random data / fill in there too. There also appears to be a consistent string "AuvsxjatP0udlw1Aaq5eubr5h" (this might not be ASCII though; there is always a 0x05 0x00 at the front of it).
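For what it's worth, the 8-byte prefix 0xd0 0xcf 0x11 0xe0 0xa1 0xb1 0x1a 0xe1 is the signature of the Compound File Binary format (OLE structured storage). In that format, directory entry names such as "Root Entry" are stored as UTF-16LE, and property-set stream names conventionally begin with the character 0x05, which would fit the observations above, though nothing in this thread confirms that 37680 is an embedded compound file. The sketch below only checks the signature and does a heuristic scan for UTF-16LE names, similar to `strings -el`; `is_compound_file` and `utf16_strings` are hypothetical names, and this is not a real directory parse.

```python
import re

# Compound File Binary (OLE2) header signature.
CFB_SIGNATURE = bytes.fromhex("d0cf11e0a1b11ae1")

# Runs of at least 4 printable ASCII characters encoded as UTF-16LE (char, 0x00),
# optionally preceded by the 0x05 0x00 marker seen on property-set stream names.
_UTF16_RUN = re.compile(rb"(?:\x05\x00)?(?:[\x20-\x7e]\x00){4,}")


def is_compound_file(raw: bytes) -> bool:
    """True if the field bytes start with the CFB signature."""
    return raw[:8] == CFB_SIGNATURE


def utf16_strings(raw: bytes) -> list[str]:
    """Heuristically list UTF-16LE strings found anywhere in the field."""
    return [m.group().decode("utf-16-le") for m in _UTF16_RUN.finditer(raw)]
```

Reading the actual stream contents would require parsing the compound file's header, FAT and directory sectors as described in Microsoft's [MS-CFB] specification; the scan only surfaces candidate names.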
37681 - appears on every page, always starts with 0x02 0x00 (+ 0x00 0x00?), then varies. Possibly the thumbnail image?
Would it be possible to get some clarification / confirmation on the compression methods and unknown tags (including any additional tags not yet found)? A spec would be ideal, but given that MDI isn't very common and the preference is to move to XPS, perhaps just some notes here?
- Changed Type by Steve Smegner, Wednesday, November 19, 2008 4:43 AM
Answers
Hi Brad,
I apologize for the delay.
I can now confirm the Product Group will document the TIFF tags, and the expected timeframe for the documentation is the end of August.
Regards,
Mark Miller
Escalation Engineer
US-CSS DSC PROTOCOL TEAM
- Marked As Answer by Mark Miller_DSCMSFT, Moderator, Friday, February 27, 2009 2:50 PM
- Proposed As Answer by Mark Miller_DSCMSFT, Moderator, Friday, February 27, 2009 2:50 PM
All Replies
- Hi Brad,
Thanks for your post.
We'll let you know as soon as we have news or questions.
Regards,
SEBASTIAN CANEVARI - MSFT SEE Protocol Documentation Team
- Sebastian,
Is there any news, or even a timeframe for this?
Brad
Brad,
Could you describe your use case scenario?
Steve Smegner
Application Development Consulting Group
- Hi Steve,
The idea is to provide better support for TIFF and MDI files produced by Microsoft Office Document Imaging on other platforms. I'm particularly interested in Okular.
Right now, we have TIFF support, and the various pages display fine. That is done using libtiff (http://www.remotesensing.org/libtiff/). I'd like to provide the users with whatever support we can (just as for the .snp case).
There are essentially two aspects to this:
1. Support for the Microsoft-specific TIFF tags (37679, 37680 and 37681 are the ones I know of). I do have initial support for the text extraction part (just implemented; see http://websvn.kde.org/?view=rev&revision=886464 for the actual code changes), but not for the other two tags.
2. Display of MDI files in the same way we currently display TIFF files. That requires knowledge about the three MDI-specific codecs (per my original request).
More broadly, given that TIFF is an industry-standard format, I'd like to see Microsoft document its extensions to that format.
- Greetings Brad,
I wanted to let you know that we have not forgotten this request. Due to the holidays and the deprecated nature of the MDI formats, we are still tracking down the nature of these compression tags. My sources are back from vacation and the holidays and I hope to have an update for you very soon. Thanks for your patience.
Steve Smegner
Application Development Consulting Group
- Steve,
Thanks for the continuing work on this, and for the status update.
Much appreciated.
Brad
- Hi Brad,
I am on the Open Specification Protocols Documentation team and have taken ownership of this issue. I picked it up where Steve left off and have followed it through to a conclusion with our Product Group.
We do not have standalone documentation of the MDI file format and don’t currently have plans to create any since the format is considered obsolete and we no longer recommend using it. You may want to review this page: http://office.microsoft.com/en-us/help/HP062193601033.aspx. Saving files in the TIFF format would be the more portable option.
Regards,
Mark Miller
Escalation Engineer
US-CSS DSC PROTOCOL TEAM
- Unmarked As Answer by Brad Hards, Friday, February 06, 2009 12:48 AM
- Proposed As Answer by Mark Miller_DSCMSFT, Moderator, Thursday, February 05, 2009 3:43 PM
- Marked As Answer by Chris Mullaney MSFT, Owner, Thursday, February 05, 2009 5:41 PM
- Hi Mark,
I appreciate that MDI is obsolete; however, (as you point out) TIFF is not. My original wishlist concerned both MDI and TIFF, which might have confused things. So let's exclude MDI, and only deal with TIFF files as produced by contemporary Microsoft applications.
There are private tags (fields) in TIFF files produced by those tools, as noted in my original request:
37679, 37680, 37681.
Is documentation of those tags available under the Interoperability Principles? I can understand that they may not be (given that they are explicitly private tags), I'd just prefer not to have to figure them out using a binary editor...
Brad
- It is now the end of August. Please tell me where I can find this documentation. Thanks!
Hello Phil,
I checked on the status of the documents with our Product Group and they are not yet ready. I apologize for the delay. The documentation for the TIFF tags turned out to be much more involved and complex than expected. The Product Group informs me that the documentation should be ready by the end of the year.
Having said this, if you can provide more specifics on what you are trying to accomplish or what you need in terms of TIFF tag details, we may be able to assist you in the interim.
Regards,
Mark Miller
Escalation Engineer
US-CSS DSC PROTOCOL TEAM
- Hi Mark,
Thanks for the offer, but what I really need is the documentation so I can add support for this TIFF information to my metadata extraction utility. I am particularly interested in the details of tag 37680 (0x9330), if it is indeed a "metadata dictionary".
- Phil
- Mark,
Can you advise which private TIFF tags are used in Microsoft products (by number, and if possible, the name of the tag)?
Can you confirm that 37679 (if present) is always the text version of the page content, per my original post?
Can you advise whether 37680 is some kind of metadata dictionary? I recognise that the documentation for the tag may not yet be available.
Can you advise whether 37681 is some kind of thumbnail? I recognise that the documentation for the tag may not yet be available.
Brad
- Hi Brad,
I'll research this and respond asap.
Regards,
Mark Miller
Escalation Engineer
US-CSS DSC PROTOCOL TEAM
- Hi Brad,
The Product Group is addressing your request for details of these TIFF tags and hopefully I will have that information for you soon.
Regards,
Mark Miller
Escalation Engineer
US-CSS DSC PROTOCOL TEAM
- Brad,
We are still investigating this inquiry.
Dominic Salemno
Senior Support Escalation Engineer
- Hi Brad,
The product group is still working on your request, and I will respond as soon as they do.
Regards,
Mark Miller
Escalation Engineer
US-CSS DSC PROTOCOL TEAM
- Hi Brad,
I have information for you regarding your forum post on Saturday, September 26, 2009.
Can you please send me an Email Address that will allow me to send you files?
Regards,
Mark Miller
Escalation Engineer
US-CSS DSC PROTOCOL TEAM


