Word binary format, non-complex document structure
-
13 February 2012 20:40
Hi all,
I have some doubts regarding the structure of a non-complex file (fComplex in the FIB Base is equal to '0').
I have been able to locate the text, which starts at the position indicated by fib.fcMin (offset 0x18). I think the specification says that this is a reserved field. Anyway, in the following sector after the text, there is a list of FKPs (for characters, paragraphs, etc).
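For readers following along, here is a minimal sketch of reading the header fields mentioned above. It assumes the WordDocument stream has already been extracted from the compound file into a bytes object; the function name `read_fib_base` and the dictionary keys are hypothetical, and the four bytes at 0x18 are returned only as the legacy value the post calls fcMin, since the current specification marks that position as reserved.

```python
import struct


def read_fib_base(word_document: bytes) -> dict:
    """Decode a few FibBase fields from the start of the WordDocument stream.

    Hypothetical helper for illustration; not taken from the specification.
    """
    if len(word_document) < 32:
        raise ValueError("FibBase is 32 bytes long")
    # wIdent (0xA5EC for Word binary files) and nFib open the structure.
    w_ident, n_fib = struct.unpack_from("<HH", word_document, 0x00)
    # 16-bit flag field at offset 0x0A: bit 2 is fComplex, bit 9 is fWhichTblStm.
    (flags,) = struct.unpack_from("<H", word_document, 0x0A)
    # Offset 0x18: the value the post refers to as fcMin (reserved in current [MS-DOC]).
    (legacy_fc_min,) = struct.unpack_from("<I", word_document, 0x18)
    return {
        "wIdent": w_ident,
        "nFib": n_fib,
        "fComplex": bool(flags & 0x0004),
        "table_stream": "1Table" if flags & 0x0200 else "0Table",
        "legacy_fcMin": legacy_fc_min,
    }
```

The `table_stream` entry matters for the next step, because the fc/lcb pairs in the FIB are offsets into whichever table stream fWhichTblStm selects, not into the WordDocument stream.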
My question is: where can I find the list of FKPs (for characters, paragraphs, tables, etc.) that apply to the text? For example, how can I know that sector #n holds a CHPX FKP and the following sector #n+1 holds a PAPX FKP?
It seems that for characters and paragraphs this information can be obtained from the table stream, at the offsets given by fcPlcfBtePapx and fcPlcfBteChpx, but I'm not sure about it. What about other objects in the document (e.g. tables)?
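A rough sketch of how those two structures could be decoded, assuming the layout described in [MS-DOC]: an array of n+1 four-byte FCs followed by n four-byte PnFkp entries, where the low 22 bits of each entry give a page number and the FKP sits at page number × 512 in the WordDocument stream. The function name `read_plc_bte` is hypothetical, and `fc` and `lcb` are assumed to have been read from the FIB already (fcPlcfBteChpx/lcbPlcfBteChpx or fcPlcfBtePapx/lcbPlcfBtePapx).

```python
import struct

FKP_PAGE_SIZE = 512  # each FKP occupies one 512-byte page of the WordDocument stream


def read_plc_bte(table_stream: bytes, fc: int, lcb: int) -> list:
    """Return (fc_start, fc_end, fkp_offset) for each entry of a PlcBteChpx or PlcBtePapx.

    Hypothetical helper for illustration; fc and lcb come from the FIB.
    """
    if lcb < 4 or (lcb - 4) % 8 != 0:
        raise ValueError("lcb must equal 8 * n + 4")
    if fc < 0 or fc + lcb > len(table_stream):
        raise ValueError("PLC lies outside the table stream")
    n = (lcb - 4) // 8
    # n + 1 file-character positions delimit the n text ranges.
    a_fc = struct.unpack_from("<%dI" % (n + 1), table_stream, fc)
    # n PnFkp entries follow; only the low 22 bits hold the page number.
    a_pn = struct.unpack_from("<%dI" % n, table_stream, fc + 4 * (n + 1))
    return [
        (a_fc[i], a_fc[i + 1], (a_pn[i] & 0x3FFFFF) * FKP_PAGE_SIZE)
        for i in range(n)
    ]
```

Under this reading, the kind of FKP stored in a given page is not recorded in the page itself: a page reached through the PlcBteChpx is treated as a CHPX FKP, and one reached through the PlcBtePapx as a PAPX FKP.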
Thanks in advance.
All replies
-
13 February 2012 21:54 Moderator
Hi,
Thank you for your question. One of our engineers will look into this and follow-up with you soon.
Regards,
Edgar
-
13 February 2012 22:55
Thanks.
-
17 February 2012 17:19 Moderator
Hi Nevermind82,
If I understand correctly, and in case you haven't seen it, the [MS-DOC] Binary File Format specification appears to have answers to your questions.
For example, Section 2.4, Document Content, specifies the algorithms used to analyze document content, such as Section 2.4.1, Retrieving Text, and Section 2.4.3, Overview of Tables.
Please let me know if that does not guide you to the answers to your questions.
Regards,
Mark Miller
Escalation Engineer
US-CSS DSC PROTOCOL TEAM
- Edited by Mark Miller_DSC (Microsoft Employee, Moderator), 17 February 2012 17:20
- Marked as answer by Mark Miller_DSC (Microsoft Employee, Moderator), 16 May 2012 19:29