[MS-DOC] Problems determining the level number of a paragraph

  • Question

  • I have a Word document that consists of a number of paragraphs that are all part of a list. There is one paragraph at level 1 and 9 more paragraphs at level 2. Following the algorithm described in 2.4.6.4 Determining Level Number of a Paragraph, I am having difficulty determining the correct number for two of the paragraphs in the document: paragraphs 6 and 7.

    First, the algorithm says that the inline paragraph attributes should be determined. I have done this, but for several of the paragraphs in this document the inline attributes do not set sprmPIlvl or sprmPIlfo even though the paragraphs are part of a list. Instead, they appear to pull that information from the style that is applied to the paragraph.

    Second, the first paragraph has an iLfo of 0x0D and an iLvl of 1. The paragraphs that follow have an iLfo of 2 and an iLvl of 2, except for paragraphs 6 and 7, which have an iLfo of 0xD and an iLvl of 2. In Word, paragraphs 6 and 7 continue the numbering from paragraphs 2, 3, 4 and 5. For paragraphs 6 and 7, lfolvl.fStartAt is 1, lfolvl.fFormatting is 0 and lfolvl.iStartAt is 1. I am unclear on why the numbering isn't reset at paragraphs 6 and 7, since their iLfo is different from that of the previous paragraphs. I have a sample document that exhibits the behavior.



    Tuesday, December 16, 2014 7:57 PM

Answers

  • Hi Steven,

    I've found that the problem you're seeing is likely a known issue in Word 2007 and probably hasn't been fixed. It is an apparent logic bug that causes an LFO set (lfolvl array) to be ignored when numbering the list.

    I have experimented with this in both OOXML (.docx with compatibility preserved) and binary (.doc) formats. They both show the same behaviour with your document. Ultimately, I will have to file a new problem report to get this addressed in the product, but the specification is correct. Word is just getting confused by this specific sample document's formatting.

    To work around this, I've found that you can force the restart on paragraphs 6 and 7 through the UI. When you do this, although the properties of the paragraph look similar, the LFO index in sprmPIlfo will change, pointing to a newly created LFOLVL array. This will use the same formatting in other respects, but Word will also pick up the restart directive when numbering.

    This is easier to see when working with the Office Open XML format. Save the document as a .docx. You can check the box to preserve compatibility or not; it shouldn't matter. You will then notice the w:startOverride elements in the w:lvlOverride blocks of the numbering section for numId 2. When you force the restart via the UI, you can see that new numId sections are added and numId 2 isn't used.
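
    To make the .docx inspection concrete, here is a minimal sketch that lists the w:startOverride values per numId in a numbering part. The function name and the sample XML are hypothetical and written for illustration only; they are not taken from the sample document. In a real .docx the XML would be read from word/numbering.xml inside the package.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/numbering.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def start_overrides(numbering_xml: str) -> dict:
    """Return {numId: {ilvl: start value}} for every w:startOverride found."""
    root = ET.fromstring(numbering_xml)
    result = {}
    for num in root.findall("w:num", NS):
        num_id = int(num.get(f"{{{W}}}numId"))
        for override in num.findall("w:lvlOverride", NS):
            start = override.find("w:startOverride", NS)
            if start is not None:
                ilvl = int(override.get(f"{{{W}}}ilvl"))
                result.setdefault(num_id, {})[ilvl] = int(start.get(f"{{{W}}}val"))
    return result


# Hypothetical numbering part: numId 2 restarts level 1 at 1, numId 1 has no override.
SAMPLE = """<w:numbering xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:num w:numId="1"><w:abstractNumId w:val="0"/></w:num>
  <w:num w:numId="2">
    <w:abstractNumId w:val="0"/>
    <w:lvlOverride w:ilvl="1"><w:startOverride w:val="1"/></w:lvlOverride>
  </w:num>
</w:numbering>"""

print(start_overrides(SAMPLE))
```

    Running the same function before and after forcing the restart in the UI should make the newly added numId entries easy to spot.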

    So, to summarize, I believe your binary parsing to be correct and that programmatically following the specification with respect to numbering this list would lead to the correct restarts in your own implementation. The fact that Word doesn't show it is most likely due to a logic error.

    Josh Curry (jcurry) | Escalation Engineer | Open Specifications Support Team

    Thursday, January 15, 2015 12:19 AM
    Moderator

All replies

  • Hi Steven,

    Thank you for your question. A member of the Protocol Documentation support team will respond to you soon.

    Regards,
    Vilmos Foltenyi - MSFT

    Tuesday, December 16, 2014 10:25 PM
  • Hi Steven, I am the engineer who will be working with you on this issue. Would it be possible for you to send me the file to review? Please send it to dochelp(at)microsoft(dot)com to my attention and reference this forum thread.

    Thank you.


    Josh Curry (jcurry) | Escalation Engineer | Open Specifications Support Team

    Wednesday, December 17, 2014 10:00 PM
    Moderator
  • Hi Josh,

    I have sent the requested document.  If you can please help me understand the numbering issue, that would be great.

    Essentially, the pattern looks like this:

    1
        (a)

        (b)

        (c)

        (d)

        (e)

    When I try to follow the algorithm I calculate it should be:

    1
        (a)

        (b)

        (c)

        (d)

        (a)

    Thursday, December 18, 2014 2:59 PM
  • Hi Steven, I received the file that you sent and am currently looking into it. I will let you know when I have more information about this issue.

    Josh Curry (jcurry) | Escalation Engineer | Open Specifications Support Team

    Thursday, December 18, 2014 7:13 PM
    Moderator
  • Hi Steven, can you provide some additional details about this issue for me? Where in the file (at what offset) is the data that you are looking at and how are you interpreting that data by following the algorithm in section 2.4.6.4?

     

    Thank you.


    Josh Curry (jcurry) | Escalation Engineer | Open Specifications Support Team

    Friday, December 26, 2014 9:12 PM
    Moderator
  • Hi Josh,

    I will do a write up soon and post it here.  I am mainly using the tool OffVis to pull the information out.

    Monday, December 29, 2014 1:28 PM
  • Hi Steven, have you had a chance to put that information together? OffVis is a great tool.

    Josh Curry (jcurry) | Escalation Engineer | Open Specifications Support Team

    Tuesday, January 6, 2015 8:18 PM
    Moderator
  • Hi Josh,

    I have just started working on it.  I hope to have the information in the next few days.  OffVis doesn't show style information so that is slowing me down a little bit.

    Tuesday, January 6, 2015 9:35 PM
  • OK, I have a write-up, but I'm afraid it might be confusing. I'll answer any follow-up questions or requests for clarification as needed.

    File: Justlist2.doc

    All offsets are from the OffVis application unless otherwise noted.

    Style information isn’t revealed by OffVis.  Every attempt is made to provide style information directly from the binary in the file.

    WordDocument stream begins at offset 0x200

    Style istd 0x0010

    SPRMs

    sprmPIlvl 0x0001

    sprmPIlfo 0x0001

    0xa413 0x003c

    0xa414 0x00f0

    0x242a 0x1

    sprmPOutLvl 0x0001

    Style istd 0x0011

    SPRMs

    sprmPIlvl 0x0002

    sprmPIlfo 0x0001

    0xA414 0x00F0

    0x242A 0x1
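
    For anyone reading the raw opcodes above (0xA413, 0xA414, 0x242A), a minimal sketch of splitting a 16-bit sprm opcode into its bit fields may help. It assumes the Sprm layout as I understand it from [MS-DOC]: ispmd in the low 9 bits, then fSpec (1 bit), sgc (3 bits) and spra (3 bits). The class, function and table names are hypothetical, and the operand-size table should be checked against the specification before relying on it.

```python
from typing import NamedTuple


class SprmFields(NamedTuple):
    ispmd: int   # property index within its group (low 9 bits)
    f_spec: int  # 1 if the property needs special handling
    sgc: int     # property group; 1 is assumed to mean "paragraph"
    spra: int    # operand size code


# Assumed operand sizes in bytes per spra value; None marks a variable-length operand.
OPERAND_SIZE = {0: 1, 1: 1, 2: 2, 3: 4, 4: 2, 5: 2, 6: None, 7: 3}


def decode_sprm(opcode: int) -> SprmFields:
    """Split a 16-bit sprm opcode into its four bit fields."""
    return SprmFields(
        ispmd=opcode & 0x1FF,
        f_spec=(opcode >> 9) & 0x1,
        sgc=(opcode >> 10) & 0x7,
        spra=(opcode >> 13) & 0x7,
    )


for op in (0xA413, 0xA414, 0x242A):
    fields = decode_sprm(op)
    print(hex(op), fields, "operand bytes:", OPERAND_SIZE[fields.spra])
```

    Under these assumptions all three opcodes fall in the paragraph group, which is consistent with them appearing in a paragraph style.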

     

    First paragraph with list properties (offset 0x803):

    PAPXINFKP[1]

    Istd: 0x10

    SPRMs

    sprmPIlvl 0x01

    sprmPIlfo 0x0002

    Second paragraph with list properties:

    PAPXINFKP[2]

    Istd 0x11

    Third paragraph with list properties:

    PAPXINFKP[3]

    Istd: 0x11

    Fourth paragraph with list properties:

    PAPXINFKP[2]

    Istd: 0x11

    Fifth paragraph with list properties:

    PAPXINFKP[2]

    Istd: 0x11

     

    Paragraphs that have an istd of 0x11 appear to inherit their iLvl and iLfo from the style. The Word documentation doesn’t make this clear: it states that the first step in determining the level number is to find the inline properties, which is not the same as finding style attributes. If you follow just the direct formatting of the paragraph, it would appear that these paragraphs are not part of a list, but Word certainly shows them as part of a list. Do I have this correct, or am I missing something important?

    Sixth paragraph in list.  This is where I get confused:

    PAPXINFKP[4]

    ISTD: 0x11

    sprmPIlvl 0x0002

    sprmPIlfo 0x0002

     

    iLfoCur is 2

    iLvlCur is 2

    lfolvl and lvl are the structures LFO[1].rgLfoData[1].rgLfoLvl[2] in PLFLFOInTableStream and LVL[1] in the TrailingLVLs of the PLFLSTInTableStream, respectively.

    nfcCur is lvl.lvlf.nfc which is 0

    lfolvl exists and lfolvl.fStartAt is nonzero.  iStartAt is lfolvl.iStartAt which is 1

    lvl.lvlf.fNoRestart is 0, so iLvlRestartLim is iLvlCur which is 2

    numCur is iStartAt = 1

    The only paragraph that has an iLfo property equal to iLfoCur is the third paragraph in the document, the first paragraph with list properties with an istd of 0x10.  The iLvl property of that paragraph is 1, which means it’s less than iLvlRestartLim, which means that numCur is iStartAt which is 1.  I think this would be (a) after formatting is applied, but Word is showing (e).
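
    The walk-through above can be sketched in code. This is a simplified reading of the steps as described in this post, not the normative text of section 2.4.6.4: it only considers earlier paragraphs with the same iLfo, adds one for each at the same iLvl, and resets to iStartAt when it meets one whose iLvl is below iLvlRestartLim. The Para class and level_number function are hypothetical names, and the iLfo and iLvl values are the ones quoted in the original question.

```python
from dataclasses import dataclass


@dataclass
class Para:
    ilfo: int  # effective iLfo of the paragraph
    ilvl: int  # effective iLvl of the paragraph


def level_number(paras, index, i_start_at, ilvl_restart_lim):
    """Simplified level-number calculation for paras[index]."""
    cur = paras[index]
    num_cur = i_start_at
    for p in paras[:index]:
        if p.ilfo != cur.ilfo:
            continue  # paragraph belongs to a different LFO; ignored in this sketch
        if p.ilvl == cur.ilvl:
            num_cur += 1  # an earlier paragraph at the same level
        elif p.ilvl < ilvl_restart_lim:
            num_cur = i_start_at  # a shallower level restarts the count
    return num_cur


# Values from the question: paragraph 1 is level 1 on iLfo 0x0D,
# paragraphs 2-5 are level 2 on iLfo 2, paragraphs 6-7 are level 2 on iLfo 0x0D.
paras = [Para(0x0D, 1)] + [Para(2, 2)] * 4 + [Para(0x0D, 2)] * 2

print(level_number(paras, 4, i_start_at=1, ilvl_restart_lim=2))  # paragraph 5
print(level_number(paras, 5, i_start_at=1, ilvl_restart_lim=2))  # paragraph 6
```

    With these inputs the sketch gives 4 for paragraph 5 and 1 for paragraph 6, which is the (d) followed by (a) pattern calculated by hand, rather than the (e) that Word displays.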

    Thursday, January 8, 2015 8:08 PM
  • Hi Steven, thanks for the information. I will review that and let you know when I have more information about this issue.

    Josh Curry (jcurry) | Escalation Engineer | Open Specifications Support Team

    Friday, January 9, 2015 5:53 PM
    Moderator
  • Hi Josh,

    I have one more follow-up question that wasn't addressed by your fantastic answer. In the sample file I sent you, several of the paragraphs do not have iLfo or iLvl properties set inline. The Word specification says to follow the steps for direct formatting of a paragraph, and the direct formatting algorithm does not apply style information to the paragraph. Should the style information be used to determine the iLvl and iLfo of the paragraph? Without style information, the paragraphs that lack inline iLvl and iLfo properties would appear to be outside of the list, yet Word does show them in the list.

    Thursday, January 15, 2015 1:48 PM
  • Hi Steven, I believe the answer to your question is covered in section 2.4.6.6 part 2 step 1, where it tells you to determine both the direct formatting for the paragraph and also the prl’s for the styles applied to that paragraph. These will both be considered in the subsequent steps. So the answer is YES, the style (in this case “MT3”) is applied and this style has the level numbering needed to keep paragraphs 6 and 7 in the list.
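
    As a rough illustration of considering both sources, the sketch below applies the style's properties first and then lets the paragraph's direct formatting override them. It is a simplification under stated assumptions: the function name is hypothetical, properties are modelled as plain (name, value) pairs, and base-style chains and other details of section 2.4.6.6 are ignored. The values are the ones Steven listed for style istd 0x11 and for the sixth paragraph.

```python
def effective_list_props(style_prls, direct_prls):
    """Merge style properties with direct formatting; direct formatting wins."""
    props = {}
    for name, value in style_prls:
        props[name] = value
    for name, value in direct_prls:
        props[name] = value  # later (direct) values override the style's
    return props.get("sprmPIlfo"), props.get("sprmPIlvl")


# Style istd 0x11 as listed in the write-up.
style_0x11 = [("sprmPIlvl", 0x0002), ("sprmPIlfo", 0x0001)]

# Second paragraph: no list properties set inline, so the style supplies them.
print(effective_list_props(style_0x11, []))

# Sixth paragraph: inline sprmPIlvl and sprmPIlfo override the style.
print(effective_list_props(style_0x11, [("sprmPIlvl", 0x0002), ("sprmPIlfo", 0x0002)]))
```

    A paragraph with neither a style nor direct list properties yields (None, None) here, which corresponds to a paragraph that is not in a list.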

     

    The problem I see is that section 2.4.6.3 part 1 step 1 tells us to go to section 2.4.6.1 "Direct Paragraph Formatting", which is incorrect. It should be referring to section 2.4.6.6 "Determining Formatting Properties". I have filed a request to have that corrected.

     

    Please let me know if this answers your question.



    Josh Curry (jcurry) | Escalation Engineer | Open Specifications Support Team

    Friday, January 16, 2015 9:22 PM
    Moderator
  • Hi Josh,

    That's exactly what I wanted to know!  Thanks for clarifying for me.  I will mark your first answer as the answer I was looking for.  Thanks again.


    Steve

    Friday, January 16, 2015 9:44 PM