Multiple styles in paragraph

  • Question

  • Hi,

    We're trying to create a style-based parser for the content of a Word document. We're currently stuck on a problem with multiple styles in a paragraph. The whole concept is based on iterating at the paragraph level, because going down to individual words would be too slow. The problem is that nothing indicates that multiple styles have been used. For example, if I check a paragraph in which part of the content in the middle has a different style from its beginning and end, I only get the style used at the end, with no indication that I need to go deeper (as there is with bold, etc.). Does anyone know a solution for that? Or is there a better way to iterate through the document? Could I use some kind of array of styles that I can iterate over, as I do with paragraphs? Or should I skip the style-based approach and use some other container where I can put a range?

    Monday, February 24, 2014 9:57 AM

All replies

  • Hi Ddawid

    Basically, you'd have to walk the document by character in order to be sure you pick up all style formatting. And yes, that would be slow.

    <<Or maybe i should skip style-based approach and there are some other containers where i can put range?>>

    It's very difficult to make any suggestions without knowing exactly what you want to do. (The "Why" behind your question.)

    I'm thinking the most efficient approach would be to work directly with the underlying Word Open XML...
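
    To illustrate why the XML route helps with the mixed-style problem: in Word Open XML each paragraph (w:p) is split into runs (w:r), and a run with its own character style carries a w:rStyle element. The sketch below is only an illustration, not a tested add-in. It parses a small hand-written fragment (SAMPLE is made up for the example, and the helper names are hypothetical) and reports which paragraphs contain runs with differing character styles. It only looks at runs that are direct children of a paragraph.

```python
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by Word Open XML (2007 and later).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Hand-written sample fragment, standing in for XML read from a real document.
SAMPLE = """<w:body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:p>
    <w:pPr><w:pStyle w:val="BodyText"/></w:pPr>
    <w:r><w:t xml:space="preserve">Start of paragraph, </w:t></w:r>
    <w:r><w:rPr><w:rStyle w:val="Emphasis"/></w:rPr><w:t>styled middle</w:t></w:r>
    <w:r><w:t xml:space="preserve">, end of paragraph.</w:t></w:r>
  </w:p>
  <w:p>
    <w:r><w:t>Plain paragraph.</w:t></w:r>
  </w:p>
</w:body>"""


def run_styles_per_paragraph(xml_text):
    """Return one (paragraph_style, [(run_style, run_text), ...]) tuple per w:p."""
    root = ET.fromstring(xml_text)
    result = []
    for para in root.iter(f"{{{W_NS}}}p"):
        p_style = para.find("w:pPr/w:pStyle", NS)
        para_style = p_style.get(f"{{{W_NS}}}val") if p_style is not None else None
        runs = []
        for run in para.findall("w:r", NS):  # direct child runs only
            r_style = run.find("w:rPr/w:rStyle", NS)
            name = r_style.get(f"{{{W_NS}}}val") if r_style is not None else None
            text = "".join(t.text or "" for t in run.findall("w:t", NS))
            runs.append((name, text))
        result.append((para_style, runs))
    return result


def has_mixed_styles(runs):
    """True when the runs of one paragraph do not all share one character style."""
    return len({name for name, _ in runs}) > 1


if __name__ == "__main__":
    for para_style, runs in run_styles_per_paragraph(SAMPLE):
        print(para_style, has_mixed_styles(runs), runs)
```

    Only paragraphs flagged by has_mixed_styles would need a closer look, so the expensive character-level walk can be skipped for the rest. The older Word 2003 XML format uses a different namespace, so W_NS would have to change there.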


    Cindy Meister, VSTO/Word MVP, my blog

    Monday, February 24, 2014 1:38 PM
  • For the document I'm working with, iterating through the paragraphs takes 6 seconds; I don't even want to know how long it would take to do this character by character.

    <<Very difficult to make any suggestions without knowing exactly what you want to do? (The "Why" behind your question.)>>

    I want to create some kind of collection in which I can put Range objects that I can refer to later on.

    For example, the user clicks "start", types text, and clicks "stop". Doing that a couple of times results in a collection of ranges, which I can restore after I reopen the Word document. Is there any container that I can use?

    <<I'm thinking most efficient will be to work directly with the underlying Word Open XML...>>

    Agreed, that would be great, but unfortunately I have to support Office 2003 and XP.

    So far, thanks for your tips.

    P.S. Is there a way to make "live" changes to the Open XML that will affect the displayed document? Or do I have to use the Open XML library, get a property from the Range, etc.?

    • Edited by Ddawid Monday, February 24, 2014 1:54 PM
    Monday, February 24, 2014 1:45 PM
  • << i have to support Office 2003 and XP>>

    <SIGH>

    <<ps. Is there a way to make "live" changes on OpenXML that will affect displayed document?>>

    The Range.WordOpenXML property will retrieve it; the Range.InsertXML method can write it back, as long as it's in valid Word Open XML flat file format. But that won't carry over one-to-one to Word 2003. Word 2003 does support InsertXML and uses the Range.XML property to retrieve its version of WordProcessingML (and this will also work in 2007, 2010, etc.).

    So theoretically, you could go this route, although the 2003 WordProcessingML could get tricky, the further away you get from that version. And it will NOT support new technologies (content controls, for example) that could be present in newer versions. But there's no law that says you couldn't check the application version and branch your code.
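
    A minimal sketch of the "check the version and branch" idea, assuming the version string comes from Application.Version ("11.0" for Word 2003, "12.0" for Word 2007). The function name is hypothetical, and returning None for anything older than 2003 is my assumption that those versions offer no XML access on a Range.

```python
def xml_member_for(version):
    """Pick which Range member to read XML from, given Application.Version."""
    major = int(version.split(".")[0])
    if major >= 12:
        # Word 2007 and later: Word Open XML flat file format.
        return "WordOpenXML"
    if major == 11:
        # Word 2003: its own WordProcessingML flavour.
        return "XML"
    # Assumption: Word XP (10.x) and earlier expose neither member.
    return None


if __name__ == "__main__":
    for version in ("10.0", "11.0", "12.0", "14.0"):
        print(version, xml_member_for(version))
```

    With a real Range object in hand, the returned name could be read with getattr(rng, name); whichever format comes back, the text written with InsertXML has to be in a format that version of Word accepts.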

    <<For example... user click "start" type text, and click "stop", do that couple times and that will result in collection of that ranges, which i can restore after i open the Word document. Is there any container that i can use?>>

    If you have to support the old version then the only thing I can suggest would be BOOKMARKS. Set a bookmark at the Start, locate it again at Stop and create it around the entire range between Start and Stop. If you include an underscore at the start of the bookmark name it won't show in the document or the Bookmarks dialog box unless the user requests it.
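
    Roughly, the bookmark approach could look like the sketch below. The calls document.Range(start, end) and document.Bookmarks.Add(name, range) mirror the Word object model; everything else (the "_rng" prefix, the helper names, and the FakeDocument stand-in that lets the example run without Word) is made up for illustration.

```python
def hidden_bookmark_name(index, prefix="_rng"):
    """Build a bookmark name; the leading underscore keeps it hidden by default."""
    return f"{prefix}{index:04d}"


def mark_span(document, start, end, index):
    """Wrap the text between the Start and Stop positions in a hidden bookmark."""
    name = hidden_bookmark_name(index)
    document.Bookmarks.Add(name, document.Range(start, end))
    return name


class FakeBookmarks:
    """Stand-in for Word's Bookmarks collection (illustration only)."""

    def __init__(self):
        self.items = {}

    def Add(self, name, rng):
        self.items[name] = rng


class FakeDocument:
    """Stand-in for a Word Document (illustration only)."""

    def __init__(self):
        self.Bookmarks = FakeBookmarks()

    def Range(self, start, end):
        return (start, end)


if __name__ == "__main__":
    doc = FakeDocument()
    # Each pair is the character position at "start" and at "stop".
    spans = [(0, 12), (40, 57)]
    names = [mark_span(doc, s, e, i) for i, (s, e) in enumerate(spans, start=1)]
    print(names, doc.Bookmarks.items)
```

    Because bookmarks are saved with the document, the spans can be found again after reopening it. When reading them back from the real object model, hidden bookmarks are only included in the collection if Bookmarks.ShowHidden is switched on.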


    Cindy Meister, VSTO/Word MVP, my blog

    • Marked as answer by Ddawid Monday, February 24, 2014 3:13 PM
    Monday, February 24, 2014 2:08 PM
  • <<I can suggest would be BOOKMARKS. >>

    Thank you! That's exactly what I was looking for. It even has wdSortByLocation, so it's perfect.

    Just one last question: is it possible to get the paragraph content with the list number as text, without converting it? I want to run a regex on the content without converting the numbering in the user's document. Or do I have to read ListString and ListFormat and build it myself?

    Monday, February 24, 2014 2:51 PM
  • << is it possible to get the paragraph content with ListNumber as text, without converting it?>>

    Not as far as I know, no. You need to read the List properties, as you suggest, in order to get that information.
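
    For what it's worth, building the text yourself is short. The sketch below assumes a paragraph object whose Range exposes Text and ListFormat.ListString, as in the Word object model; the helper names, the single-space separator and the fake_paragraph stand-in (used so the example runs without Word) are assumptions for illustration.

```python
import re
from types import SimpleNamespace


def text_with_list_number(paragraph):
    """Prefix the paragraph text with its list number without touching the document."""
    rng = paragraph.Range
    number = rng.ListFormat.ListString  # empty for paragraphs outside a list
    body = rng.Text.rstrip("\r")        # drop the trailing paragraph mark
    # Assumption: a single space between number and text is good enough.
    return f"{number} {body}" if number else body


def fake_paragraph(text, list_string=""):
    """Stand-in for a Word Paragraph (illustration only)."""
    list_format = SimpleNamespace(ListString=list_string)
    return SimpleNamespace(Range=SimpleNamespace(Text=text, ListFormat=list_format))


if __name__ == "__main__":
    line = text_with_list_number(fake_paragraph("Scope of work\r", "2.1"))
    match = re.match(r"^(\d+(?:\.\d+)*)\s+(.*)$", line)
    print(line, match.groups() if match else None)
```

    The regex then runs on the assembled string, and the numbering in the user's document stays automatic.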


    Cindy Meister, VSTO/Word MVP, my blog

    • Marked as answer by George Hua Monday, March 3, 2014 10:38 AM
    Monday, February 24, 2014 3:00 PM