CodePlex Agility Html DOM and MS Word Document Exported as Html

  • Question

  • Hi,

    Sorry about the length of this post, but I need to describe my issue in detail, and I have included a sample of the Html that Word generates at the bottom.

    I have an MS Word document that is being exported as Html so that we can recover InlineShapes and other images (e.g. Visio diagrams) that have been pasted into the document. I recently discovered the Html Agility Pack library on CodePlex, and I'm trying to use it to parse and modify the exported Word Html file.

    The task in C# code is:

    1. Open the exported .Html document
    2. Find the 'src' attributes that reference the image files created during SaveAs Html
    3. Add the image files to a TFS WorkItem as attachments
    4. Update the 'src' attributes in the exported document to point to the TFS server attachments instead of the exported image files on the local C: drive
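
Steps 2 and 4 above can be sketched roughly as follows. This is only an illustration in Python's standard library, not Html Agility Pack code: `find_src_values`, `rewrite_src_values`, and the `url_for_path` mapping are hypothetical names, and the mapping stands in for whatever step 3 returns after uploading each file to TFS. It works on the raw text rather than a parsed DOM, on the assumption that every `src="..."` occurrence should be visited regardless of where it sits in the markup.

```python
import re

# Matches src="..." or src='...' anywhere in the raw text.
# Group 1 is the quote character, group 2 is the attribute value.
SRC_ATTR = re.compile(r"""\bsrc\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def find_src_values(html):
    """Step 2: return every src value in document order."""
    return [m.group(2) for m in SRC_ATTR.finditer(html)]


def rewrite_src_values(html, url_for_path):
    """Step 4: replace each src that has an uploaded counterpart.

    url_for_path is a hypothetical dict mapping a local image path to the
    attachment URL obtained in step 3; values without an entry are kept.
    """
    def swap(match):
        quote, value = match.group(1), match.group(2)
        return "src={0}{1}{0}".format(quote, url_for_path.get(value, value))

    return SRC_ATTR.sub(swap, html)
```

A pattern this loose would also match attributes such as `data-src`, so a real implementation would probably want to tighten it or restrict it to the tags of interest.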

    Problem: I am able to open the Html document and run an XPath query for "//@src". This attribute query finds about half of the attributes that I need to update. However, it does not find any of the 'src' attributes on the <v:imagedata src="..."> elements.

    I also tried the XPath queries "//@v:src" and "//v:imagedata", but both fail with the error "Namespace Manager or XsltContext needed.  This query has a prefix, variable, or user-defined function".
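
One possible explanation, judging from the sample at the bottom of this post, is that the VML markup sits inside a `<!--[if gte vml 1]> ... <![endif]-->` conditional comment, which an Html parser would normally hand back as a single comment rather than as elements. The sketch below checks that with Python's standard `html.parser` purely as an illustration; `SrcAudit` and `audit` are hypothetical names, and whether Html Agility Pack behaves the same way is an assumption that would need to be confirmed.

```python
from html.parser import HTMLParser


class SrcAudit(HTMLParser):
    """Records whether each src lives on a real element or inside a comment."""

    def __init__(self):
        super().__init__()
        self.element_srcs = []  # (tag, src) pairs that an element query can reach
        self.comments = []      # raw comment bodies, opaque to an element query

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags via the default handle_startendtag.
        for name, value in attrs:
            if name == "src":
                self.element_srcs.append((tag, value))

    def handle_comment(self, data):
        self.comments.append(data)


def audit(html):
    """Parse html and return the populated SrcAudit instance."""
    parser = SrcAudit()
    parser.feed(html)
    parser.close()
    return parser
```

If the `<v:imagedata>` path shows up only in `comments` and never in `element_srcs`, then no XPath over elements or attributes will reach it, with or without a namespace manager, and the comment text itself would have to be edited.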

    Questions: What kind of XPath query would work with the Html that Word exports?  Is there an export option that changes the style of Html that Word generates so that I can work with it?

    Sample Word output, partially updated by my code: The snippet below shows an image that was exported during SaveAs Html.  You can see that the <img src="..."> tag has been updated to point to the TFS server link, but anything associated with <v:imagedata src="..."> is being ignored:

    <p><span style='<!--[if gte vml 1]><v:shapetype
     id="_x0000_t75" coordsize="21600,21600" o:spt="75" o:preferrelative="t"
     path="m@4@5l@4@11@9@11@9@5xe" filled="f" stroked="f">
     <v:stroke joinstyle="miter"/>
     <v:formulas>
      <v:f eqn="if lineDrawn pixelLineWidth 0"/>
      <v:f eqn="sum @0 1 0"/>
      <v:f eqn="sum 0 0 @1"/>
      <v:f eqn="prod @2 1 2"/>
      <v:f eqn="prod @3 21600 pixelWidth"/>
      <v:f eqn="prod @3 21600 pixelHeight"/>
      <v:f eqn="sum @0 0 1"/>
      <v:f eqn="prod @6 1 2"/>
      <v:f eqn="prod @7 21600 pixelWidth"/>
      <v:f eqn="sum @8 21600 0"/>
      <v:f eqn="prod @7 21600 pixelHeight"/>
      <v:f eqn="sum @10 21600 0"/>
     </v:formulas>
     <v:path o:extrusionok="f" gradientshapeok="t" o:connecttype="rect"/>
     <o:lock v:ext="edit" aspectratio="t"/>
    </v:shapetype><v:shape id="Picture_x0020_1" o:spid="_x0000_i1025" type="#_x0000_t75"
     style='width:468pt;height:292.5pt;
     <v:imagedata src="Doc1_files/image002.png" o:title="Pic"/>
    </v:shape><![endif]--><![if !vml]><img width="624" height="390" src="http://wpca0020mtfs01:8080/tfs/defaultcollection/WorkItemTracking/v1.0/AttachFileHandler.ashx?FileID=24960&FileName=image004.gif" v:shapes="Picture_x0020_1"><![endif]></span></p>

    Anybody got any ideas?  Any help would be greatly appreciated!

    --Richard


    Thursday, May 31, 2012 5:14 PM

All replies