How to 'parse' raw HTML text to Word format using OpenXML?

  • Question

  • Hello,

    I'm fairly new to OpenXML. I have a site that uses a rich-text editor (CKEditor, IIRC), which outputs its content as raw HTML. Below is a sample of simple text with an inline image:

    <p>PHD Abstract</p>
    
    <p><strong>Title</strong></p>
    
    <ul>
        <li>Item 1</li>
        <li>Item 2</li>
    </ul>
    
    <p><img alt="" border="0" hspace="0" src="data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAkGBxASEBAUDxQWDxQQFRAQFRAPEA8QFBAPFBQWFhQXFBQYHCggGBolHBQYITEhJSkrLi4uGB8zODMsNygtMCsBCgoKDg0OGhAQGCwcHB8sLCwsLCwsLCssLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCs3LCsrK//AABEIAK4BIgMBIgACEQEDEQH/xAAbAAACAwEBAQAAAAAAAAAAAAAAAQIDBAUGB//EADoQAAEDAwIDBgMHAwMFAAAAAAEAAhEDITEEEkFRYQUTInGBkQYyoRRCscHR4fBSYvEjcoIHFTOS0v/EABoBAQADAQEBAAAAAAAAAAAAAAABAgMEBgX/xAAiEQEBAAICAgMAAwEAAAAAAAAAAQIREiEDQQQxURMigXH/2gAMAwEAAhEDEQA/APdSnKgmulzpSiVGUSgnKJUAU5QSlEqMolBKUSkkgnKJUEIJyiVCUIJylKihBKUpSQgcolRJTaCSABnHVBIewAJJJgADJKjPr15rn/EdcgN09K9SoWh5HBs48ufktwbAA5AD2Cphnyt19RfLDjJv7qUo3KKS0USBRuUUSmhKUSoITQnuRuUJSTQnuRuUEJoS3JblFCaEpRuUUk0J7kKCE0LE04QAoCTThEIEhOEQgSaIRCAQhOECQiE4QJCIQgEiU0QgvOhq2hszcFpBB9km6GqTGwjqbAeZKoc6qB/pvLOkS32/RPQVq5Lu/ja3xbqbrPIPyuBEgLHPPPH01xwxy9urT01KmCXf6rm5sS0HkGjK5es+IWmk40qZG0lhJZssILtk3I8uquq9oOm1h7DzXnu2vizS0b1XQHHYLAlz4sL8Fz8rl91txmP0Xw9SNSpWrOO6CAJ6i8LtFc/RvDCdsDDosutToOeNzBI5AiR6LfxZSTiy8ktu1EIhSI58ELoYoQiFNJBGEQppIIkJQppIIwiFJJBGEQmhSgoShNCBQhNCC5EIQqpNJNCBJoQgSE0IBCEIEmmkUCQpJIFCEyhAKjWioWHuo3WiTAI4hXoUWbmqmXV3GR9X/Tg5iD58V5jWdiMrOpuqNaW0yXAEffMXvbmvRdtAhm5oxlcU9o03SzcHFrWu2A3kmG46hcWtWx1b3Ntbq7Q4j+39V1extaRacgLx+t1IdWLGcNrSQcHJH5e67fZdUA24HbwtZRvseu3MdG4TPHBHqpHs+kflcRywVy++8Nr/AKKek1Zxy/BazO/qvGX021eyo+V8+YhY6une35hH1W1ur6rSdSC2bH9VM8tn2rfHHFBQr9SwZbboqF0Y5TKbjGzVCipJKyCSTQgihOEIEhNCBITSQXoSTUJCEIRAQhCAQhCBpIQgE0k0AhCCgSE0IEkpJIIaigypSqMeJa9pBjh1Xzns74Zdp67j3xqMLt4JaQ47YhpOIX06g0E7TbdafqvKdsmmHuNKX93BcDl2flXB8m3n06/DjLj24upIpk7BEyZuS6cnzVvZ+pIHPleAPOxtbKNRq6VRjSDIN7n97rdpOw2PA76r3MgEU27Q6DiXESJ5LLG1exs0Wt3QDjE5n24LdqXBm0j5Tx5FUjQUdO0CnfJkkuJ85WXtTUTStwIPQ+ULb6+1G12tgG+P8LQzWnbOZix5+EH8V5ypqvyHqZChR7RBIAP3r3v4fF7yq2pj1mnrEi+bQeYIkfirSuJoNcIYCZsfoV2qTpFl0eHL0x8k9hCcIXQxJJShEIIoUoSQJCaECQmhBNNCSJNCEIBCEIBCEIg0IQgEIQgEIQgEIQgEIQUCfO0kXIBPnZeCGpZ3r7lrnGdp3A7QDEcCvoDF4b4l0JDnOb9xxIEYPmuP5OP9pXT4L1Y8vo9K6lXaysdgFVhLHEWpueDEf7SvYdq0g7VTtkNcD4jl2McsK7t/4Yp
6lja1IeJ4bUsYkkTyytOv0Y3jg4w7Ikgc1hwuN7a45bji9tdry8tGWfeERJ4KrT1HbG77zecgFcvU0awLy6m4XJmDxJifOCs41DhY4GAPpKi5Xfajo6/VFjA4EZfI5gC0esLP2P8ACWrfTbXa7/yTUa1xu5p4zNv2XM1znPgcBgL6h8Galp0lJhO11JsFpyGzIj0Kv4pjlbyRluTp5PQamADxaSCDy4r13ZNYuHQ8Vx+2uyh9sHc+JtYCo4Whr5h2OefddnQUzTAa4RmMDC08f3tGU6dBCZSXa5whCEQSE0kAkmhEkhEIQTQkhA0IQgSaSaAQhCBoSTQCEIQCEIQCEIRAQhJEpBcLta5c08TOBf8Aey7gXjvjr4gGlcQ5sFzN1N7iQ10HxQBcxK5vkzclbeC913vh/UjZ3d/A8NuLBrrgA+hVOsqO3EmTfBgeHMfVc/4C1datoamoDQXVXuNKluIB7rw/Mf6nArzL/jndXbRbT3vc8UtgLpL3mNt8XPHkVnny/rv8aY2dvpWlaHU2n+oTPMkRPsAuX29X0VKmW1KbKryIDAACORLhcei7GmbtYwOgbWtmDIkC8FfPu2NSH1HvkEguMDgLxI52HutPNnx1j+s/HjvtztPo27twBaDwJJj3XouxwZMYLSzjzBJ+n1VHw32O7V0w+rNKn/ZY1D/aTgdVyf8AqF2w7RamnSoeDbSpOpsEmWkubf8AqO4On0WP8dmO9L85bp76pXoacN32L8Xkl0fss1Cq7UPaWSKTTd/BxHBvPkVwfhrsGvq6dHU9oPcC6XN0+zYdknaX3tI4Rgr2rGNaA1gDWiwa0QAPJb+Px32zzz/DKSE1uyCSaESSE0kCQhjpEj6gj6KmvqA2Opj8/wAk2LoQsP20c0KnOJ1W5NRTlaKmhCEDSQhEBNJCBoQkgaaSESEJoQCEIRASKaRCAWTtPsrTakNGppNrbN23eJ27m7XR6fgFrCFAdINa0NYA1rQA1rQGhoGAAMBY2dkaRtXvW6ek2rLn96KVMP3u+Z26Jk8/Na0KUlWZva5slu4Fu5sS2REjqvmHZPwLrXa/fX206FGoHPfuk6sB26GtHB3GcSQvqCcqlwlu6mZWTRkgWAAHICAoFjNwcWtLgIDi0FwbMwDmJKYIPny4+yhVMFnUx7g/z0VhY5ySHIJjKICSA4cDPkluCbSZTVVSsFT9rCjlIaa1VqDA3AwQHQDgxe49D9VlfrgFj1OtyJmZImPA4Yg8v1Vb5JE6dStXESMET6Lgdpa+OKy1+2WwQCLcAfcfzmuDW1BLiQZGed+C4fkfJkmo0xx/Wx3arp+V3sP1Quf33SesD9ELh/myX3H06UbgsR1CidQvv7c+m/emHhc77Qj7QmzTo70b1zTqVA6pDTqd4Ed6FyvtSR1SGnW71PvAuSNUpfaUNOp3ij3y5h1KR1CbNOn3yO+XKdqFH7SmzTsCsg1VyhqEjqUNOqK6O+C5B1Cf2hNp06wrINZcr7Sou1ajZp1TWS79cg6tRdq1FyieLs/aRa+cdbSpCuvPnV4I4G48wRbrdN/aQABBkflceirfJjPZxd51QH9eRWTUa0bPEQIcwEzw3NuPQgrgs7V+Y4kuIExIBn8FzNV2tBcANwvAOIc4PHsR6LHL5WMTwe2frYhrj4g5oMYIuQ4dDHvIVdXtKHEHAnHCOi8U7XVPBBuJgHMYA3HnwUG697nZubR52WV+Z+LfxvXP7UbeCM5njHHpYrMe18mcxAtaMryR1D59ZA5+aYcZ+aY5eQj3kLG/Kyq0wj01XtMcDlYB2ibwcku91x+/BA4yZPA3OB6JMc7IxObYNuCyy82V9p1HUPaJm/XKxV9Wdwzj81URMzPuPWZVNZ4F8z+An0/yssvJkmmXEnibGTdAokAiRkCNwmOapiZybcJx1Q/cL3G2JdH3Z4rL/qEIqC0YtkoVg1L+DrcM4Qo0jp7d1UqHelVFyjK9JIzXh5T3qkFBKaE3PKrLkiUQpkQ
YKcqKEFgKkHKkOUpTQsLkAqsuTa5RpKZSJS3KO9NCwFNZ96DVVblILy9VOqql9RZ3VFjn58YtMWk1lHvVm3JFy5c/lfi/Fp71RqVoEz/hZS9ZKlcXBPliQRcgrmvnyq2k9Zqi0iMnziJ58uKzfbD9T9f0JWapqfmbwdEYFshQ2kFs8RuF5vH+Pbqsr5LUNBqgAg4JkE4m8x6ge6r72YmxicQD5hQ1M2Bkg9Zg/wBvqVZRoZ3OwN20SXTBgRy6/oq7T3eknEbTA5RHAz+yp7yXWMAW6ib/AM81botMXGXmGCSYIFptf+RCvr6NgvTdui3dwJjo770LTHHKzcNVmbWGYB4/8uX0CQqudERaZA/pifdVUddQbuaQ6WSJcC0ggXLgcZVlDVCW+DiHeEkBpGJMC/oFHaeJ1a0ExgZsAMfr+CVN523kNJzBIJEH1/dMtpuE7SOgdHHh0srqlZophsBjRN/nLxkAmLXvbkmvdRqys7qu0ARYcBxcZt5DCW9vHnJm8/y6GO3NMZjAgCAL+WfooinBBPM4MzxN/Q/VVVX1HkgCwiZLeJPTis+pY4iGu4jMSOl/NSFQExgAxJPkeXSU++kSROI4E9eajXtXtWKIFi0yLG5ymp7X/wBQ9z+iE7ONevTThEL0ahSkSmQkEQXFOVFwQpEkITlRboEJQomohtVZ3y4xOkkw5VOeqqlTks8vkYxbivdUVLqqpdUUFx+T5d9LzFa+ool6rUwue+bLJaYkXqB6qwC4VerN8xY9Cs7tOhTddD6gBvx/JUNc5wlnr/PVZNRVgXuZ+UGCD1HDHBN6F+o1GQeZ4xziOq55dJm15Mu/NKtqS5w4TAM5J6e4RUa7xXECRtFr9Qq77Qi0s2fLecybWjj1Vr4ADj4SWggT6W5fuqmgOBgceHTh+KupUgwtc8jaZgEAuMX/AOKraSK6YL+UZhwxfgf5haG0wwkwXuOeciDafKVraAB/rOLQ+7GABrg3MHljksPaVZrB4Bu3ONt4BZIlombiwzxKmS3ptMNe2etUc/bzEkQB96wgKDajmOZxi08ua0s1M5lhkmcE3tI4DlKGVWbSH4bJ3QZnBI87LbHLXSK0MeHNIqBpDiRfIAAgT5nC5jaZoVhO0sdPje6xnA5B2OS2V3xSlj5IktBBJgDJHn+Kpq0u/Y0jJBmfEHgOtn6ea2lmSuu1Neqd5NiDB8A+pbE8fqoP3FzcERYEgYvYcMq2hpIDW+KnEwzaG+Hh1yri3bIPiz4yARw48JErDLqosqvTnMANiMixn8VZX1QgWIdB2uvicW9oVbBMyC0tjOD19iovqgOALd3CZMNAtM+6zV3pNgkD7xsZNlS9pEElsNyXXkxckccqb9UDjETHAz5cLKs6loDTMOmQG3tfM/gVM36US+30uJZ7AIUBqf8Ad/6s/wDpNX/xPK/r3W5LckVElffZLJSVBqKIqLLPyzFMm2h7lAVAstSoZ87eqgXHP8K48/maX4NveBUVK2VQ6oc9JScubyfKyv0tMUzUUt6pCdc2C5+eV7tTIvomQVU5GnqQCOZTr5Cte4nXSDjZTYLSqXZKsa+Bzx9VMxNA5Vf2kC1r4KNQYiFhqgQDcHNsc0vR9/S+pqZEjPIZF/3VWrrSASbExxmbYGZngqKNVsWGHGDafXic81TVqQDxj8Qs+abEqlZzS5tEnxESSQ044AwOapoB0u7x2+TaYGefVRY4HA2mxkHjCbagmAImRzGE5dKW36aWtbYgb4i+TaM9f5dOjTaS6o+xiQ1p27nHFxiI5LHp9Q4FsWDgfcHjPkrKeou7dJsDY8z1TtO20OJkAQRuHicAIi8k9AokC/S8fNbF/VZ9Jrt2+0yDMk8Qp1qhLXR81rzmTBB/nBTrS0rRqajCQC37+4cfEOcXgXMLBVYzcXVHF5YDYfKL2PpN7XHkmyvAaXiZAiDg85VrKLS2GgA2eXcSSceSXcW5biupVMi20QcbXtIjcT7D6Qojd8r7mC7adoA
ZEmeXD2Vwe4fKYm3qf2gSp0m73OmHQMuAnxSLfmpit7Z2AlhAAfPI7S0CIAnPESrNDQf4gIlplvisHDAA4ym522bAukNdwDg75fUEZUqeumQABHQCRm6mZ6q0sW6jYXbnmH23Q3wg/wBtrHqqqVHcJu1pue82kRaOXms51rw7nFyCZGPJZv8AuFTxBsAWF28vywou6rydGpTLvmbaQYDYw3AibXWd1MCQ1sSB4SSL3WLU6irudLsQDmJjkqaXaJ2h5EknZfgenunHJnc5tt1dBoIB8NgLZOZAOeKzmg4knhixaSB7qRq7oJnA4m3ks2tc5gaWHNnSXSZE8/RWx3elcjbonQPE33H6oVQc7kPc/ohW/sjp/9k=" style="border:0px solid black; height:174px; margin-bottom:0px; margin-left:0px; margin-right:0px; margin-top:0px; width:290px" vspace="0" /></p>


    I store the data above in a database for two main purposes: (1) loading it back for display and (2) generating Word documents. For the HTML-to-OpenXML conversion I currently use a library I found online, called HTML2OpenXML.

    The problem is that this library cannot parse the stored data completely. In the simplest example, the inline image is not displayed, and neither are the bulleted items.
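
To make the two failing cases concrete, here is a minimal sketch in Python (standard library only) that picks out exactly those constructs from the editor output: the `<li>` items and the `<img>` whose `src` is a base64 `data:` URI. It is not part of HTML2OpenXML or the Open XML SDK; the names `SampleScanner`, `scan`, and `SAMPLE_HTML` are hypothetical, and the four-byte image payload is a placeholder rather than the real picture from the sample above. It only shows what a converter would have to extract before it could emit list paragraphs and an image part.

```python
import base64
from html.parser import HTMLParser

# Placeholder payload: just the first four bytes of a JPEG header,
# standing in for the full image stored in the real sample.
FAKE_JPEG = b"\xff\xd8\xff\xe0"

SAMPLE_HTML = (
    "<p>PHD Abstract</p>"
    "<p><strong>Title</strong></p>"
    "<ul><li>Item 1</li><li>Item 2</li></ul>"
    '<p><img alt="" src="data:image/jpeg;base64,'
    + base64.b64encode(FAKE_JPEG).decode("ascii")
    + '" /></p>'
)


class SampleScanner(HTMLParser):
    """Collects bullet text and inline data-URI images from editor HTML."""

    def __init__(self):
        super().__init__()
        self.list_items = []  # text content of each <li>
        self.images = []      # (mime_type, raw_bytes) for each data-URI <img>
        self._in_li = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li = True
            self.list_items.append("")
        elif tag == "img":
            src = dict(attrs).get("src") or ""
            # Only base64 data URIs are handled; ordinary URLs are skipped.
            if src.startswith("data:") and ";base64," in src:
                header, payload = src.split(";base64,", 1)
                mime_type = header[len("data:"):]
                self.images.append((mime_type, base64.b64decode(payload)))

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_li = False

    def handle_data(self, data):
        # Accumulate text only while inside a list item.
        if self._in_li:
            self.list_items[-1] += data


def scan(html):
    """Parse the HTML string and return the populated scanner."""
    scanner = SampleScanner()
    scanner.feed(html)
    scanner.close()
    return scanner


if __name__ == "__main__":
    result = scan(SAMPLE_HTML)
    print(result.list_items)
    print([(mime, len(raw)) for mime, raw in result.images])
```

The decoded bytes and MIME type are what would end up in an image part of the document, and each collected item would become a numbered or bulleted paragraph; how those are written out depends on the OpenXML library in use.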

    Is there a recommended way to convert HTML back to Open XML? CKEditor can already do the reverse (you can copy and paste content from Word into the CKEditor text area in the browser).

    Thank you!





    • Edited by OCS.New Sunday, August 23, 2015 11:03 AM
    Sunday, August 23, 2015 11:00 AM
