Chunk misunderstand accents

  • Question

  • Hello,

    I created a docx with the Open XML APIs. It includes chunks containing HTML code (simple code with accents, like "<html><head></head><body><div>je vis à Yverdon</div></body></html>").

    When I open this docx in Word, the first chunk is fine, but in the second chunk the à is replaced by a strange special character.

    If I rename the docx to zip, drill down, copy afchunk2.xhtml to another directory, copy it back into the zip, rename to docx, and open it, it is still wrong.

    If I rename the docx to zip, drill down, copy afchunk2.xhtml to another directory, open it in Notepad, do NOT change anything but save it (so only the timestamp changes), copy it back into the zip, rename to docx, and open it, it is now correct!

    What is going on here?

    Thanks

    Thursday, June 5, 2014 6:52 AM

Answers

  • Hi,

    After further investigation, this issue seems to be related to the encoding. After I explicitly set UTF-8 as the encoding for the XHTML stream, the issue was resolved. Here is the code:

      using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
      using (StreamWriter stringStream = new StreamWriter(chunkStream, Encoding.UTF8))
          stringStream.Write(html);
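
    A minimal sketch of what changes at the byte level, using only System.IO and System.Text. The class and method names (BomDemo, Encode) are hypothetical and not part of the Open XML SDK. The StreamWriter(Stream) constructor writes UTF-8 without a byte order mark, while passing Encoding.UTF8 makes the writer emit the EF BB BF preamble first. That Word relies on this marker to detect the chunk's encoding is an assumption that fits the behavior described in this thread; it is not confirmed here.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Writes text through a StreamWriter and returns the raw bytes produced.
    // A null encoding selects the StreamWriter(Stream) constructor,
    // which uses UTF-8 without a byte order mark.
    public static byte[] Encode(string text, Encoding encoding)
    {
        using (var buffer = new MemoryStream())
        {
            using (var writer = encoding == null
                ? new StreamWriter(buffer)
                : new StreamWriter(buffer, encoding))
            {
                writer.Write(text);
            }
            // ToArray is still valid after the stream has been closed.
            return buffer.ToArray();
        }
    }

    public static void Main()
    {
        // \u00E0 is the letter "à" from the original question.
        const string html = "<div>je vis \u00E0 Yverdon</div>";

        // First three bytes without an explicit encoding: the text itself ("<di").
        Console.WriteLine(BitConverter.ToString(Encode(html, null), 0, 3));          // 3C-64-69

        // First three bytes with Encoding.UTF8: the UTF-8 byte order mark.
        Console.WriteLine(BitConverter.ToString(Encode(html, Encoding.UTF8), 0, 3)); // EF-BB-BF
    }
}
```

    In both cases the à itself is stored as the same two bytes (C3 A0); only the leading marker differs. That would also be consistent with the Notepad observation in the question, if Notepad added or changed such a marker when saving.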

    Best regards

    Fei




    Monday, June 9, 2014 8:42 AM
    Moderator

All replies

  • Hi,

    Thank you for posting in the MSDN Forum.

    I have reproduced this issue. Here is my code:

    using System.IO;
    using System.Linq;
    using System.Xml;
    using System.Xml.Linq;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    class Chunk
    {
        public static void Main()
        {
            using (WordprocessingDocument myDoc =
                WordprocessingDocument.Open(@"C:\Users\User1\Desktop\Test.docx", true))
            {
                string html =
                    @"<html><head></head><body><div>je vis à Yverdon</div></body></html>";
                string altChunkId = "AltChunkId1";
                MainDocumentPart mainPart = myDoc.MainDocumentPart;

                // Add the XHTML as an alternative format import part (altChunk)
                AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                    AlternativeFormatImportPartType.Xhtml, altChunkId);

                // No encoding is specified here, so StreamWriter writes UTF-8
                // without a byte order mark
                using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
                using (StreamWriter stringStream = new StreamWriter(chunkStream))
                    stringStream.Write(html);

                // Reference the chunk from the body, after the last paragraph
                AltChunk altChunk = new AltChunk();
                altChunk.Id = altChunkId;
                mainPart.Document.Body.InsertAfter(
                    altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
                mainPart.Document.Save();
            }
        }

        private static void SaveXDocument(WordprocessingDocument myDoc,
            XDocument mainDocumentXDoc)
        {
            // Serialize the XDocument back into the part
            using (Stream str = myDoc.MainDocumentPart.GetStream(
                FileMode.Create, FileAccess.Write))
            using (XmlWriter xw = XmlWriter.Create(str))
                mainDocumentXDoc.Save(xw);
        }

        private static XDocument GetXDocument(WordprocessingDocument myDoc)
        {
            // Load the main document part into an XDocument
            XDocument mainDocumentXDoc;
            using (Stream str = myDoc.MainDocumentPart.GetStream())
            using (XmlReader xr = XmlReader.Create(str))
                mainDocumentXDoc = XDocument.Load(xr);
            return mainDocumentXDoc;
        }
    }

    I'm trying to involve some senior engineers in this issue, and it will take some time. Your patience is greatly appreciated.

    Sorry for any inconvenience and have a nice day!

    Best regards

    Fei



    Friday, June 6, 2014 8:44 AM
    Moderator
  • It's probably a question of which character set (code page) was used when creating the file. Notepad may be writing a marker for the character set, "invisible" to you, that differs from what the software that originally created the file wrote. It's also possible to specify which character set the software that "consumes" the file should use. You might try adding charset metadata to the HTML file. See:

    http://en.wikipedia.org/wiki/Character_encodings_in_HTML
    http://www.w3schools.com/html/html_charset.asp
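
    As a rough illustration of the charset metadata suggestion, the markup could be built with a declaration in the head, as in the sketch below. The class and method names (XhtmlChunk, Wrap) are hypothetical. Whether Word honors this declaration when importing a chunk is an assumption that is not confirmed in this thread, and the declared charset must match the bytes actually written to the part.

```csharp
using System;

public static class XhtmlChunk
{
    // Wraps a body fragment in a document whose head declares UTF-8.
    // Assumption: the caller also writes the resulting string as UTF-8.
    public static string Wrap(string bodyMarkup)
    {
        return "<html><head>"
             + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />"
             + "</head><body>" + bodyMarkup + "</body></html>";
    }

    public static void Main()
    {
        // \u00E0 is the letter "à" from the original question.
        Console.WriteLine(Wrap("<div>je vis \u00E0 Yverdon</div>"));
    }
}
```

    The returned string would then be written to the chunk stream in the same way as the html variable in the code posted earlier in this thread.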


    Cindy Meister, VSTO/Word MVP, my blog

    Friday, June 6, 2014 3:55 PM
    Moderator