How to convert .docx file to html file with formatting using open xml sdk 2.5

  • Question

  • Hi,

    I am trying to convert a Word document (.docx) to an HTML file. The conversion itself works, but the HTML file does not retain the formatting. I am using Open XML SDK 2.0.

    For example, if a paragraph in the .docx file contains red text, some of it bold and underlined, the converted HTML shows everything as plain text and all the formatting is lost.

    Here is my current code :

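            // Namespaces this snippet relies on. HtmlConverter, Xhtml and NoNamespace
            // are assumed to come from the Open XML PowerTools library.
            using System;
            using System.Configuration;
            using System.Drawing.Imaging;
            using System.IO;
            using System.Web;
            using System.Xml.Linq;
            using DocumentFormat.OpenXml.Packaging;
            using OpenXmlPowerTools;

            public string ConvertDocxToHtml(string docxFileEncodedData)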
            {
                // "HH" is the 24-hour clock; "hh" would repeat the same name 12 hours later.
                string inputFileName = DateTime.Now.ToString("ddMMyyyyHHmmss") + ".docx";
                string imageDirectoryName = inputFileName.Split('.')[0] + "_files";
    
                DirectoryInfo imgDirInfo = new DirectoryInfo(HttpContext.Current.Server.MapPath("~/Documents/" + imageDirectoryName));
    
                int imageCounter = 0;
                byte[] byteArray = Convert.FromBase64String(docxFileEncodedData);//File.ReadAllBytes(docxFile);
                using (MemoryStream memoryStream = new MemoryStream())
                {
                    memoryStream.Write(byteArray, 0, byteArray.Length);
                    using (WordprocessingDocument doc =
                        WordprocessingDocument.Open(memoryStream, true))
                    {
                        HtmlConverterSettings settings = new HtmlConverterSettings()
                        {
                            PageTitle = inputFileName,
                            ConvertFormatting = false,
                        };
                        XElement html = HtmlConverter.ConvertToHtml(doc, settings,
                            imageInfo =>
                            {
                                DirectoryInfo localDirInfo = imgDirInfo;
                                if (!localDirInfo.Exists)
                                    localDirInfo.Create();
                                ++imageCounter;
                                string extension = imageInfo.ContentType.Split('/')[1].ToLower();
                                ImageFormat imageFormat = null;
                                if (extension == "png")
                                {
                                    // Convert the .png file to a .jpeg file.
                                    extension = "jpeg";
                                    imageFormat = ImageFormat.Jpeg;
                                }
                                else if (extension == "bmp")
                                    imageFormat = ImageFormat.Bmp;
                                else if (extension == "jpeg")
                                    imageFormat = ImageFormat.Jpeg;
                                else if (extension == "tiff")
                                    imageFormat = ImageFormat.Tiff;
    
                                // If the image format is not one that you expect, ignore it,
                                // and do not return markup for the link.
                                if (imageFormat == null)
                                    return null;
    
                                string imageFileName = "image" + imageCounter.ToString() + "." + extension;
                                try
                                {
                                    imageInfo.Bitmap.Save(Path.Combine(imgDirInfo.FullName, imageFileName), imageFormat);
                                }
                                catch (System.Runtime.InteropServices.ExternalException)
                                {
                                    return null;
                                }
                                XElement img = new XElement(Xhtml.img,
                                    new XAttribute(NoNamespace.src, imageDirectoryName + "/" + imageFileName),
                                    imageInfo.ImgStyleAttribute,
                                    imageInfo.AltText != null ?
                                        new XAttribute(NoNamespace.alt, imageInfo.AltText) : null);
                                return img;
                            });
    
                        string htmlFilePath = HttpContext.Current.Server.MapPath("~/Documents/" + inputFileName.Split('.')[0] + ".html");
                        File.WriteAllText(htmlFilePath, html.ToStringNewLineOnAttributes());
    
                        return ConfigurationManager.AppSettings["ServerUri"].ToString() + "/Documents/" + inputFileName.Split('.')[0] + ".html";
                    }
                }
    
            }

    So I just want to know how I can retain the formatting of the .docx file in the HTML file.

    Thanks

    Thursday, December 19, 2013 10:42 AM

Answers

  • As far as I know, the HtmlConverter does not retain the formatting of the document. You can use CSS stylesheets for the formatting, but you have to define them yourself.

    You can find more information about how to do that here.
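
    A minimal sketch of the stylesheet approach, not taken from the linked material: it assumes the converter hands back the page as an XHTML XElement, as in the question's code, and adds a hand-written <style> element to its <head> before the file is saved. The names HtmlStyleInjector, AddStylesheet and the p.warning class are hypothetical; the selectors you actually need depend on the markup the converter emits for your document.

    ```csharp
    using System;
    using System.Xml.Linq;

    public static class HtmlStyleInjector
    {
        // Adds a <style> element holding the given CSS to the <head> of an
        // (X)HTML tree, creating <head> first if the tree has none.
        public static XElement AddStylesheet(XElement html, string css)
        {
            // Reuse the root element's namespace so the new elements match the tree.
            XNamespace ns = html.Name.Namespace;

            XElement head = html.Element(ns + "head");
            if (head == null)
            {
                head = new XElement(ns + "head");
                html.AddFirst(head);
            }

            head.Add(new XElement(ns + "style",
                new XAttribute("type", "text/css"),
                css));
            return html;
        }

        public static void Main()
        {
            // Stand-in for the converter output; the "warning" class is hypothetical.
            XNamespace xhtml = "http://www.w3.org/1999/xhtml";
            XElement html = new XElement(xhtml + "html",
                new XElement(xhtml + "body",
                    new XElement(xhtml + "p",
                        new XAttribute("class", "warning"),
                        "Red, bold, underlined text")));

            string css = "p.warning { color: red; font-weight: bold; text-decoration: underline; }";
            Console.WriteLine(AddStylesheet(html, css));
        }
    }
    ```

    In the question's code this would be called on the html element just before File.WriteAllText. Inspect the generated markup first to see which element names and class attributes are available to target.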

    Thursday, December 19, 2013 12:46 PM

All replies