Why is the Open XML SDK unable to load a document that has an invalid hyperlink?

  • Question

  • Dear team

    My document contains hyperlinks; some are valid and some are invalid.

    When I try to load the document with Open XML, it throws an exception.

    Please suggest how to resolve this issue.

    Thanks in advance,

    Prasad

    Friday, July 3, 2020 12:34 PM

Answers

  • Resolved!

    I found the following code and solution on Eric's site; it works well.

    The following is the complete listing of the UriFixer class, along with the code that uses it. When using this class, first attempt to open the document as usual, catching OpenXmlPackageException. If that exception is thrown and its text contains “Invalid Hyperlink”, the code calls UriFixer.FixInvalidUri, then opens the fixed document (or spreadsheet / presentation) as usual.

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.IO.Compression;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using System.Xml;
    using System.Xml.Linq;
    using DocumentFormat.OpenXml.Packaging;
    
    class Program
    {
        static void Main(string[] args)
        {
            var fileName = @"..\..\..\Test.docx";
            var newFileName = @"..\..\..\Fixed.docx";
            var newFileInfo = new FileInfo(newFileName);
    
            if (newFileInfo.Exists)
                newFileInfo.Delete();
    
            File.Copy(fileName, newFileName);
    
            WordprocessingDocument wDoc;
            try
            {
                using (wDoc = WordprocessingDocument.Open(newFileName, true))
                {
                    ProcessDocument(wDoc);
                }
            }
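            catch (OpenXmlPackageException e) when (e.ToString().Contains("Invalid Hyperlink"))
            {
                // Opening failed because a relationship part contains a malformed
                // external URI. Rewrite the broken targets, then open the fixed file.
                // Any other OpenXmlPackageException propagates instead of being swallowed.
                using (FileStream fs = new FileStream(newFileName, FileMode.Open, FileAccess.ReadWrite))
                {
                    UriFixer.FixInvalidUri(fs, brokenUri => FixUri(brokenUri));
                }
                using (wDoc = WordprocessingDocument.Open(newFileName, true))
                {
                    ProcessDocument(wDoc);
                }
            }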
        }
    
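        // Callback used by UriFixer: every malformed target is replaced with the same
        // placeholder. brokenUri holds the original Target text, so it could instead
        // be repaired or mapped to a real address.
        private static Uri FixUri(string brokenUri)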
        {
            return new Uri("http://broken-link/");
        }
    
        private static void ProcessDocument(WordprocessingDocument wDoc)
        {
            var elementCount = wDoc.MainDocumentPart.Document.Descendants().Count();
            Console.WriteLine(elementCount);
        }
    }
    
    public static class UriFixer
    {
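        // Scans every .rels part in the package and replaces any external relationship
        // target that System.Uri cannot parse with the URI returned by invalidUriHandler.
        // The stream is closed on return, because the ZipArchive is created without leaveOpen.
        public static void FixInvalidUri(Stream fs, Func<string, Uri> invalidUriHandler)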
        {
            XNamespace relNs = "http://schemas.openxmlformats.org/package/2006/relationships";
            using (ZipArchive za = new ZipArchive(fs, ZipArchiveMode.Update))
            {
                foreach (var entry in za.Entries.ToList())
                {
                    if (!entry.Name.EndsWith(".rels"))
                        continue;
                    bool replaceEntry = false;
                    XDocument entryXDoc = null;
                    using (var entryStream = entry.Open())
                    {
                        try
                        {
                            entryXDoc = XDocument.Load(entryStream);
                            if (entryXDoc.Root != null && entryXDoc.Root.Name.Namespace == relNs)
                            {
                                // Only external relationships (such as hyperlinks) carry the
                                // absolute URIs that can fail to parse when the package is opened.
                                var urisToCheck = entryXDoc
                                    .Descendants(relNs + "Relationship")
                                    .Where(r => (string)r.Attribute("TargetMode") == "External");
                                foreach (var rel in urisToCheck)
                                {
                                    var target = (string)rel.Attribute("Target");
                                    if (target != null)
                                    {
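                                        try
                                        {
                                            // new Uri(string) throws UriFormatException for a malformed target.
                                            _ = new Uri(target);
                                        }
                                        catch (UriFormatException)
                                        {
                                            // Ask the caller for a replacement and mark this part for rewriting.
                                            Uri newUri = invalidUriHandler(target);
                                            rel.Attribute("Target").Value = newUri.ToString();
                                            replaceEntry = true;
                                        }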
                                    }
                                }
                            }
                        }
                        catch (XmlException)
                        {
                            continue;
                        }
                    }
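                    if (replaceEntry)
                    {
                        // Replace the part: delete the original entry and write the
                        // corrected XML back under the same name.
                        var fullName = entry.FullName;
                        entry.Delete();
                        var newEntry = za.CreateEntry(fullName);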
                        using (StreamWriter writer = new StreamWriter(newEntry.Open()))
                        using (XmlWriter xmlWriter = XmlWriter.Create(writer))
                        {
                            entryXDoc.WriteTo(xmlWriter);
                        }
                    }
                }
            }
        }
    }
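
    The detection step in FixInvalidUri treats a Target as broken exactly when new Uri(target) throws UriFormatException. To see which values a given document would trip on, that test can be pulled out into a standalone check. This is a minimal sketch: the class name UriCheck and the sample targets are illustrative assumptions, not part of the original code or the Open XML SDK.

```csharp
using System;

// Hypothetical helper (not part of the SDK): isolates the validity test used by UriFixer.
public static class UriCheck
{
    // Mirrors UriFixer.FixInvalidUri: a target is "broken" when new Uri(target)
    // throws UriFormatException.
    public static bool IsValidExternalTarget(string target)
    {
        try
        {
            _ = new Uri(target);
            return true;
        }
        catch (UriFormatException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // Illustrative sample targets only.
        string[] samples = { "https://example.com/page", "http://", "not a uri" };
        foreach (var s in samples)
        {
            Console.WriteLine($"{s} -> {(IsValidExternalTarget(s) ? "valid" : "invalid")}");
        }
    }
}
```

    System.IO.Packaging may apply its own parsing rules when it loads relationships, so this check approximates, rather than exactly reproduces, what triggers the exception.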
    

    We are considering including this method in the Open XML SDK itself. We would add overloads of the WordprocessingDocument.Open, SpreadsheetDocument.Open, and PresentationDocument.Open methods that take the callback as an argument, just as in the example above. These new methods would first attempt to open the document in the normal way; if that succeeds, they would return the newly opened document. If opening fails with OpenXmlPackageException because a hyperlink URI could not be parsed, and the document was opened for writing, the method would open, modify, and save a fixed document, then open it again and return the newly opened document.

    With this approach, the idiom to open the document would be almost identical to the current approach to opening a document.  The only difference would be the inclusion of the callback method as an argument.

    If the document was opened for read-only access, then the various methods would create a copy of the document in memory, fix the broken Uri objects, and then open and return the fixed document (for read-only access).
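
    The proposed idiom can be approximated today with a small wrapper around the UriFixer class from the listing above. This is a hedged sketch, not the SDK's API: WordOpenHelper and OpenWithUriFix are hypothetical names, it assumes the DocumentFormat.OpenXml package and the UriFixer class are available, and it covers only WordprocessingDocument (SpreadsheetDocument and PresentationDocument would follow the same pattern).

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

// Hypothetical helper approximating the proposed Open overload. It is NOT part of
// the Open XML SDK and relies on the UriFixer class from the listing above.
public static class WordOpenHelper
{
    public static WordprocessingDocument OpenWithUriFix(
        string path, bool isEditable, Func<string, Uri> invalidUriHandler)
    {
        try
        {
            // First attempt: open the document the normal way.
            return WordprocessingDocument.Open(path, isEditable);
        }
        catch (OpenXmlPackageException e) when (e.ToString().Contains("Invalid Hyperlink"))
        {
            if (isEditable)
            {
                // Editable: fix the file on disk, then open it again.
                using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
                {
                    UriFixer.FixInvalidUri(fs, invalidUriHandler);
                }
                return WordprocessingDocument.Open(path, true);
            }

            // Read-only: leave the file untouched and fix an in-memory copy instead.
            // An expandable MemoryStream is needed because ZipArchive Update mode may grow it.
            var copy = new MemoryStream();
            using (var source = File.OpenRead(path))
            {
                source.CopyTo(copy);
            }
            copy.Position = 0;

            // FixInvalidUri disposes the stream, but MemoryStream.ToArray still works afterwards.
            UriFixer.FixInvalidUri(copy, invalidUriHandler);
            var fixedCopy = new MemoryStream(copy.ToArray());
            return WordprocessingDocument.Open(fixedCopy, false);
        }
    }
}
```

    Usage: `using (var doc = WordOpenHelper.OpenWithUriFix(path, false, broken => new Uri("http://broken-link/"))) { ... }`. In the editable case the original file is modified in place, matching the proposal's "open, modify, and save" step; pass a copy if the original must be preserved.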

    Please feel free to comment on how this approach would work for you. If we reach agreement on this approach, then in a month or two we will make the change to the open source version of the Open XML SDK.

    • Marked as answer by koolprasadd Thursday, July 30, 2020 6:56 AM
    Thursday, July 30, 2020 6:54 AM