How to validate a Word document using OpenXML SDK 2.5?

  • Question

  • I am trying to validate a Word 2013 document. The document is fully editable in Word 2013, and neither "Check for Issues / Inspect Document" nor "Check for Issues / Check Compatibility" reports any problems.

    However, when I open the document with the OpenXML SDK, I get an error on the line that calls WordprocessingDocument.Open. The error is:
    DocumentFormat.OpenXml.Packaging.OpenXmlPackageException: Invalid Hyperlink: Malformed URI is embedded as a hyperlink in the document.

    Every sample about validation I have found goes like this:

    using (WordprocessingDocument doc =
        WordprocessingDocument.Open(filepath, true))
    {
        OpenXmlValidator validator = new OpenXmlValidator();
        ...
    
    What am I doing wrong? Is there another way to do it? 
    Tuesday, June 25, 2013 9:06 AM

Answers

  • It seems the OpenXML SDK uses the .NET Uri class internally, and it throws an exception when a URL is malformed. The only workaround I found was to find all malformed links and replace them with a dummy target (about:blank) before opening the document with OpenXML.

    You can use System.IO.Packaging.Package directly for this.

    using System;
    using System.IO;
    using System.IO.Packaging;
    using System.Text.RegularExpressions;
    using System.Xml;

    public class PreOpenXmlProcessor {
        // Matches any relationships part, e.g. /word/_rels/document.xml.rels
        // or /ppt/slides/_rels/slide1.xml.rels.
        private const string RelsPartRegexString = @"/_rels/[^/]*\.rels$";

        // The package must have been opened with FileAccess.ReadWrite.
        public void Process(Package package) {
            ProcessBasedRelationshipsUris(package);
        }

        // Scans every relationships part and rewrites those with bad targets.
        private void ProcessBasedRelationshipsUris(Package package) {
            bool needToFlushPackage = false;
            foreach (PackagePart part in package.GetParts()) {
                if (IsRelationshipsFile(part.Uri.OriginalString)) {
                    XmlDocument document = new XmlDocument();
                    bool needToSaveDocument = FindAndReplaceBadRelationshipUris(document, part);
                    needToFlushPackage = needToFlushPackage || needToSaveDocument;
                    if (needToSaveDocument) {
                        SaveXmlDocument(document, part);
                    }
                }
            }

            if (needToFlushPackage) {
                package.Flush();
            }
        }

        // Overwrites the part with the modified XML.
        private void SaveXmlDocument(XmlDocument document, PackagePart part) {
            using (Stream stream = part.GetStream(FileMode.Create, FileAccess.Write)) {
                document.Save(stream);
            }
        }

        // Loads the part into 'document' and replaces every Target that
        // System.Uri cannot parse. Returns true if anything was changed.
        private bool FindAndReplaceBadRelationshipUris(XmlDocument document, PackagePart part) {
            bool foundBadRelationships = false;

            using (Stream stream = part.GetStream(FileMode.Open, FileAccess.ReadWrite)) {
                document.Load(stream);
                XmlNodeList nodes = document.GetElementsByTagName("Relationship");

                foreach (XmlNode node in nodes) {
                    if (!IsUriValid(node)) {
                        ReplaceWithDummyRelationshipUri(node);
                        foundBadRelationships = true;
                    }
                }
            }

            return foundBadRelationships;
        }

        private bool IsRelationshipsFile(string filename) {
            return Regex.IsMatch(filename, RelsPartRegexString, RegexOptions.IgnoreCase);
        }

        private bool IsUriValid(XmlNode node) {
            XmlAttribute targetAttr = node.Attributes["Target"];
            if (targetAttr == null) {
                // No Target attribute, so there is nothing to replace.
                return true;
            }

            Uri uri;
            return Uri.TryCreate(targetAttr.Value, UriKind.RelativeOrAbsolute, out uri);
        }

        private void ReplaceWithDummyRelationshipUri(XmlNode node) {
            XmlAttribute targetAttr = node.Attributes["Target"];
            targetAttr.Value = "about:blank";
        }
    }

    • Marked as answer by Michal Mracka Tuesday, July 2, 2013 8:32 AM
    Monday, July 1, 2013 2:35 PM
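
Before replacing anything, it can help to see which links are affected. The following is a minimal sketch of that "find all malformed links" step, not part of the answer above: it reads the file as a plain ZIP archive with System.IO.Compression instead of System.IO.Packaging, and only reports the bad targets without changing them. The names RelationshipScanner and FindMalformedTargets are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

// Hypothetical helper: lists relationship targets that System.Uri rejects.
public static class RelationshipScanner
{
    // Returns (entry name, target) pairs for every Relationship element whose
    // Target attribute cannot be parsed as a relative or absolute URI.
    public static List<KeyValuePair<string, string>> FindMalformedTargets(Stream packageStream)
    {
        var result = new List<KeyValuePair<string, string>>();

        // leaveOpen: true so the caller keeps ownership of the stream.
        using (var zip = new ZipArchive(packageStream, ZipArchiveMode.Read, true))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                // Relationships live in *.rels entries, e.g. word/_rels/document.xml.rels.
                if (!entry.FullName.EndsWith(".rels", StringComparison.OrdinalIgnoreCase))
                    continue;

                XDocument xml;
                using (Stream s = entry.Open())
                {
                    xml = XDocument.Load(s);
                }

                foreach (XElement rel in xml.Descendants())
                {
                    if (rel.Name.LocalName != "Relationship")
                        continue;

                    string target = (string)rel.Attribute("Target");
                    Uri parsed;
                    if (target != null &&
                        !Uri.TryCreate(target, UriKind.RelativeOrAbsolute, out parsed))
                    {
                        result.Add(new KeyValuePair<string, string>(entry.FullName, target));
                    }
                }
            }
        }

        return result;
    }
}
```

To use it, open the .docx with File.OpenRead, pass the stream to RelationshipScanner.FindMalformedTargets, and print each pair. The check is the same Uri.TryCreate call the answer uses, so it should flag the same targets that PreOpenXmlProcessor would replace.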

All replies

  • Hi Michal,

    Thank you for posting in the MSDN Forum.

    I'm trying to involve some senior engineers in this issue, and it will take some time. Your patience will be greatly appreciated.

    Sorry for any inconvenience and have a nice day!

    Best regards,


    Quist Zhang [MSFT]

    Wednesday, June 26, 2013 11:27 AM
    Moderator
  • I have found a way to reproduce this. It is surprisingly easy:

    1. In Word, create a new document.
    2. Insert a new hyperlink: press Ctrl+K and set the address to something with an MSBuild-style parameter, for example: http://$(servername)/MyApp
    3. Save the document.
    4. Try to validate it with OpenXML SDK 2.5; you will get the error mentioned earlier.

    Wednesday, June 26, 2013 1:27 PM
  • The OpenXML SDK accepts only URIs that are valid .NET URIs. Please make sure your URIs conform to these specs (note the Remarks section at the end):

    http://msdn.microsoft.com/en-us/library/system.uri.aspx

    Another way to test the validity is to write a small C# program and pass your URI string to the System.Uri constructor to see whether it is accepted.

    Monday, July 1, 2013 11:28 PM
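
The "small C# program" suggested in the reply above could look roughly like the sketch below. The class and method names (UriCheck, IsAcceptedByDotNet) are invented for this example. Note that the single-argument Uri constructor only accepts absolute URIs, so a relative target would also be reported as rejected here.

```csharp
using System;

// Hypothetical helper: reports whether the System.Uri constructor accepts a string.
public static class UriCheck
{
    public static bool IsAcceptedByDotNet(string candidate)
    {
        try
        {
            // new Uri(string) expects an absolute URI and throws
            // UriFormatException when the string cannot be parsed as one.
            new Uri(candidate);
            return true;
        }
        catch (UriFormatException)
        {
            return false;
        }
    }
}
```

Calling UriCheck.IsAcceptedByDotNet with the hyperlink address from the reproduction steps earlier in the thread is a quick way to confirm whether that address is what the SDK trips over.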
  • to Alexandr Shakhov:

    Works nicely. Thank you.

    • Edited by Michal Mracka Tuesday, July 2, 2013 9:40 AM linear topic display
    Tuesday, July 2, 2013 8:34 AM
  • to Enamul Kh-MSFT

    I am not creating the document, just processing it. 

    The OpenXML validation kind of defeats its own purpose when it cannot handle invalid files...

    • Edited by Michal Mracka Tuesday, July 2, 2013 9:41 AM linear topic display
    Tuesday, July 2, 2013 8:43 AM