Reading Custom properties created in Sharepoint

  • Question

  • I've done some serious googling to find out how to read the properties of a Word document, and I finally found this post: http://social.msdn.microsoft.com/forums/en-US/vsto/thread/11f1fd8b-0ee0-4f3b-8a7e-f5dc92cf48d5/

    I modified that piece of code to a property-reader:

    using Microsoft.Office.Core;

    public static class Word2007CustomPropertyReader
    {
        // Returns the text of the property named 'field',
        // or string.Empty when the document does not carry it.
        public static string ReadProperty(CustomXMLParts customXmlParts, string field)
        {
            CustomXMLPart xmlPart = null;
            CustomXMLNode xmlNode = null;

            // Find the custom XML part whose root element is <properties>.
            foreach (CustomXMLPart part in customXmlParts)
            {
                if (part.DocumentElement.BaseName == _wssPropertiesRootElementName)
                {
                    xmlPart = part;
                    break;
                }
            }

            if (xmlPart != null)
            {
                xmlNode = xmlPart.SelectSingleNode(_documentManagementXpath);
            }

            if (xmlNode != null)
            {
                // Each child of <documentManagement> is one property.
                foreach (CustomXMLNode node in xmlNode.ChildNodes)
                {
                    string key = node.BaseName;
                    if (key == field)
                    {
                        return node.Text;
                    }
                }
            }

            return string.Empty;
        }

        private static readonly string _wssPropertiesRootElementName = "properties";
        private static readonly string _documentManagementXpath = "//documentManagement";
    }

    To read a property from my WinForms application, I have to start an instance of Microsoft.Office.Interop.Word.Application, create an instance of Microsoft.Office.Interop.Word.Document, and retrieve its CustomXMLParts.

    To me this feels kind of awkward. Is there some other way to get to CustomXMLParts? Or some other way to read the SharePoint-created properties?

    Tuesday, April 5, 2011 1:48 PM

All replies

  • Hello EckePecke,

    Here are some resources to consult about both CustomXMLParts and seeing the individual parts of an OpenXML file.

    In the following article "Document Information Panel and Document Properties" at
    http://msdn.microsoft.com/en-us/library/bb447589.aspx

    See this discussion:
    Custom Document Properties
    --------------------------------------------------------------------------------
    User-defined properties are contained in the Custom File Properties part of the Open XML Formats. For documents stored in a SharePoint Foundation library, this part contains a custom property that specifies the content type ID of the content type assigned to the document, as in the following example.

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Properties
        xmlns="http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"
        xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">
      <property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="2" name="ContentTypeId">
        <vt:lpwstr>0x01010042D2ECEB487FD14A878A8B12B45AD0DF</vt:lpwstr>
      </property>
    </Properties>

    The content type ID property is never promoted from the document to the document library in which it resides.

    Note:
    In Office 2010 documents saved in a binary format, such as .doc, all custom properties, including those that would map to SharePoint Foundation columns, are stored in the same location. You cannot bind SharePoint Foundation columns to document properties in Office 2010 documents saved as binary formats. If you save a binary file in Open XML Formats in SharePoint Foundation, SharePoint Foundation attempts to rationalize the properties present in the document. If the document contains a custom property that has the same name and data type as a column in the SharePoint Foundation document library to which it is being saved, then SharePoint Foundation assumes the two properties are the same and relocates the document property within the Open XML Formats accordingly. However, this rationalization of document properties is not performed on files saved in the Open XML Formats from the start.

    End of that discussion
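
    As a rough illustration of the Custom File Properties part quoted above, here is a minimal sketch that reads one custom property (for example ContentTypeId) without Word or COM interop. It is not taken from the linked article. It treats the .docx as a plain zip file using System.IO.Compression (.NET 4.5 or later) rather than the Open XML SDK, and it assumes the part sits at the conventional docProps/custom.xml path that Office writes. The class and method names are made up for this example.

    ```csharp
    using System;
    using System.IO;
    using System.IO.Compression;
    using System.Linq;
    using System.Xml.Linq;

    public static class CustomFilePropertyReader
    {
        // Namespace of the <Properties>/<property> elements shown in the quoted XML.
        private static readonly XNamespace CustomPropsNs =
            "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties";

        // Assumption: the conventional location of the Custom File Properties part.
        private const string CustomPropsPath = "docProps/custom.xml";

        // Returns the value of the named custom property, or null if the
        // package has no custom properties part or no such property.
        public static string ReadCustomProperty(Stream package, string name)
        {
            using (var zip = new ZipArchive(package, ZipArchiveMode.Read, true))
            {
                ZipArchiveEntry entry = zip.GetEntry(CustomPropsPath);
                if (entry == null)
                {
                    return null;
                }

                XDocument xml;
                using (Stream partStream = entry.Open())
                {
                    xml = XDocument.Load(partStream);
                }

                XElement property = xml.Root
                    .Elements(CustomPropsNs + "property")
                    .FirstOrDefault(p => (string)p.Attribute("name") == name);

                // The value is held in a single typed child such as <vt:lpwstr>.
                XElement typedValue = property == null
                    ? null
                    : property.Elements().FirstOrDefault();
                return typedValue == null ? null : typedValue.Value;
            }
        }
    }
    ```

    To try it against a real file, open the document with File.OpenRead(filename) and pass the stream together with the property name "ContentTypeId". A package that stores the part elsewhere would need the path resolved through the package relationships instead.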

    See the blog "Microsoft Office SharePoint and related" at
    http://blogs.msdn.com/b/joerg_sinemus/archive/2007/05/25/migrate-office-2003-documents-to-office-2007-documents-and-what-is-happen-with-the-old-doc-properties.aspx
    for the discussion "What I Can Do?": WSS properties are...stored as a custom XML part within an Office 2007 (and 2010) OpenXML file.

    Also see "Using the SharePoint Foundation 2010 Managed Client Object Model with the Open XML SDK 2.0" at
    http://msdn.microsoft.com/en-us/library/ee956524(office.14).aspx

    Also see "Retrieving Content from Different Parts: Explicit or Implicit Relationships in the Open XML SDK 2.0 for Microsoft Office" at
    http://msdn.microsoft.com/en-us/library/ee413542(office.12).aspx
     


    Chris Jensen
    Senior Support Technical Lead
    Thursday, April 7, 2011 6:15 PM
    Moderator
  • Some great info here Chris, thank you. 

    My colleagues & I are currently facing the same challenges, using SharePoint 2010 alongside a custom .NET application that needs to read the ContentType properties.  A real eye-opener finding out the way these properties are stored, and I'm now very wary of our users importing binary Office 2003 style documents into SharePoint!

    EckePecke, if I understand your original question correctly, you feel the code you used previously was awkward mainly because it forced you to use COM interop via the Office PIAs?

    As Chris' post suggests, the OpenXml 2.0 API is the way forward here.  I don't have much knowledge of it but have been able to convert your code to use OpenXML and LinqToXml to remove the need for COM interop.  It still isn't perfect in my opinion as my code comments suggest - I'd like to be able to identify the SharePoint metadata CustomXmlPart without having to read and inspect the xml stream.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using DocumentFormat.OpenXml.Packaging;
    using System.IO;
    using System.Xml.Linq;
    
    namespace SharePointDocumentPropertiesTest
    {
        public static class OpenXMLCustomPropertyReader
        {
            // note the lower-case ml in the Xml of CustomXmlPart from the OpenXML .NET library DocumentFormat.OpenXml.Packaging 
            // versus the CAPS of CustomXMLPart from the COM library Microsoft.Office.Core
            public static string ReadProperty(IEnumerable<CustomXmlPart> customXmlParts, string field)
            {   
                // Use the System.Xml.Linq XElement class instead of COM-specific types
                XElement root = null;
                XElement docManagementElement = null;
    
                foreach (CustomXmlPart part in customXmlParts)
                {
                    // bit painful this... do we really have to manually stream the contents of each CustomXml file?
                    // my lack of experience with OpenXML means I don't know if there's a better way.
                    using (StreamReader reader = new StreamReader(part.GetStream()))
                    {
                        root = XElement.Load(reader);
                    }
    
                    // check if this part has a root xml element name & namespace that indicates it contains 
                    // SharePoint content type props
                    if (root.Name.LocalName == _wssPropertiesRootElementName
                        && root.Name.NamespaceName == _wssPropertiesNamespace)
                    {
                        break;
                    }
                    else
                        root = null;
                }
    
                if (root != null)
                {
                    docManagementElement = root.Elements(_documentManagementElementName).FirstOrDefault();
                }
    
                if (docManagementElement != null)
                {
                    // So, we're in the right CustomXmlPart and we've got a documentManagement element.
                    // Now let's use a bit of LinqToXml + a lambda expression to get the child element of the property we want
                    XElement propertyElement = docManagementElement
                                               .Elements()
                                               .FirstOrDefault(e => e.Name.LocalName == field);
                    if (propertyElement != null)
                        return propertyElement.Value;
                }
    
                return string.Empty;
            }
    
            private static readonly string _wssPropertiesRootElementName = "properties";
            private static readonly string _documentManagementElementName = "documentManagement";
            private static readonly string _wssPropertiesNamespace = "http://schemas.microsoft.com/office/2006/metadata/properties";
        }
    }
    
    


    Note that for the above code to build, you need assembly references to DocumentFormat.OpenXml.dll and (perhaps) WindowsBase.dll, as with all OpenXml development (see Beth Massi's excellent blog for advice).

    You need to pass the ReadProperty method a collection of OpenXml style CustomXmlParts as in the example below:

                using (PresentationDocument openXmlDoc = PresentationDocument.Open(filename, false))
                {
                    PresentationPart pp = openXmlDoc.PresentationPart;
                    string BookType = OpenXMLCustomPropertyReader.ReadProperty(pp.CustomXmlParts, "BookType");
                    Console.WriteLine(BookType);
                }
    

    In this example, PresentationDocument represents a PowerPoint .pptx document, filename is a string assigned previously with the full path of the document, and the document has a SharePoint metadata column named BookType associated with it via its content type.

    Hope this makes sense, and thank you for your previous code example. It gave me a great starting point and has helped me greatly in filling in the gaps between SharePoint, COM interop and OpenXML.

    Regards,

    Gareth

    • Proposed as answer by Gareth Ward Sunday, December 4, 2011 12:14 AM
    Sunday, December 4, 2011 12:01 AM
  • Apologies, I realise my previous post showed an OpenXML example using PowerPoint, whereas your previous usage of custom properties was with Word.

    Here's an OpenXML 2 code snippet that might be of help:

     

                using (WordprocessingDocument openXmlDoc = WordprocessingDocument.Open(filename, false))
                {
                    MainDocumentPart part = openXmlDoc.MainDocumentPart;
                    string BookType = OpenXMLCustomPropertyReader.ReadProperty(part.CustomXmlParts, "BookType");
                    Console.WriteLine("BookType = {0}", BookType);
                }
    


    Note that, as far as I can tell, the implementation of CustomXmlParts in OpenXML is the same for all document types, but the different document classes expose it through different strongly typed document parts.
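
    To make that concrete, here is a minimal sketch of a helper that picks the document class from the file extension and hands the main part's CustomXmlParts to a callback. CustomXmlPartLocator and WithCustomXmlParts are hypothetical names invented for this example. The sketch assumes a reference to DocumentFormat.OpenXml.dll, as above, and only handles the .docx, .xlsx and .pptx extensions.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;
    using DocumentFormat.OpenXml.Packaging;

    public static class CustomXmlPartLocator
    {
        // Hypothetical helper: opens the package read-only with the class that
        // matches its extension and passes the main part's CustomXmlParts to 'read'.
        // The parts are only valid while the package is open, hence the callback.
        public static T WithCustomXmlParts<T>(
            string filename, Func<IEnumerable<CustomXmlPart>, T> read)
        {
            switch (Path.GetExtension(filename).ToLowerInvariant())
            {
                case ".docx":
                    using (WordprocessingDocument doc = WordprocessingDocument.Open(filename, false))
                    {
                        return read(doc.MainDocumentPart.CustomXmlParts);
                    }
                case ".xlsx":
                    using (SpreadsheetDocument doc = SpreadsheetDocument.Open(filename, false))
                    {
                        return read(doc.WorkbookPart.CustomXmlParts);
                    }
                case ".pptx":
                    using (PresentationDocument doc = PresentationDocument.Open(filename, false))
                    {
                        return read(doc.PresentationPart.CustomXmlParts);
                    }
                default:
                    // The extension is checked before any file is opened.
                    throw new NotSupportedException("Unsupported extension: " + filename);
            }
        }
    }
    ```

    With this in place, the two snippets above collapse into a single call such as CustomXmlPartLocator.WithCustomXmlParts(filename, parts => OpenXMLCustomPropertyReader.ReadProperty(parts, "BookType")). Macro-enabled and template extensions would need extra cases.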

    Sunday, December 4, 2011 12:22 AM