Word 2010 XML validation

  • Hello Ashok,

    It appears that you have used a <document> element with a missing attribute, as implied by the reference to "Ignorable".

    You have pursued this question in more general terms in your post to the "Open Specifications Developer Forum", in the thread at http://social.msdn.microsoft.com/Forums/en-US/os_openXML-ecma/thread/adbe4de8-10a0-47ca-90ea-fbe5efffe947

    The moderator there, Mark Miller, works with the specifics of this issue and others. The present post is in the "Microsoft Office Developers Forum," which is served by a team that works with Word and other Office products when they are modified or adapted using Microsoft Visual Studio. The underlying issue here relates more to your posts in the Open Specifications Developer Forum, as it pertains to validating XML.

    In your last post to the Open Specifications Developer Forum, you asked where you can get the complete specifications for the Office Open XML schema.


    The site http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html contains the index to all of the pieces of the schema. The entries start with ISO/IEC 29500-1-2008 Electronic Inserts.

      
    The link ISO/IEC 29500-1-2008 Electronic Inserts is to the "Part 1 Fundamentals and Markup Language Reference" document.


    The next five entries contain links to the pieces of the standard for Microsoft Office 2008 documents and the amendments for Microsoft Office 2010.

    Those constitute links to the complete schema and references to Microsoft Office 2008 XML and Microsoft Office 2010 XML.

    You don't say what tool you're using to validate your XML. There is an add-in for Microsoft Visual Studio 2008 named XMLSpy. You can also get a 30-day free trial of XMLSpy at http://www.altova.com/xmlspy.htm.

    That tool gives contextual explanations of what causes a document to fail validation.

    HTH



    Chris Jensen
    Tuesday, December 7, 2010 5:07 PM
    Moderator
  • Hi Chris,

    I convert the XSDs to Java classes using the XMLBeans API. These generated classes provide an API for validating XML against the schema, and that is the API I use to validate the XML. This API is compliant with the schema we are validating against, because it is generated from the Word schema.

    These are the steps I follow to validate the XML against the schema:

    1) Compile the XSDs into Java classes using XMLBeans and build a JAR file.

    2) Place the JAR file on the classpath.

    3) Convert the uploaded Word 2010 .docx file into a "Document" object, and pass this object as a parameter to the validate API.

    During validation I get the error stated in my previous post.
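    For reference, a similar validation step can be sketched with the JDK's built-in javax.xml.validation API instead of the XMLBeans-generated classes (a swapped-in technique; XMLBeans itself can collect errors if you pass a list to XmlOptions.setErrorListener). The class and method names below are hypothetical, and the sketch assumes you already have the schema text and the document.xml text. Its point is to collect every validation error instead of stopping at the first, which makes it easier to see whether the Ignorable attribute is the only thing being rejected.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class CollectingValidator {
    /** Compiles a W3C XML Schema held in a string (in practice you would pass a File instead). */
    static Schema schemaFromString(String xsd) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(xsd)));
    }

    /** Validates xml and returns every reported error instead of stopping at the first one. */
    static List<String> validate(Schema schema, String xml) throws SAXException, IOException {
        List<String> problems = new ArrayList<>();
        Validator validator = schema.newValidator();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* warnings are ignored */ }
            @Override public void error(SAXParseException e) {
                // Record the line number and message, then keep validating.
                problems.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }
            @Override public void fatalError(SAXParseException e) throws SAXException {
                throw e; // malformed XML: validation cannot continue
            }
        });
        validator.validate(new StreamSource(new StringReader(xml)));
        return problems;
    }
}
```

    Against the real schemas you would build the Schema from the XSD files on disk (for example with SchemaFactory.newSchema(File)) and pass in the text of the document.xml part.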

    All I need is a complete, consolidated set of XML schemas that I can use to build a JAR file and validate my Word 2010 .docx file.

    Please help me with this, as I have been waiting a long time for a consolidated set of XML schemas.

    Thanks,

    Ashok

     


    Ashok Ambrose
    Tuesday, December 7, 2010 5:27 PM
  • Ashok,

    In my first reply I said:

    Quote

    In your last post to the Open Specifications Developer Forum, you asked where you can get the complete specifications for the Office Open XML schema.


    The site http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html contains the index to all of the pieces of the schema. The entries start with ISO/IEC 29500-1-2008 Electronic Inserts.

      
    The link ISO/IEC 29500-1-2008 Electronic Inserts is to the "Part 1 Fundamentals and Markup Language Reference" document.


    The next five entries contain links to the pieces of the standard for Microsoft Office 2008 documents and the amendments for Microsoft Office 2010.

    Those constitute links to the complete schema and references to Microsoft Office 2008 XML and Microsoft Office 2010 XML.

    End of quote.

    To repeat: "Those constitute links to the complete schema and references to Microsoft Office 2008 XML and Microsoft Office 2010 XML."

    Just as a book is composed of chapters that together make up the complete book, the documents listed in the ISO/IEC index are the 'chapters' that make up the 'book' of Microsoft Office Open XML. The schemas are chapters in that book. Your namespaces can include as many of those schemas as are appropriate for your conversion module. If you are looking for any other XML schemas, you may get help in a different forum.
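    As a rough illustration of combining several 'chapters', the JDK's SchemaFactory.newSchema(Source[]) compiles multiple schema documents into a single Schema object that a validator can use. This is a minimal sketch, assuming the schema documents are available as strings; the class name is hypothetical. When you pass file-based sources instead, relative schemaLocation imports between the files can be resolved, which is not possible with the string-based sources used here.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaChapters {
    /** Compiles several schema documents ("chapters") into one Schema object. */
    static Schema combine(String... xsdDocuments) throws SAXException {
        Source[] sources = new Source[xsdDocuments.length];
        for (int i = 0; i < xsdDocuments.length; i++) {
            sources[i] = new StreamSource(new StringReader(xsdDocuments[i]));
        }
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(sources);
    }

    /** True if xml validates; with no ErrorHandler set, the validator throws on the first error. */
    static boolean isValid(Schema schema, String xml) throws IOException {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```

    For example, a root schema whose content uses a strict wildcard for a second namespace only accepts a document once the schema for that second namespace is compiled in the same call; with the root schema alone, the validator reports that no declaration can be found for the element matched by the wildcard.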

    Repeating what is in the earlier reply: you may need to consider a validation tool such as XMLSpy, or another tool that thoroughly analyzes the XML in your code against the schemas in its namespaces and explains how the XML fails to comply.

     

     


    Chris Jensen
    • Proposed as answer by cjatms (Moderator), Wednesday, December 8, 2010 2:39 PM
    Wednesday, December 8, 2010 2:39 PM
    Moderator
  • I am not validating custom XML that I built myself against the ISO/IEC schema.

    The XML I am trying to validate is the document.xml part of a Word 2010 .docx file. Why should validation of this XML fail when I validate it against the XML schema provided by ISO/IEC?

    In effect, I am validating XML produced by a Microsoft product against the schema provided by ISO/IEC (which sets the standards for Office XML formats).

     

    Thanks,

    Ashok


    Ashok Ambrose
    Wednesday, December 8, 2010 10:29 PM
  • Hi Ashok,

    In a simple .docx file, in the part named document.xml, the namespaces are declared at the top in a long string of declarations, which includes the following:

    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"

    Another declaration is for markup compatibility:

    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"

    At the very end of the string, the Ignorable attribute appears:

    mc:Ignorable="w14 wp14"

    The mc prefix refers to the markup compatibility namespace declared above.
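    To see what mc:Ignorable actually refers to, a short sketch with the JDK's DOM parser can read the attribute by namespace URI and resolve each listed prefix. mc:Ignorable holds a whitespace-separated list of namespace prefixes that a consumer may ignore if it does not understand them. The class name is hypothetical, and the sketch only looks at the root element.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class IgnorableInspector {
    static final String MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006";

    /** Maps each prefix listed in mc:Ignorable on the root element to its namespace URI (null if undeclared). */
    static Map<String, String> ignorablePrefixes(String documentXml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed so attributes can be matched by namespace URI
        Element root = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(documentXml)))
                .getDocumentElement();
        // Matched by namespace URI, not by the literal "mc" prefix; returns "" when absent.
        String value = root.getAttributeNS(MC_NS, "Ignorable");
        Map<String, String> result = new LinkedHashMap<>();
        for (String prefix : value.trim().split("\\s+")) {
            if (!prefix.isEmpty()) {
                result.put(prefix, root.lookupNamespaceURI(prefix));
            }
        }
        return result;
    }
}
```

    A prefix that maps to null is listed as ignorable but not declared on the root element, which is worth checking when comparing your document with one saved by Word.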

    You may have the following pairing, which would fail validation:

    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    w:Ignorable="w14"

    The value of the attribute may be different, but whatever it is, this pairing would fail validation.
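    A quick way to spot the failing pairing is to look for any attribute named Ignorable whose namespace URI is not the markup compatibility namespace. This is a hypothetical helper using the JDK's DOM API; it checks only the root element, and it compares namespace URIs rather than prefixes, because a prefix is just a local alias for a URI.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class IgnorableNamespaceCheck {
    static final String MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006";

    /** Returns the qualified names of root attributes called "Ignorable" that are NOT in the mc namespace. */
    static List<String> misplacedIgnorable(String documentXml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Element root = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(documentXml)))
                .getDocumentElement();
        List<String> misplaced = new ArrayList<>();
        NamedNodeMap attrs = root.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            // What matters is the namespace URI the prefix is bound to, not the prefix text.
            if ("Ignorable".equals(attr.getLocalName()) && !MC_NS.equals(attr.getNamespaceURI())) {
                misplaced.add(attr.getNodeName());
            }
        }
        return misplaced;
    }
}
```

    For the failing pairing shown above, the helper reports w:Ignorable; for Word's own mc:Ignorable it reports nothing.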


    Chris Jensen
    Thursday, December 9, 2010 5:29 PM
    Moderator
  • Hi Chris,

    I have the following pairing:

    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"

    mc:Ignorable="w14 wp14"

     

    I downloaded the XMLSpy trial version and validated the XML, and it gives the following error:

    cvc-complex-type.3.2.1: Complex type definition 'w:CT_Document' of element <w:document> does not allow attribute 'mc:Ignorable' and no attribute wildcard matches it.

    Can you help me amend the XSD file so that the document passes the validation check?

    Thanks,

    Ashok


    Ashok Ambrose
    Thursday, December 9, 2010 5:56 PM
  • Hi Ashok,

    It's good to hear that you have downloaded and are using XMLSpy.

    You don't amend the ISO standards. They are standards.

    You can copy one of the XSD files, rename the copy, and then refer to the renamed copy in your document declarations.

    To add an Ignorable attribute, look at the XSD for the markup compatibility namespace, xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006", to see how the Ignorable attribute is defined. Use that as a model for modifying the CT_Document definition (the type of the 'document' element) in your custom copy of the xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" schema, so that it includes an attribute definition for Ignorable.
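    Below is a minimal sketch of that idea against a heavily trimmed, hypothetical stand-in for CT_Document, not the real ISO/IEC schema. Instead of copying the exact mc attribute definition, it adds an xsd:anyAttribute wildcard for the markup compatibility namespace (a swapped-in technique), since the XMLSpy error says that no attribute wildcard matches mc:Ignorable. It validates with the JDK's javax.xml.validation API, whose error wording differs from XMLSpy's.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class PatchedCtDocument {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006";

    /** Hypothetical, heavily trimmed stand-in for CT_Document, optionally patched with an mc wildcard. */
    static String schema(boolean allowMcAttributes) {
        return "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
             + " targetNamespace='" + W_NS + "' xmlns:w='" + W_NS + "' elementFormDefault='qualified'>"
             + "<xsd:complexType name='CT_Document'>"
             + "<xsd:sequence><xsd:element name='body' minOccurs='0'/></xsd:sequence>"
             // The patch: accept any attribute from the mc namespace without checking its value.
             + (allowMcAttributes
                 ? "<xsd:anyAttribute namespace='" + MC_NS + "' processContents='skip'/>"
                 : "")
             + "</xsd:complexType>"
             + "<xsd:element name='document' type='w:CT_Document'/>"
             + "</xsd:schema>";
    }

    /** True if xml validates against the schema text; false on the first validation error. */
    static boolean validates(String xsd, String xml) throws IOException {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)))
                    .newValidator()
                    .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```

    With the unpatched type, the validator rejects mc:Ignorable on w:document; with the wildcard, the same document validates. Because processContents='skip' is used, the attribute's value is not checked at all. Other elements in a Word 2010 document may also carry markup compatibility content, so patching CT_Document alone may not be enough for a full document.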

    You may get some insight into the document you're converting by looking at the document.xml part of a very simple Word 2007 or Word 2010 document. In Microsoft Word, create a new document with one line of text at the top, and save it to your desktop with a distinctive name.
     
    Office 2007 and later documents are 'packages' of various XML, .rels, and other files. You can see the components by unpacking the .docx file.
    First, rename the document to change its extension from .docx to .zip.
    Then extract the components from the .zip using WinZip or any other extraction tool.

    A folder named "word" within the .zip contains the document.xml file; you can examine it using Notepad or any XML viewer. The namespace declarations are at the start of the file.
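    The same inspection can be done without renaming or extracting anything, because a .docx file is a ZIP archive. This sketch uses java.util.zip; the class name is hypothetical. It assumes the main part is at word/document.xml (the location Word uses by default; formally, the package's relationships point to the main part) and that the part is UTF-8 encoded.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class DocxPartReader {
    /** Reads word/document.xml straight out of a .docx package and returns it as a string. */
    static String readDocumentXml(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            // ZIP entry names are matched exactly: the folder is "word", not "Word".
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                // Assumes UTF-8, the encoding Word normally declares for this part.
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        }
    }
}
```

    For example, readDocumentXml("simple.docx") returns the same text you would otherwise open in Notepad after extracting the package; the file name is a placeholder.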

    By comparing the declarations in your document with those in the simple document, you may see the source of the problem.
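    That comparison can also be scripted. The hypothetical helper below collects the xmlns declarations on the root element of each document.xml and lists every declaration from the reference document (for example, the simple document saved by Word) that is missing from yours or bound to a different URI. It uses only the JDK's DOM API and checks only the root element, where the declarations described above appear.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NamespaceDiff {
    /** Collects root-element namespace declarations as prefix -> URI ("" stands for the default namespace). */
    static Map<String, String> declarations(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // xmlns attributes then live in the xmlns namespace
        Element root = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml))).getDocumentElement();
        Map<String, String> decls = new TreeMap<>();
        NamedNodeMap attrs = root.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(a.getNamespaceURI())) {
                String prefix = "xmlns".equals(a.getNodeName()) ? "" : a.getLocalName();
                decls.put(prefix, a.getNodeValue());
            }
        }
        return decls;
    }

    /** Lists declarations present in the reference but missing or bound differently in the candidate. */
    static List<String> missingFrom(String candidateXml, String referenceXml) throws Exception {
        Map<String, String> candidate = declarations(candidateXml);
        List<String> diffs = new ArrayList<>();
        for (Map.Entry<String, String> e : declarations(referenceXml).entrySet()) {
            String actual = candidate.get(e.getKey());
            if (!e.getValue().equals(actual)) {
                diffs.add(e.getKey() + ": expected " + e.getValue() + ", found " + actual);
            }
        }
        return diffs;
    }
}
```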

    A couple of resources that may be of help or interest are listed below.

    From MSDN
    Walkthrough: Word 2007 XML Format
    http://msdn.microsoft.com/en-us/library/bb266220(office.12).aspx?info=EXLINK

    There is a great article on creating .docx files using .NET
    http://blogs.technet.com/b/migreene/archive/2006/04/15/425331.aspx
    It links to "Creating an Open XML Document in .NET" on the "Open XML Developers' Org" site.

     


    Chris Jensen
    Friday, December 10, 2010 4:07 PM
    Moderator
  • Hi Chris,

    Can I get an XSD file that accommodates the Ignorable attribute?

    Thanks,

    Ashok


    Ashok Ambrose
    Friday, December 10, 2010 6:10 PM