none
Open XML SDK 2.5 - Excel - questions on 'internal" XML files - indentation, Byte Order Marks (BOM) RRS feed

  • Question

  • This post is strictly for making comparison purposes between Excel files, other than for that I don't care about the "indentation" or BOM characters.

    Right now in order to compare the "original" Excel files that we currently produce (from macros) to the Excel files that I'm creating with the Open XML SDK 2.5, I'm doing the following steps:

    • Rename the existing and new Excel files from .xlsx to .zip.
    • Extracting the contents of each zip file into a new folder.
    • Open each XML file in each folder using XMLSpy, do a "pretty print", and save it.  If I don't do this step, it's pointless to do the compares.
    • Use WinDiff.exe to compare the two folders.

    The other thing that I have to do to the OpenXML generated xml files is remove the "namespace prefix" e.g. the "x:" from the tags.  As an example, in the 'sheet.xml' file it creates: xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main"

    So the tags end up as <x:cols>.  The macro created ones do not have that "x:" prefix.

    Another issue is that the macro generated xml files are theme1.xml, sheet1.xml, but the OpenXML ones are not named this way so I have to name them identically.  I either would like to have the macro ones NOT do this or have the OpenXML generated ones DO this.

    [EDIT: if I open the generated Excel file in Excel, make no changes, and "save" it strips out the namespace prefixes and renames the files.  still a pain, but doubt there is a way to control this otherwise.]

    The above steps work fine, other than it is labor intensive and I have 80 original files to do!

    I know with the XmlWriter and other classes, there is control over the format of the xml.  I don't have the desire to switch my code to doing it that way just for the sake of comparisons.

    But I don't see any formatting control (or other ways) in the SpreadsheetDocument class or other classes in the SDK.

    Can someone confirm that this is correct or offer some alternatives for making these comparisons easier?

    Thanks.





    Wednesday, June 18, 2014 12:49 PM

Answers

  • Hi,

    To make Office automation with Open XML SDK easier, you could use Open XML SDK Productivity Tool for Microsoft Office (OpenXmlSdkTool.exe). It is under the installation directory (e.g. D:\Program Files (x86)\Open XML SDK\V2.5\tool).

    The Open XML SDK 2.5 Productivity Tool for Microsoft Office provides a number of features designed to improve your productivity and accelerate your learning while working with the SDK and Open XML files. Features include the ability to generate Open XML SDK 2.5 source code based on document content, compare source and target Open XML documents to reveal differences and to generate source code to create the target from the source, validate documents, and display documentation for the Open XML SDK 2.5 Classes, the ECMA376v1 standard, and the Microsoft Office implementation notes.

    You could make two Excel workbooks, One is the original file, another is the target file which is what you want. You could make the target file manually or through macros as you mentioned. 

    After that, you could compare these two files with Open XML SDK Productivity Tool for Microsoft Office (OpenXmlSdkTool.exe) and find the difference XML elements and related code.

    Every XML with green means that XML has been changed.

    In addition, here are some articles may give you help:

    Working with SpreadsheetML documents (Open XML SDK)

    Spreadsheets (Open XML SDK)

    Hope this helps.


    We are trying to better understand customer views on social support experience, so your participation in this interview project would be greatly appreciated if you have time. Thanks for helping make community forums a great place.
    Click HERE to participate the survey.

    Thursday, June 19, 2014 5:47 AM
    Moderator