Open XML SDK does not correctly recognize Word doc created in Word

  • Question

  • Trying to use the Open XML SDK I'm getting the error:

    "The specified package is invalid. The main part is missing."

    I'm trying to look at Word documents that were created in the Word application. I'm seeing this with Word 2007 and Word 2010 documents created on two different machines.

    Researching turned up that the root cause is that System.IO.Packaging isn't picking up valid content. I confirmed that with the following code. The first MessageBox.Show tells me that System.IO.Packaging finds no "Parts".

        // Requires: using System.Linq; (for Count()) and
        // using DocumentFormat.OpenXml.Packaging; (for WordprocessingDocument)
        private void btnOpenDocument_Click(object sender, EventArgs e)
        {
            try
            {
                // The using block closes the package even if GetParts() throws.
                using (System.IO.Packaging.Package p = System.IO.Packaging.Package.Open(
                    openXMLFileName, System.IO.FileMode.Open, System.IO.FileAccess.Read))
                {
                    System.IO.Packaging.PackagePartCollection parts = p.GetParts();
                    if (parts.Count() <= 0)
                    {
                        // If the ZIP package contains no [Content_Types].xml or _rels/.rels files,
                        // the ZIP file has no parts recognized by System.IO.Packaging
                        // and is therefore not an Office document.
                        MessageBox.Show("The package contains no parts.");
                        return;
                    }
                }
            }
            catch (Exception ex)
            {
                MessageBox.Show(ex.Message);
            }

            // Open the same file read/write through the Open XML SDK.
            using (WordprocessingDocument doc = WordprocessingDocument.Open(openXMLFileName, true))
            {
                MainDocumentPart docPart = doc.MainDocumentPart;
                MessageBox.Show(docPart.ContentType.ToString());
                // IEnumerable<FormFieldName> ffNames = docPart.Document.Body.Elements<FormFieldName>();
                // foreach (FormFieldName ffName in ffNames)
                // {
                //     if (ffName.Val == "Text28")
                //     {
                //         FormFieldData ffData = (FormFieldData) ffName.Parent;
                //         foreach (OpenXmlElement d in ffData.Descendants())
                //         {
                //             MessageBox.Show(d.LocalName);
                //         }
                //     }
                // }
            }
        }

    I've found only one file that does NOT cause the error. It opens correctly with the using statement, and the second MessageBox.Show correctly displays the ContentType of a Word document main part.

    I've uploaded two sample documents to Skydrive. Test.docx works correctly; Safe mode doc.docx exhibits the problem. (This latter was generated with Word started in Safe Mode in order to ensure that a damaged Normal.dotm is not the source of the difficulty.)

    When comparing the two documents' ZIP content, the only difference I can see is that Test.docx has every part duplicated as a "Glossary". None of the other documents have this.

    What's causing the problem?


    Cindy Meister, VSTO/Word MVP, my blog

    Sunday, September 9, 2012 4:55 PM
    Moderator

Answers

  • Hi again, Tom

    OK, I think I'm closer to tracking down the issue. It appears to be the "permissions" set on the folder.

    Apparently, I'm not the "owner" of all the folders on my system down at the level where this code is running. If I select files from "My Documents" there's no problem. But from C:\Test there is a problem. If I "share" the folder and grant permissions to "Everybody", then the code works.

    Now I just need to find out how this kind of thing is supposed to be handled and, on a system with no "domain", how to set permissions for "everybody" (if that's possible).

    As I've seen a number of questions about the error message I was getting that never were marked as "Answered", I hope this information may also help the next person who falls into this trap :-)
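
    For anyone who wants to rule out file access before looking at the package itself, here is a minimal sketch. It assumes the problem shows up when opening the file as a plain stream, which this thread does not establish, so treat it as a first diagnostic step only. The class name FileAccessProbe and its Probe method are hypothetical helpers, not part of the Open XML SDK or System.IO.Packaging.

    ```csharp
    using System;
    using System.IO;

    static class FileAccessProbe
    {
        // Hypothetical helper: tries to open the file with the requested access.
        // Returns null on success, otherwise a short description of the failure.
        public static string Probe(string path, FileAccess access)
        {
            try
            {
                using (FileStream fs = new FileStream(path, FileMode.Open, access, FileShare.Read))
                {
                    return null;
                }
            }
            catch (UnauthorizedAccessException ex)
            {
                // Typically thrown when the account lacks rights on the file or folder.
                return "Access denied: " + ex.Message;
            }
            catch (IOException ex)
            {
                // Covers missing files and folders, sharing violations and similar errors.
                return "I/O error: " + ex.Message;
            }
        }
    }
    ```

    Since WordprocessingDocument.Open(openXMLFileName, true) asks for read/write access, calling FileAccessProbe.Probe(openXMLFileName, FileAccess.ReadWrite) before opening the document may give a clearer message than "The main part is missing."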


    Cindy Meister, VSTO/Word MVP, my blog

    Monday, September 10, 2012 2:28 PM
    Moderator

All replies

  • Hi Cindy,

    Thanks for posting in the MSDN Forum.

    I tried to reproduce your issue on my side with the following snippet:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using log4net;
    using System.Windows.Forms;
    using DocumentFormat.OpenXml.Packaging;
    using System.IO.Packaging;
    using System.IO;
    
    namespace ConsoleApplication3
    {
        class Program
        {
            [STAThread]
            static void Main(string[] args)
            {
                ILog log = log4net.LogManager.GetLogger(typeof(Program));
                OpenFileDialog OFD = new OpenFileDialog();
                OFD.Filter = "Document|*.docx;*.docm";
                OFD.Multiselect = true;
                OFD.ShowDialog();
                string[] Paths = OFD.FileNames;
                foreach (string Path in Paths)
                {
                    log.Info("Try open " + Path);
                    try
                    {
                        using (WordprocessingDocument wpd = WordprocessingDocument
                            .Open(Path, false))
                        {
                            MainDocumentPart mdp = wpd.MainDocumentPart;
                            if (mdp != null)
                            {
                                log.Info("Can get Main Part");
                            }
                        }
                    }
                    catch (Exception ex)
                    {
                        log.Fatal(ex);
                    }
                }
                log.Fatal("======================================================");
                foreach (string test in Paths)
                {
                    Package p = null;
                    try
                    {
                        p = Package.Open(test, FileMode.Open, FileAccess.Read);
                        PackagePartCollection ppc = p.GetParts();
                        if (ppc.Count() <= 0)
                            log.Fatal("Can't access Part!");
                        else
                            log.Info("Get Part");
                        p.Close();
                    }
                    catch (Exception ex)
                    {
                        log.Fatal(ex);
                    }
                }
                Console.ReadKey();
            }
        }
    }

    and I get the following log:

    [Header]
    2012-09-10 13:57:47,093 [9] INFO  ConsoleApplication3.Program [26] Try open C:\******\Safe Mode doc.docx
    2012-09-10 13:57:47,217 [9] INFO  ConsoleApplication3.Program [35] Can get Main Part
    2012-09-10 13:57:47,226 [9] INFO  ConsoleApplication3.Program [26] Try open C:\*****\Test.docx
    2012-09-10 13:57:47,241 [9] INFO  ConsoleApplication3.Program [35] Can get Main Part
    2012-09-10 13:57:47,244 [9] INFO  ConsoleApplication3.Program [55] Get Part
    2012-09-10 13:57:47,247 [9] INFO  ConsoleApplication3.Program [55] Get Part
    [Footer]

    So I don't think there is a mistake in the Open XML SDK. Would you please try my code on your side to see whether it works there?

    Have a good day,

    Tom

     

    Tom Xu [MSFT]
    MSDN Community Support | Feedback to us

    Monday, September 10, 2012 6:04 AM
    Moderator
  • Hi Tom

    I agree that there's not a problem with the Open XML SDK. If there were a problem, it would be at the underlying System.IO.Packaging level.

    But if you say that your system can successfully access both files, then the problem must be on my system. Do you have any idea why a system shouldn't be able to correctly recognize "Safe Mode doc.docx" as being a valid "ZIP Package"?

    If it were a problem with all files I could understand it, but the problem appears with only certain files.

    I'm running Windows Vista, VS 2008, Office 2007; the Open XML SDK 1.0 is installed, but that should have no bearing on the fact that System.IO.Packaging isn't "seeing" [Content_Types].xml in Safe Mode doc.docx?
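
    One way to check whether the ZIP container itself is readable, independently of System.IO.Packaging, is to list its raw entries. The sketch below is only an illustration: it assumes .NET Framework 4.5 or later (System.IO.Compression.ZipFile is not available on the VS 2008 setup described here), and the ZipInspector class with its two methods is a hypothetical helper.

    ```csharp
    using System;
    using System.IO;
    using System.IO.Compression;
    using System.Linq;

    static class ZipInspector
    {
        // Hypothetical helper: returns the raw entry names stored in the ZIP container.
        public static string[] ListEntries(string path)
        {
            using (ZipArchive zip = ZipFile.OpenRead(path))
            {
                return zip.Entries.Select(e => e.FullName).ToArray();
            }
        }

        // True if the container holds the [Content_Types].xml entry
        // that every Open Packaging Conventions package needs.
        public static bool HasContentTypes(string path)
        {
            return ListEntries(path).Any(n =>
                string.Equals(n, "[Content_Types].xml", StringComparison.OrdinalIgnoreCase));
        }
    }
    ```

    If ZipInspector.HasContentTypes returns true for a file while Package.GetParts() still reports no parts, the file content is probably fine and the environment (for example, access rights) is the more likely suspect.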


    Cindy Meister, VSTO/Word MVP, my blog

    Monday, September 10, 2012 7:59 AM
    Moderator
  • Hi again, Tom

    OK, I think I'm closer to tracking down the issue. It appears to be the "permissions" set on the folder.

    Apparently, I'm not the "owner" of all the folders on my system down at the level where this code is running. If I select files from "My Documents" there's no problem. But from C:\Test there is a problem. If I "share" the folder and grant permissions to "Everybody", then the code works.

    Now I just need to find out how this kind of thing is supposed to be handled and, on a system with no "domain", how to set permissions for "everybody" (if that's possible).

    As I've seen a number of questions about the error message I was getting that never were marked as "Answered", I hope this information may also help the next person who falls into this trap :-)


    Cindy Meister, VSTO/Word MVP, my blog

    Monday, September 10, 2012 2:28 PM
    Moderator
  • Hi Cindy,

    Thanks for sharing your experience here. It's beneficial for other community members who have a similar issue to see how you solved your problem.

    Have a good day,

    Tom


    Tom Xu [MSFT]
    MSDN Community Support | Feedback to us

    Tuesday, September 11, 2012 1:41 AM
    Moderator
  • Yes. After quite a bit of debugging, I realized it's a permission-related issue. The solution works fine on the staging machine but not in production, since the live site is accessed through a VPN. Of course, the VPN account doesn't have enough rights on local files.


    Regards, karthik.

    Friday, August 9, 2013 1:41 PM