C# Open XML example of retrieving an embedded Word document's relationship info

    Question

  • I am working on a custom program that extracts all the tables from a Word document, along with any Word documents embedded in the cells of those tables.

    So far I have not found any examples of this. All the examples I see are for creating documents, not for traversing them and extracting embedded documents.

    So far I can extract hyperlinks and their relationship info. I would like to do the same for Word documents that are embedded within the Word document.

    Any help is greatly appreciated.

    Note: the code I have working for extracting hyperlinks from table cells is the following:

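        // Collect the text and target URI of every external hyperlink in the cell.
        cell.Descendants<Hyperlink>().ToList().ForEach(link =>
        {
            // Only external hyperlinks carry an r:id; internal links use w:anchor instead.
            string relId = link.Id?.Value;
            if (!string.IsNullOrEmpty(relId))
            {
                HyperlinkRelationship hr = mainDocPart.HyperlinkRelationships.FirstOrDefault(rel => rel.Id == relId);
                if (hr != null)
                {
                    linkTxt += linkSep + "<<<" + link.InnerText + ": " + hr.Uri.AbsoluteUri + ">>>";
                    linkSep = " ";
                }
            }
        });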

    Friday, March 08, 2013 8:55 PM

Answers

  • Hi itsDev,

    Welcome to the MSDN forum.

    You may have a look at this sample code:

    Extract embedded files from Office documents (CSOfficeDocumentFileExtractor)

        /// <summary>
        /// Scans the file to check whether there are any files embedded in it.
        /// If there are any, their names are added to the checked list box.
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void btnScan_Click(object sender, EventArgs e)
        {
            string fileName = txtSourceFile.Text;
            if (string.IsNullOrWhiteSpace(fileName) || !System.IO.File.Exists(fileName))
            {
                MessageBox.Show("File does not exist.", "Invalid file", MessageBoxButtons.OK, MessageBoxIcon.Error);
                return;
            }

            string extension = System.IO.Path.GetExtension(fileName).ToLower();

            // Each Office format stores embedded objects in its own folder inside the package.
            if ((extension == ".docx") || (extension == ".dotx") || (extension == ".docm") || (extension == ".dotm"))
            {
                embeddingPartString = "/word/embeddings/";
            }
            else if ((extension == ".xlsx") || (extension == ".xlsm") || (extension == ".xltx") || (extension == ".xltm"))
            {
                embeddingPartString = "/excel/embeddings/";
            }
            else
            {
                embeddingPartString = "/ppt/embeddings/";
            }

            // Clear the results of any previous scan.
            chkdLstEmbeddedFiles.Items.Clear();

            // Open the package read-only; Package.Open(path) on its own requests read/write
            // access, which fails if the file is read-only. The using block closes the package.
            using (Package pkg = Package.Open(fileName, System.IO.FileMode.Open, System.IO.FileAccess.Read))
            {
                // Get the embedded file names.
                foreach (PackagePart pkgPart in pkg.GetParts())
                {
                    if (pkgPart.Uri.ToString().StartsWith(embeddingPartString))
                    {
                        string fileName1 = pkgPart.Uri.ToString().Remove(0, embeddingPartString.Length);
                        chkdLstEmbeddedFiles.Items.Add(fileName1);
                    }
                }
            }
            if (chkdLstEmbeddedFiles.Items.Count == 0)
                MessageBox.Show("The file does not contain any embedded files.");
        }
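    The scan logic can also be pulled out of the button handler so it can be reused or tested without the form. This is a sketch only: it assumes the same extension-to-folder mapping as btnScan_Click, and the names EmbeddingScanner, GetEmbeddingFolder and ListEmbeddedNames are hypothetical, not part of the sample.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging;
using System.Linq;

public static class EmbeddingScanner
{
    // Same mapping as the if/else chain in btnScan_Click (hypothetical helper):
    // each Office format keeps embedded objects in its own folder inside the package.
    public static string GetEmbeddingFolder(string extension)
    {
        switch (extension.ToLowerInvariant())
        {
            case ".docx": case ".dotx": case ".docm": case ".dotm":
                return "/word/embeddings/";
            case ".xlsx": case ".xlsm": case ".xltx": case ".xltm":
                return "/excel/embeddings/";
            default:
                return "/ppt/embeddings/";
        }
    }

    // Returns the embedded file names. The package is opened read-only and is
    // closed by the using block even if an exception is thrown.
    public static List<string> ListEmbeddedNames(string fileName)
    {
        string prefix = GetEmbeddingFolder(Path.GetExtension(fileName));
        using (Package pkg = Package.Open(fileName, FileMode.Open, FileAccess.Read))
        {
            // ToList() runs the query before the package is disposed.
            return pkg.GetParts()
                      .Select(part => part.Uri.ToString())
                      .Where(uri => uri.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
                      .Select(uri => uri.Substring(prefix.Length))
                      .ToList();
        }
    }
}
```

    The form's handler would then only need to add the returned names to chkdLstEmbeddedFiles.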
    


        /// <summary>
        /// Extracts the selected files to the folder given in the Destination Folder text box.
        /// If an extracted file is a structured storage file, it is passed to the
        /// Ole10Native.ExtractFile() method to extract the actual contents.
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void btnExtractSelectedFiles_Click(object sender, EventArgs e)
        {
            if (string.IsNullOrWhiteSpace(txtSourceFile.Text) || string.IsNullOrWhiteSpace(txtDestinationFolder.Text))
            {
                MessageBox.Show("The source file and destination folder cannot be empty.");
                return;
            }
            if (!File.Exists(txtSourceFile.Text))
            {
                MessageBox.Show("The file does not exist.");
                return;
            }
            // Open the package read-only and loop through its parts, checking whether
            // each part URI matches one of the items checked in the list box.
            using (Package pkg = Package.Open(txtSourceFile.Text, FileMode.Open, FileAccess.Read))
            {
                foreach (PackagePart pkgPart in pkg.GetParts())
                {
                    foreach (object chkditem in chkdLstEmbeddedFiles.CheckedItems)
                    {
                        if (pkgPart.Uri.ToString().Contains(embeddingPartString + chkdLstEmbeddedFiles.GetItemText(chkditem)))
                        {
                            // Get the file name and build the destination path.
                            string fileName1 = pkgPart.Uri.ToString().Remove(0, embeddingPartString.Length);
                            string filePath = Path.Combine(txtDestinationFolder.Text, fileName1);

                            // Write the part's stream to the file. Both streams are closed
                            // before the file is processed any further.
                            using (Stream partStream = pkgPart.GetStream(FileMode.Open, FileAccess.Read))
                            using (FileStream writeStream = new FileStream(filePath, FileMode.Create, FileAccess.Write))
                            {
                                partStream.CopyTo(writeStream);
                            }

                            // If the file is a structured storage file stored as an oleObjectXX.bin file,
                            // use the Ole10Native class to extract the contents inside it.
                            if (fileName1.Contains("oleObject"))
                            {
                                // The Ole10Native class is defined in the sample's Ole10Native.cs file.
                                Ole10Native.ExtractFile(filePath, txtDestinationFolder.Text);
                            }
                        }
                    }
                }
            }
        }
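    For the original goal of pulling tables out of embedded Word documents, an embedded .docx can also be opened directly with the Open XML SDK instead of being written to disk first. This is a minimal sketch, assuming Open XML SDK 2.x or later and that the embedded documents are .docx packages attached to the main document part as EmbeddedPackagePart objects; embeddings in headers or footers, and legacy oleObject*.bin files, are not covered. EmbeddedWordDocs and CountTablesInEmbeddedDocs are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class EmbeddedWordDocs
{
    const string DocxContentType =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

    // For each embedded .docx in the main document part, returns its part URI and
    // the number of tables (including nested ones) in the embedded document's body.
    public static List<(string PartUri, int TableCount)> CountTablesInEmbeddedDocs(WordprocessingDocument doc)
    {
        var result = new List<(string, int)>();
        foreach (EmbeddedPackagePart part in doc.MainDocumentPart.EmbeddedPackageParts)
        {
            // Embedded packages can also be .xlsx, .pptx, etc.; only Word packages are opened.
            if (part.ContentType != DocxContentType)
                continue;

            // Copy into a MemoryStream so the inner package is opened independently
            // of the outer package's part stream.
            using (var buffer = new MemoryStream())
            {
                using (Stream src = part.GetStream(FileMode.Open, FileAccess.Read))
                    src.CopyTo(buffer);
                buffer.Position = 0;

                using (WordprocessingDocument inner = WordprocessingDocument.Open(buffer, false))
                {
                    int tables = inner.MainDocumentPart?.Document?.Body?.Descendants<Table>().Count() ?? 0;
                    result.Add((part.Uri.ToString(), tables));
                }
            }
        }
        return result;
    }
}
```

    Inside the embedded document, the same table and cell traversal used for the outer document applies; doc itself could come from WordprocessingDocument.Open(path, false).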
    

    Please download it and give it a try.

    Good day.


    Yoyo Jiang[MSFT]
    MSDN Community Support

    Wednesday, March 13, 2013 9:10 AM
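    To get the relationship info per table cell, the same way the hyperlink code in the question does, the o:OLEObject element inside each embedded object carries an r:id that can be resolved against the main document part. A sketch under these assumptions: the object uses the VML markup Word writes for OLE objects (w:object containing o:OLEObject), an embedded .docx resolves to an EmbeddedPackagePart while older formats resolve to an EmbeddedObjectPart, and linked (not embedded) objects are simply skipped. CellEmbeddings and GetEmbeddedObjects are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using Ovml = DocumentFormat.OpenXml.Vml.Office;

public static class CellEmbeddings
{
    // Mirrors the hyperlink lookup: for every o:OLEObject inside the cell, resolve
    // its r:id against the parts related to the main document part.
    public static List<(string RelId, string ProgId, OpenXmlPart Part)> GetEmbeddedObjects(
        TableCell cell, MainDocumentPart mainDocPart)
    {
        var result = new List<(string, string, OpenXmlPart)>();
        foreach (Ovml.OleObject ole in cell.Descendants<Ovml.OleObject>())
        {
            string relId = ole.Id?.Value;
            if (string.IsNullOrEmpty(relId))
                continue;

            // Parts yields (relationship id, part) pairs. An r:id that points to an
            // external relationship (assumed: a linked object) is not found and is skipped.
            OpenXmlPart part = mainDocPart.Parts
                .Where(p => p.RelationshipId == relId)
                .Select(p => p.OpenXmlPart)
                .FirstOrDefault();

            if (part != null)
                result.Add((relId, ole.ProgId?.Value, part));
        }
        return result;
    }
}
```

    In the existing cell loop, GetEmbeddedObjects(cell, mainDocPart) could be called next to the hyperlink lookup; each returned Part can then be read with GetStream() and, for an EmbeddedPackagePart, opened with WordprocessingDocument.Open.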