Reading data from Excel Worksheet embedded inside Text Box in an Office Word 2007 .docx using C#

  • Question

  • I am trying to read data stored in Word documents (Office 2007) using C#. Each of the documents has a Text Box object that contains an Excel 97-2003 Worksheet Object that contains data I need to access. These worksheets, however, don't show up in the document's StoryRanges, Shapes, or InlineShapes collections, because they are embedded in the Text Box Shape.

    If I open the document in Word, manually cut/copy and paste the worksheet outside of the Text Box and save the Document, I can then see the worksheet in the document's InlineShapes collection, but I have over 7,000 documents to work with, so I need more of an automatic method.

    At the very least I need to know: is it possible to access Shapes/Inline Shapes within Shapes using the Microsoft Office Interops (Word/Excel)?

    Thanks!

    Monday, April 9, 2012 4:27 PM

Answers

  • Because these are Word 2007 (.docx) documents, you can read their content with the Microsoft Open XML API. Download the Open XML SDK Productivity Tool, open one of the documents in it, and use its Reflect Code feature. That shows you how the document is put together, and you can read it the same way.
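
    As a rough illustration of that approach, the sketch below lists the embedded parts reachable from the main document part using the Open XML SDK. It assumes a reference to the SDK's DocumentFormat.OpenXml assembly; the class and method names (EmbeddedPartLister, ListEmbeddedParts) are made up for this example. An embedding that lives in a header or footer would hang off that part instead and is not covered here.

```csharp
using System;
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging; // Open XML SDK (assumed to be referenced)

public static class EmbeddedPartLister
{
    // Hypothetical helper: returns the URI of every embedded OLE object or
    // embedded package related to the main document part of a .docx file.
    public static List<string> ListEmbeddedParts(string docxPath)
    {
        var uris = new List<string>();

        // Open read-only (second argument: isEditable = false).
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxPath, false))
        {
            MainDocumentPart main = doc.MainDocumentPart;
            if (main == null)
            {
                return uris;
            }

            // OLE embeddings, e.g. an Excel 97-2003 worksheet object.
            foreach (EmbeddedObjectPart part in main.EmbeddedObjectParts)
            {
                uris.Add(part.Uri.OriginalString);
            }

            // Embedded Open XML packages, e.g. an .xlsx worksheet.
            foreach (EmbeddedPackagePart part in main.EmbeddedPackageParts)
            {
                uris.Add(part.Uri.OriginalString);
            }
        }

        return uris;
    }
}
```

    Printing the returned URIs for one sample document is a quick way to confirm where the worksheet is stored before writing any extraction code.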

    Hope this helps


    --Krishna

    • Marked as answer by ujg Tuesday, April 10, 2012 7:58 PM
    Monday, April 9, 2012 8:20 PM

All replies

  • Thanks for pointing me towards the Open XML SDK, Krishnav. I used the SDK tool to find the package part that holds the worksheet. Then I added a reference to WindowsBase.dll and used the following code to locate that part, open it as a stream, and write the stream out as its own .xls file, which is much easier to work with.

    using System;
    using System.IO;
    using System.IO.Packaging; // requires a reference to WindowsBase.dll

    public class EmbeddedXlsExtractor
    {
        public void saveXlsFromDocx()
        {
            string inputFileName = @"C:\fileLocation\inputDoc.docx";
            string outputFileName = @"C:\fileLocation\outputXLS.xls";

            try
            {
                // Open the .docx as a package and scan its parts for the embedded XLS
                using (Package wordPackage = Package.Open(inputFileName, FileMode.Open, FileAccess.Read))
                {
                    foreach (PackagePart pPart in wordPackage.GetParts())
                    {
                        // Look for PackagePart /word/embeddings/Microsoft_Office_Excel_97-2003_Worksheet1.xls
                        string pPartUri = pPart.Uri.OriginalString;
                        if (pPartUri.EndsWith(".xls", StringComparison.OrdinalIgnoreCase))
                        {
                            // If several .xls parts exist, each overwrites outputFileName (the last one wins)
                            using (Stream partStream = pPart.GetStream(FileMode.Open, FileAccess.Read))
                            using (FileStream writeStream = new FileStream(outputFileName, FileMode.Create, FileAccess.Write))
                            {
                                ReadWriteStream(partStream, writeStream);
                            }
                        }
                    }
                }
            }
            catch (Exception ex)
            {
                // Report the failure instead of silently swallowing it
                Console.Error.WriteLine("Could not extract the worksheet: " + ex.Message);
            }
        }

        // Copy one stream to another; the caller owns and disposes both streams
        private static void ReadWriteStream(Stream readStream, Stream writeStream)
        {
            byte[] buffer = new byte[4096];
            int bytesRead;
            while ((bytesRead = readStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                writeStream.Write(buffer, 0, bytesRead);
            }
        }
    }
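
    Since there are over 7,000 documents to process, the same idea could be wrapped in a loop over a folder. The following is only a sketch under a few assumptions: the class and method names (EmbeddedXlsBatch, ExtractXls, ExtractAll), the folder layout, and the output naming scheme (document name plus a running index) are all invented for illustration, and Stream.CopyTo requires .NET 4 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging; // WindowsBase.dll on .NET Framework

public static class EmbeddedXlsBatch
{
    // Hypothetical helper: writes every embedded .xls part of one .docx into
    // outputDir and returns the paths of the files it created.
    public static List<string> ExtractXls(string docxPath, string outputDir)
    {
        var written = new List<string>();
        string baseName = Path.GetFileNameWithoutExtension(docxPath);

        using (Package package = Package.Open(docxPath, FileMode.Open, FileAccess.Read))
        {
            int index = 0;
            foreach (PackagePart part in package.GetParts())
            {
                if (!part.Uri.OriginalString.EndsWith(".xls", StringComparison.OrdinalIgnoreCase))
                {
                    continue;
                }

                // Name each output after its source document so files do not collide.
                index++;
                string outputPath = Path.Combine(outputDir, baseName + "_" + index + ".xls");

                using (Stream source = part.GetStream(FileMode.Open, FileAccess.Read))
                using (FileStream target = File.Create(outputPath))
                {
                    source.CopyTo(target);
                }
                written.Add(outputPath);
            }
        }

        return written;
    }

    // Hypothetical driver: processes every .docx directly inside inputDir.
    public static void ExtractAll(string inputDir, string outputDir)
    {
        Directory.CreateDirectory(outputDir);
        foreach (string docxPath in Directory.GetFiles(inputDir, "*.docx"))
        {
            try
            {
                List<string> files = ExtractXls(docxPath, outputDir);
                Console.WriteLine("{0}: {1} worksheet(s)", docxPath, files.Count);
            }
            catch (Exception ex) // e.g. a corrupt or locked document
            {
                Console.Error.WriteLine("{0}: skipped ({1})", docxPath, ex.Message);
            }
        }
    }
}
```

    Catching the exception per document means one unreadable file does not stop the rest of the batch; the skipped files are listed on the error stream so they can be checked by hand.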

    Tuesday, April 10, 2012 7:58 PM