Using InlineShape to save a PDF attachment?

  • Question

  • Hi

    I have a Word document with a collection of embedded objects, and I want to identify each one and extract its contents if it's a PDF.

    I've managed to achieve this with Aspose, but it's pretty expensive and this is just a personal project.

    So far I have failed to find any way to cast the InlineShape I'm after into an object I could then save.

    Can someone please help?

    Friday, August 25, 2017 11:31 PM

Answers

  • I solved the problem myself and, rather annoyingly, it was childishly simple.

    As I said, I already knew when I had reached a PDF because I could read the ClassType.

    What I hadn't realised was that calling OLEFormat.ConvertTo() with that same ClassType achieved precisely what I wanted: it discards the embedded object and puts the PDF contents in its place.
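
    For anyone landing here later, a minimal C# sketch of that approach follows. It assumes a project that references Microsoft.Office.Interop.Word; the class and method names (EmbeddedPdfConverter, ConvertEmbeddedPdfs) are made up for illustration, and the shape-type check and backwards loop are precautions added here rather than part of the original fix. The result of ConvertTo is as described in the post above and may differ with other Word or Acrobat versions.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

public static class EmbeddedPdfConverter
{
    // Hypothetical helper: converts every embedded OLE object whose ClassType
    // equals pdfClassType in place and returns how many were converted.
    public static int ConvertEmbeddedPdfs(Word.Document document, string pdfClassType)
    {
        int converted = 0;
        Word.InlineShapes shapes = document.InlineShapes;

        // Walk backwards by index in case a conversion changes the collection.
        // Word collections are 1-based.
        for (int i = shapes.Count; i >= 1; i--)
        {
            Word.InlineShape shape = shapes[i];

            // Pictures and other non-OLE shapes are skipped.
            if (shape.Type != Word.WdInlineShapeType.wdInlineShapeEmbeddedOLEObject)
                continue;

            Word.OLEFormat ole = shape.OLEFormat;
            if (ole == null || ole.ClassType != pdfClassType)
                continue;

            // ConvertTo takes its arguments by reference; the optional ones
            // are passed as Type.Missing.
            object classType = ole.ClassType;
            object missing = Type.Missing;
            ole.ConvertTo(ref classType, ref missing, ref missing, ref missing, ref missing);
            converted++;
        }

        return converted;
    }
}
```

    From a VSTO add-in it could be called as EmbeddedPdfConverter.ConvertEmbeddedPdfs(Globals.ThisAddIn.Application.ActiveDocument, "AcroExch.Document.DC"). It is worth trying on a copy of the document first, since the conversion replaces the embedded object.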

    • Proposed as answer by macropodMVP Thursday, August 31, 2017 10:16 PM
    • Marked as answer by Journeyman-UK Friday, September 1, 2017 7:10 AM
    Thursday, August 31, 2017 5:24 PM

All replies

  • Hi Journeyman-UK,
    The Word object model does not provide any method or property to verify that an embedded object is a PDF file. However, you can select an embedded PDF file, copy it with Ctrl+C, and paste it into a folder manually. I hope this helps.
    Best Regards,
    Terry
    Monday, August 28, 2017 10:54 AM
  • Are you sure? I can iterate through the InlineShapes collection and read the ClassType, which tells me for sure that it is an Adobe Acrobat file. I just can't then cast it to an object that I can save.
    Monday, August 28, 2017 11:07 AM
  • Terry is correct about the Word Object Model. However, for a workaround, see: https://answers.microsoft.com/en-us/office/forum/office_2007-customize/how-to-extract-embedded-files-from-word-document/6a92bd9a-1ceb-4cf8-bd6f-ebad955cbdc1

    Cheers
    Paul Edstein
    [MS MVP - Word]

    Monday, August 28, 2017 11:11 AM
  • Hi Journeyman-UK,

    See, use Ctrl+V to paste it into a folder.

    Best Regards,

    Terry

    Monday, August 28, 2017 11:14 AM
  • Which part of that thread are you referring to? I am pretty sure I have read it already, and I am trying to automate the process.
    Monday, August 28, 2017 11:14 AM
  • I guess I didn't explain what I want to do well enough.

    I want to automatically open the PDF, take its contents into memory, and paste them into the appendix of the enclosing Word file.

    As I mentioned above, I can already identify the PDF from InlineShapes via the ClassType.

    Monday, August 28, 2017 11:19 AM
  • Hi Journeyman-UK,

    Here is the code that works for me.

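    Sub SavePDFFIle()
        Dim Ishp As InlineShape
        For Each Ishp In ActiveDocument.InlineShapes
            ' Skip pictures and other shapes that are not embedded OLE objects
            If Ishp.Type = wdInlineShapeEmbeddedOLEObject Then
                If Ishp.OLEFormat.ClassType = "Package" Then
                    ' Copy the embedded object, then paste it into the target folder
                    Ishp.Range.Copy
                    CreateObject("Shell.Application").Namespace("C:\Users\v-guaxu\Desktop\OutFolder\").Self.InvokeVerb "Paste"
                End If
            End If
        Next Ishp
    End Sub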

    Please refer to the link below:

    Copy & paste embeded object to file system  

    Note that I used ClassType = "Package" to identify the PDF, but a txt or zip file can also be a "Package". If you have a better way to identify the PDF, I suggest you share your solution with us. Thanks for understanding.

    Best Regards,

    Terry


    Tuesday, August 29, 2017 10:24 AM
  • Hi Terry

    Thanks for your continued help. Here is an example of my code

    Word.InlineShapes collection = Globals.ThisAddIn.Application.ActiveDocument.InlineShapes;

    // Word collections are 1-based, so [1] is the first inline shape
    if (collection[1].OLEFormat.ClassType == "AcroExch.Document.DC")
    {
        // do something with the file
    }

    Except so far I haven't been able to do anything with the shape, even though I know what it is!
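
    The exact-match comparison above only recognises one Acrobat registration. A slightly more tolerant check could look like the sketch below. The prefix "AcroExch.Document" is an assumption (this thread only confirms "AcroExch.Document.DC"), and PdfClassTypes and IsAcrobatDocument are illustrative names, not part of any library.

```csharp
using System;

public static class PdfClassTypes
{
    // Assumed common prefix of the ClassType that Acrobat registers for
    // embedded PDFs; only "AcroExch.Document.DC" is confirmed in this thread.
    private const string AcrobatPrefix = "AcroExch.Document";

    // Returns true when the ClassType looks like an Acrobat PDF document.
    public static bool IsAcrobatDocument(string classType)
    {
        return classType != null
            && classType.StartsWith(AcrobatPrefix, StringComparison.OrdinalIgnoreCase);
    }
}
```

    With this, the condition becomes PdfClassTypes.IsAcrobatDocument(collection[1].OLEFormat.ClassType). A generic "Package" object is not matched, so a PDF embedded as a package would still need a different test.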

    Tuesday, August 29, 2017 11:54 AM
  • Assuming you're using Word 2013 or later, you may be able to open the embedded object with Word itself; otherwise you'd need to have Adobe Acrobat Pro installed and automate that. For some code that does something analogous with embedded Excel worksheets, see:
    Convert Embedded Excel Sheets to Word Tables

    Cheers
    Paul Edstein
    [MS MVP - Word]


    • Edited by macropodMVP Wednesday, August 30, 2017 3:41 AM
    Wednesday, August 30, 2017 3:41 AM
  • I've seen lots of examples use something 'similar', e.g. images are very common

    The closest I've come to success was using a byte array and EnhMetaFileBits, but when I saved it the header was incorrect, which I guess is not surprising since I believe that process is meant for images

    I am using 2013 - what do you mean by open  ...?
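
    On the incorrect header: as far as I know, EnhMetaFileBits returns a metafile picture of how the range looks rather than the embedded file's bytes, so the saved data would not be a PDF. A quick way to confirm what a byte array holds is to look for the "%PDF-" marker that PDF files begin with. The sketch below does that; PdfSignature and LooksLikePdf are illustrative names, and scanning the first 1024 bytes instead of only offset 0 is a leniency chosen here, not something from this thread.

```csharp
using System;
using System.Text;

public static class PdfSignature
{
    // PDF files begin with the ASCII marker "%PDF-" followed by a version.
    private static readonly byte[] Marker = Encoding.ASCII.GetBytes("%PDF-");

    // Returns true when the marker appears within the first 1024 bytes.
    public static bool LooksLikePdf(byte[] data)
    {
        if (data == null) return false;

        // Last start offset at which the whole marker still fits in the window.
        int limit = Math.Min(data.Length, 1024) - Marker.Length;
        for (int start = 0; start <= limit; start++)
        {
            int matched = 0;
            while (matched < Marker.Length && data[start + matched] == Marker[matched])
                matched++;
            if (matched == Marker.Length) return true;
        }
        return false;
    }
}
```

    Calling PdfSignature.LooksLikePdf(bytes) before writing the array to disk would show straight away whether the data is worth saving with a .pdf extension.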

    Wednesday, August 30, 2017 5:35 AM
  • I've seen lots of examples use something 'similar', e.g. images are very common

    ...

    I am using 2013 - what do you mean by open  ...?

    Image processing is nothing like embedded OLE object processing, which is why I posted a link for the latter.

    Since Word 2013 can open PDF files natively, you might be able to use it instead of Adobe Acrobat Pro for the content extraction; otherwise you won't be able to do it without Adobe Acrobat Pro - which you will need to learn the API for.


    Cheers
    Paul Edstein
    [MS MVP - Word]

    Wednesday, August 30, 2017 6:10 AM
  • I'm not really worried about the content extraction; I can use a library like iTextSharp for that.

    My problem is how to cast the InlineShape object to something that I can then export.

    I read through your example. I'll have a go at porting it to C# and see whether I get any different results from what I've had already.

    Wednesday, August 30, 2017 7:27 AM
  • I still haven't made any progress on this, I'm afraid; your example wasn't able to help me. For example:

    Word.InlineShapes collection = Globals.ThisAddIn.Application.ActiveDocument.InlineShapes;

    collection[1].OLEFormat.Activate();

    That opens the PDF, but then I have no way of reading the active window, since it wasn't Word that opened the document.

    Every attempt to cast it to a type I can do something with fails, for example:

    Word.InlineShapes collection = Globals.ThisAddIn.Application.ActiveDocument.InlineShapes;

    Object pdftext = (Object)collection[1].OLEFormat.Object;

    The above results in 'Specified cast is not valid'.

    Wednesday, August 30, 2017 3:30 PM
  • Hi Journeyman-UK,

    I'm glad to hear that your issue has been solved. I suggest you mark your solution as answer to help other developers use this forum efficiently. Thanks for understanding.

    Best Regards,

    Terry

    Friday, September 1, 2017 5:44 AM