I am trying to extract information about all embedded documents that reside within a Word file.
Specifically, I am trying to determine each embedded document's name and the point in the host document at which it is embedded. For example, if I read in a paragraph and there is an embedded document placed right after it, I need to be able to detect that so I can assume the embedded document is related to that paragraph.
I saw a similar post here, but there were no replies. I am hoping I will be able to get one for my issue!
1. Create an empty document (Document A) and embed another document (Document B) in it.
2. Change Document A's file extension to .zip and unpack it.
3. Go into the extracted folder. Open "word" -> "embeddings"; you'll see Document B, but its name has been changed to Microsoft_Word_Document1.docx.
4. In "word" -> "media", you can see Document B's icon.
I've traversed all folders and XML files in the extracted folder, but I cannot find an XML file that contains Document B's original name. It seems that some information about the embedded document is discarded by Word.
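Even without the original file name, the position of each embedded document can usually be recovered from the same unpacked package. The following is a minimal Python sketch, not a definitive implementation. It assumes the embedded object appears in "word/document.xml" as an `o:OLEObject` element whose `r:id` attribute points at an entry in "word/_rels/document.xml.rels", which is the layout I would expect from Word but have not verified against every version. The function name `find_embedded_objects` and the file name `DocumentA.docx` are made up for illustration.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used inside a .docx package.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
O = "urn:schemas-microsoft-com:office:office"
PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"


def find_embedded_objects(docx):
    """Return (paragraph_index, paragraph_text, embedded_part) tuples.

    `docx` is a path or a binary file-like object (hypothetical helper).
    """
    with zipfile.ZipFile(docx) as zf:
        # Map relationship ids (e.g. "rId5") to targets such as
        # "embeddings/Microsoft_Word_Document1.docx".
        rels_root = ET.fromstring(zf.read("word/_rels/document.xml.rels"))
        targets = {
            rel.get("Id"): rel.get("Target")
            for rel in rels_root.iter(f"{{{PKG_REL}}}Relationship")
        }
        body = ET.fromstring(zf.read("word/document.xml")).find(f"{{{W}}}body")

    results = []
    # Walk paragraphs in document order; nested paragraphs (tables,
    # text boxes) are counted too, which is good enough for a sketch.
    for index, para in enumerate(body.iter(f"{{{W}}}p")):
        text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t"))
        for ole in para.iter(f"{{{O}}}OLEObject"):
            target = targets.get(ole.get(f"{{{R}}}id"))
            if target is not None:
                part = posixpath.normpath(posixpath.join("word", target))
                results.append((index, text, part))
    return results


if __name__ == "__main__":
    import sys

    # Usage: python script.py DocumentA.docx
    for index, text, part in find_embedded_objects(sys.argv[1]):
        print(index, repr(text), part)
```

Each result gives the index of the paragraph that holds the object, so if the object sits in its own paragraph, the paragraph at index - 1 is the text that precedes it. The part path is the renamed file under "word/embeddings", not Document B's original name.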