2
votes

I'm trying to extract all the embedded Excel files in a Word docx. I know that I can do this quickly by changing filename of the docx to zip, and then find all the Excel files in word/embeddings folder.

The problem is that the Excel files in word/embeddings have very generic filenames (e.g., Microsoft_Excel_Macro-Enabled_Worksheet1.xlsm, Microsoft_Excel_Macro-Enabled_Worksheet2.xlsm) instead of the original filename. The original files, however, were actually csv not xlsm files.

I did not play any part in creating this document. I assume they used Insert -> Object -> Create From File. Also, I'm using Word 2010, but according to the Compatibility Checker the document was originally created with Word 97-2003, and the files were embedded later with Word 2007.

I can find the original filename if I go into the docx, right click on embedded file icon, and go to Macro-Enabled Worksheet Object -> Convert... -> Change Icon... -> Caption

However, I don't want to do this manually for a bunch of embedded Excel files.

So is there a way to use some vba code to extract a list of the original filenames for all embedded files? Then I can use this list as a key to rename the generically named files.

2
Whenever you want to do something new in VBA try hitting record button on the lower left corner and record what you need to do. Later come back and edit the code for having loops over the whole document. If you provide a reproducible code by the method above I can help you with the second part.M--
Yes I tried to do that, but when I'm recording a macro in Word, it won't even let me right click the embedded file icon to get the caption. I have a lot of experience with Excel macros, but not Word.Korean_Of_the_Mountain
Yeah. I was trying to do so and realized that. I have an idea which may work for you if you know batch scripting. You can change the extension to *.zip. Then open the zip file and navigate to document.xml.rels file. Read every line that has .xls in it and write it into a text file. That's what I would do if I had to. Couple of notes if you decide to go with this option: 1. 7zip can read inside of a zip file. 2. You can find plenty of simple examples here for scripting.M--
I just looked at document.xml.rels and it doesn't look like it gives me the Caption info either. It has some Relationship Id instead. Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/package" Target="embeddings/Microsoft_Excel_Macro-Enabled_Worksheet68.xlsm"/><Relationship Id="rId324"Korean_Of_the_Mountain
I think it may require a vba answer if it is possible to extract this Caption information through vba since I can't get it from the xml files.Korean_Of_the_Mountain

2 Answers

1
votes

Using vba I can get my code to print the caption of the selected file into immediate window (you can write it anywhere else). This is my code:

 Sub Caption_Ex()

     If Selection.Type = wdSelectionShape Then
         Selection.ShapeRange(1).ConvertToInlineShape.Select
     End If

     Debug.Print Selection.InlineShapes(1).OLEFormat.IconLabel

 End Sub

This is what @user1964692 had done for the whole document and I am including this in my edit for reference:

 Option Explicit

 Dim num As Integer
 Dim AD As Document
 Dim ctr As Integer
 Dim caption_names() as variant
 Dim numObjects As Integer

 Sub Extract()

 Set AD = ActiveDocument

 numObjects = AD.InlineShapes.Count
        ctr = 1

 For num = 1 To numObjects
     If AD.InlineShapes(num).Type = 1 Then
         'it's an embedded OLE type so open it.
         Redim Preserve caption_names(1 to ctr)
         caption_names(ctr) = AD.InlineShapes(num).OLEFormat.IconLabel
         ctr=ctr+1
     End If
 Next num

 End Sub

This is the solution that I would use by scripting:

Navigate to the folder that you *.docx is located and open cmd there. Make a copy of the word document with *.zip extension.

xcopy Doc1.docx *.zip

Then use 7zip for extracting the file in cmd. You should have the 7za.exe in the same folder as your document.

7za.exe x Doc1.zip -o *.xml.rels -r

xcopy document.xml.rels *.txt

Later you can search for the lines which have .xls in them (Assuming you are in C drive, change the path appropriately):

powershell Command "select-string -path "C:\document.txt" -Pattern ".xls" | select line | out-file C:\lines.txt -append"

You will find filename, line number and the whole line including the match (i.e .xls) in the lines.txt file. This will give you name of the files that you are looking for.

1
votes

Here is what I ended up doing:

Sub Extract()

Dim num As Integer
Dim AD As Document
Set AD = ActiveDocument

Dim numObjects As Integer
numObjects = AD.InlineShapes.Count

Dim caption_names() as variant
ctr = 1
For num = 1 To numObjects
    If AD.InlineShapes(num).Type = 1 Then
        'it's an embedded OLE type so open it.
        Redim Preserve caption_names(1 to ctr)
        caption_names(ctr) = AD.InlineShapes(num).OLEFormat.IconLabel
        ctr=ctr+1
    End If
Next num

End Sub