0
votes

I need to loop over some Word documents, extract the images from each one, and save them in a separate folder. I've tried saving the documents as HTML, but that is not a good fit for my requirements.

Now I'm looping through the images using the InlineShapes collection, copying each one into a Publisher document, and then saving it as an image. However, I get a run-time Automation error when I run the script. I've tried both early and late binding for the Publisher library, and I get the error with both.

Can anyone tell me what the problem is and why I'm getting this error? As far as I understand, it is related to memory allocation, but I'm not sure.

Here is the code block that I've been working on (fp and dp are folder paths, while filename is the Word document name; I'm calling this sub from another sub that loops over all the files in a folder):

Sub test(ByVal fp As String, ByVal dp As String, ByVal filename As String)
Dim doc As Document
Dim pubdoc As New Publisher.Document
Dim shp As InlineShape
'Application.Screenupdating = False
'Dim pubdoc As Object
'Set pubdoc = CreateObject("Publisher.Document")
Set doc = Documents.Open(fp)
With doc
    i = .InlineShapes.Count
    Debug.Print i
End With
For j = 1 To i
    Set shp = doc.InlineShapes(j)
    shp.Select
    Selection.CopyAsPicture
    pubdoc.Pages(1).Shapes.Paste
    pubdoc.Pages(1).Shapes(1).SaveAsPicture (dp & Application.PathSeparator & j & ".jpg")
    pubdoc.Pages(1).Shapes(1).Delete
Next
doc.Close (wdDoNotSaveChanges)
pubdoc.Close
'Application.Screenupdating = True

End Sub

Apart from this, if anyone has any suggestions to make this faster, I'm all ears. Thanks in advance!

3 Answers

1
votes

Just add .zip to the end of the file name (this works for .docx files, which are ZIP archives), extract the file, and look in the word/media folder. All the embedded image files will be there, no programming necessary.
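Since the question involves a whole folder of documents, the same idea can be scripted instead of done by hand. Below is a minimal sketch in Python rather than VBA (that choice is mine, not something from the question). It assumes the files are .docx and that the images are embedded rather than linked. The function names extract_docx_images and extract_folder and the "document name + image name" naming scheme are hypothetical, chosen only for illustration.

```python
import zipfile
from pathlib import Path


def extract_docx_images(docx_path, out_dir):
    """Copy every file stored under word/media/ in one .docx into out_dir."""
    docx_path = Path(docx_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    # A .docx is a ZIP archive, so no renaming to .zip is needed here.
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            if not name.startswith("word/media/") or name.endswith("/"):
                continue
            # Assumption: prefix with the document name so that images
            # from different documents do not overwrite each other.
            target = out_dir / f"{docx_path.stem}_{Path(name).name}"
            target.write_bytes(archive.read(name))
            saved.append(target)
    return saved


def extract_folder(src_dir, out_dir):
    """Run extract_docx_images on every .docx directly inside src_dir."""
    results = {}
    for docx in sorted(Path(src_dir).glob("*.docx")):
        if docx.name.startswith("~$"):  # skip Word's temporary lock files
            continue
        results[docx.name] = extract_docx_images(docx, out_dir)
    return results
```

With hypothetical paths, extract_folder(r"C:\docs", r"C:\docs\images") would return a dictionary mapping each document name to the list of image files written for it. The images come out in whatever format Word stored them (PNG, JPEG, EMF and so on); nothing is converted.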

0
votes

Extracting the pictures from a Filtered HTML document created from your original source document would be faster. However, you said that was not a good fit for your needs, so here is example code that locates the pictures in your source document and pastes them into a second document.

The speed problem with this type of code comes from calling CopyAsPicture on the Selection, so I recommend using a Range instead. Of course, the For/Next loop that is required will be slow no matter what.

Sub CopyPasteAsPicture()
    Dim doc As Word.Document, shp As Word.Shape
    Dim i As Long, nDoc As Word.Document, rng As Word.Range
    Set doc = ActiveDocument

    'If you want all extracted pictures to be in the sequence they appear
    'in the document, you have to convert floating shapes to inline shapes.
    'Loop backwards: each conversion removes an item from doc.Shapes.
    For i = doc.Shapes.Count To 1 Step -1
        Set shp = doc.Shapes(i)
        'If you want only pictures extracted, you have to specify the type
        If shp.Type = msoLinkedPicture Or shp.Type = msoPicture Then
            shp.ConvertToInlineShape
        End If
    Next

    If doc.Content.InlineShapes.Count > 0 Then
        Set nDoc = Word.Documents.Add
        Set rng = nDoc.Content
        For i = 1 To doc.Content.InlineShapes.Count
            'Copy from the shape's Range rather than from the Selection
            doc.Content.InlineShapes(i).Range.CopyAsPicture
            rng.Paste
            rng.Collapse Word.WdCollapseDirection.wdCollapseEnd
            rng.Paragraphs.Add
            rng.Collapse Word.WdCollapseDirection.wdCollapseEnd
        Next
    End If
End Sub

If you want to place all shapes (floating or inline) into a folder as image files, then the best way is to save the source document as a filtered HTML document. Here is the command:

htmDoc.SaveAs2 FileName:=LGPWorking & strFileName, AddToRecentFiles:=False, FileFormat:=Word.WdSaveFormat.wdFormatFilteredHTML

In the above, the active document is assigned to the variable htmDoc, and I am giving the new document a specific name and location. The output is not only the HTML file but also a directory with the same name plus an appended "_Files" label. All the image files are in that "x_Files" directory.

If you only want selected images pulled from your original source document, or if you want images pulled from multiple source documents, then use the code I shared above to place only the images you want from one or more source documents into a new Word document, and then save that new document as a Filtered HTML document.

When your routine is done, you can Kill the HTML document and only leave the Files directory.

0
votes

I had to change a few things around, but this will let you save a single image from a Word document. It goes through a couple of steps before it ends up as a .jpg on the other side, without any white space.

filename = ActiveDocument.FullName
saveLocation = "z:\temp\"
FolderName = "test"
On Error Resume Next  'Ignore errors if the files or folder do not exist yet
Kill "z:\temp\test_files\*"  'Delete all files
RmDir "z:\temp\test_files"  'Delete folder

ActiveDocument.SaveAs2 filename:="z:\temp\test.html", FileFormat:=wdFormatHTML

ActiveDocument.Close
'Remove everything except the image files
Kill saveLocation & FolderName & ".html"
Kill saveLocation & FolderName & "_files\*.xml"
Kill saveLocation & FolderName & "_files\*.html"
Kill saveLocation & FolderName & "_files\*.thmx"

'test2 and x are not defined in this snippet; set them to the name and number you want.
'Note that Name only renames the file; it does not convert the PNG data to JPEG.
Name saveLocation & FolderName & "_files\image00" & 1 & ".png" As saveLocation & FolderName & "_files\" & test2 & "_00" & x & ".jpg"

Word.Application.Visible = True
Word.Application.Activate