I'm trying to write a script that opens a user-specified PowerPoint (.pptx) file, reads it and finds the file names of its images. I'm using the python-pptx package since it lets me actually open these files. I'm trying to go through each slide and check that slide for images, but I have no idea how to do this with python-pptx, and in my opinion the documentation isn't very clear on this.

After a bit more digging into the documentation, I've found that this more or less does the job:

import re

from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE


def isCorrectImageType(image):
    # content_type is a MIME type such as 'image/png', so match the subtype after the '/'
    match = re.search(r'/(jpg|jpeg|png|gif)$', image.content_type)
    return match is not None


with open(fileName, 'rb') as file:
    ppt = Presentation(file)
    for slide in ppt.slides:
        for shape in slide.shapes:
            # only picture shapes have an .image property; other shapes raise AttributeError
            if shape.shape_type != MSO_SHAPE_TYPE.PICTURE:
                continue
            if isCorrectImageType(shape.image):
                print(shape.image.filename)

This works, however it doesn't return the correct filename: it returns image.png while the actual filename is myfile.png.
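Because `content_type` is a MIME type such as `image/png` rather than a file name, the type check can also be written as a plain set lookup with no regular expression. This is only a sketch; `WANTED_TYPES` and `is_wanted_image` are hypothetical names, and the set lists the standard MIME types for JPEG, PNG and GIF.

```python
# Standard MIME types for the wanted formats; python-pptx's Image.content_type
# reports values of this form (e.g. 'image/png').
WANTED_TYPES = frozenset({"image/jpeg", "image/png", "image/gif"})


def is_wanted_image(content_type):
    """Return True if content_type (e.g. 'image/png') is one of the wanted formats."""
    return content_type.lower() in WANTED_TYPES


if __name__ == "__main__":
    for ct in ("image/png", "image/jpeg", "image/x-emf"):
        print(ct, is_wanted_image(ct))
```

Inside the slide loop the check would then read `if is_wanted_image(shape.image.content_type):`.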

Answer

The image filename is only stored in the XML if the image was inserted from a file. If the image was added from a binary stream (for example by a program such as python-pptx), no filename is available, so the generic image.{ext} form is used instead. The same happens when an image is pasted into place in PowerPoint.

So the original filename is not always available.

However, when it has been recorded, it is stored in the descr attribute of the picture shape's p:cNvPr element:

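from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE

ppt = Presentation(fileName)
for slide in ppt.slides:
    for shape in slide.shapes:
        # only picture shapes have the p:pic element that carries descr
        if shape.shape_type != MSO_SHAPE_TYPE.PICTURE:
            continue
        picture = shape
        # recorded filename, if any (None when the attribute is absent)
        print(picture._pic.nvPicPr.cNvPr.get('descr'))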

This code accesses the XML that looks like this:

<p:pic>
  <p:nvPicPr>
    <p:cNvPr id="6" name="Picture 5" descr="python-logo.gif"/>
    <p:cNvPicPr/>
    <p:nvPr/>
  </p:nvPicPr>
  ...

and, for that picture, the code should print 'python-logo.gif'.
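If you would rather not depend on the private `_pic` attribute, the same attribute can be read straight from the slide XML, since a .pptx file is an ordinary zip archive. The following is a minimal standard-library sketch; the function name `picture_descrs_from_zip` is made up, and it assumes slide parts are stored as `ppt/slides/slideN.xml`, which is the usual layout but not guaranteed by the format (strictly, slides should be located through the package relationships). The N in the part name is also not necessarily the slide's position in the presentation.

```python
import re
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the PresentationML elements (p:pic, p:nvPicPr, p:cNvPr)
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
SLIDE_PART = re.compile(r"^ppt/slides/slide(\d+)\.xml$")
CNVPR_PATH = f".//{{{P_NS}}}pic/{{{P_NS}}}nvPicPr/{{{P_NS}}}cNvPr"


def picture_descrs_from_zip(pptx_file):
    """Map slide part number -> descr of each picture on it (None when absent)."""
    result = {}
    with zipfile.ZipFile(pptx_file) as zf:
        for name in zf.namelist():
            m = SLIDE_PART.match(name)
            if not m:
                continue  # skip relationships, layouts, media, etc.
            root = ET.fromstring(zf.read(name))
            result[int(m.group(1))] = [c.get("descr") for c in root.iterfind(CNVPR_PATH)]
    return dict(sorted(result.items()))


if __name__ == "__main__":
    import sys

    print(picture_descrs_from_zip(sys.argv[1]))
```

Because `.//` searches the whole slide tree, pictures nested inside group shapes are included as well.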