I'm trying to write a script that opens a user-specified PowerPoint file, reads it, and finds the file names of the images it contains. I'm using the python-pptx package since it lets me actually open PowerPoint files. I want to go through each slide and check it for images, but I have no idea how to do this with python-pptx, and the documentation isn't really clear on this, in my opinion.
After a bit more digging into the documentation, I've found that this more or less does the job:
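
    import re
    from pptx import Presentation
    from pptx.enum.shapes import MSO_SHAPE_TYPE

    def isCorrectImageType(image):
        # content_type is a MIME type such as 'image/png', not a file name,
        # so match on the MIME subtype instead of a file extension
        filePattern = r'^image/(jpeg|png|gif)$'
        return re.search(filePattern, image.content_type) is not None

    images = []
    # fileName is the user-specified path to the presentation
    with open(fileName, 'rb') as file:
        ppt = Presentation(file)
        for slide in ppt.slides:
            for shape in slide.shapes:
                # Only picture shapes carry an image; other shapes raise AttributeError
                if shape.shape_type != MSO_SHAPE_TYPE.PICTURE:
                    continue
                if isCorrectImageType(shape.image):
                    print(shape.image.filename)
                    images.append(shape.image.filename)
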
This works, but it doesn't return the correct file name: it returns image.png, while the actual file name is myfile.png.
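As a side note on the type check: as far as I can tell, content_type holds a MIME type such as image/png rather than a file name, so an extension pattern like '.png$' only matches because the dot acts as a wildcard. A minimal sketch of a stricter check, using only the standard library, could look like the following. The helper name is_supported_image and the accepted set of types are my own assumptions, not part of python-pptx.

```python
import re

# Assumed set of accepted types; JPEG files report 'image/jpeg'
# whether the original extension was .jpg or .jpeg.
SUPPORTED_MIME = re.compile(r'^image/(jpeg|png|gif)$')

def is_supported_image(content_type):
    """Return True if content_type is one of the accepted image MIME types."""
    return SUPPORTED_MIME.match(content_type) is not None

print(is_supported_image('image/png'))
print(is_supported_image('image/x-wmf'))
```

Returning a plain boolean also avoids the AttributeError that calling .group(0) on a failed match would raise.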