5
votes

The company I work at requires a list of all inaccessible images/shapes in a .pptx document (those that have no alt-text and aren't marked decorative). To automate the process, I'm writing a script that extracts all inaccessible images/shapes in a specified .pptx and compiles a list. So far, I've managed to make it print out the name, slide #, and image blob of images with no alt-text.

Unfortunately, after extensively searching the docs, I found that the python-pptx package does not support checking whether an image/shape is decorative.

I haven't mapped XML elements to objects in the past and was wondering how I could go about making a function that reads the val attribute within the adec:decorative element in this .pptx file (see line 4 of the snippet below).

<p:cNvPr id="3" name="Picture 2">
    <a:extLst>
        <a:ext uri="{FF2B5EF4-FFF2-40B4-BE49-F238E27FC236}"><a16:creationId xmlns:a16="http://schemas.microsoft.com/office/drawing/2014/main" id="{77922398-FA3E-426B-895D-97239096AD1F}" /></a:ext>
        <a:ext uri="{C183D7F6-B498-43B3-948B-1728B52AA6E4}"><adec:decorative xmlns:adec="http://schemas.microsoft.com/office/drawing/2017/decorative" val="0" /></a:ext>
    </a:extLst>
</p:cNvPr>

Since I've only recently started using this package, I'm not sure how to go about creating custom element classes within python-pptx. If anyone has any other workaround or suggestions please let me know, thank you!

2 Answers

1
votes

Creating a custom element class would certainly work, but I would regard it as an extreme method (think bazooka for killing mosquitos) :).

I'd be inclined to think you could accomplish what you want with an XPath query on the closest ancestor you can get to with python-pptx.

Something like this would be in the right direction:

cNvPr = shape._element._nvXxPr.cNvPr
adec_decoratives = cNvPr.xpath(".//adec:decorative")
if adec_decoratives:
    print("got one, probably need to look more closely at them")

One of the challenges is likely to be getting the adec namespace prefix registered because I don't think it is by default. So you probably need to execute this code before the XPath expression, possibly before loading the first document:

from pptx.oxml.ns import _nsmap

_nsmap["adec"] = "http://schemas.microsoft.com/office/drawing/2017/decorative"

Also, if you research XPath a bit, I think you'll actually be able to query on <adec:decorative> elements that have val=0 or whatever specific attribute state satisfies what you're looking for.
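To illustrate the attribute predicate, here is a minimal sketch that runs without python-pptx. It uses the standard library's ElementTree rather than the lxml-based `xpath()` call above, so the namespace map is passed explicitly instead of being registered in `_nsmap`. The names `NS`, `make_cNvPr_xml`, and `is_decorative` are made up for this example, and the sample XML is a trimmed copy of the element from the question with the `p` and `a` namespace declarations added so it parses on its own.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used to resolve the prefixes in the query below
NS = {
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "adec": "http://schemas.microsoft.com/office/drawing/2017/decorative",
}


def make_cNvPr_xml(val):
    """Build a standalone <p:cNvPr> element (hypothetical sample data)."""
    return (
        '<p:cNvPr xmlns:p="%s" xmlns:a="%s" id="3" name="Picture 2">'
        "<a:extLst>"
        '<a:ext uri="{C183D7F6-B498-43B3-948B-1728B52AA6E4}">'
        '<adec:decorative xmlns:adec="%s" val="%s"/>'
        "</a:ext>"
        "</a:extLst>"
        "</p:cNvPr>"
    ) % (NS["p"], NS["a"], NS["adec"], val)


def is_decorative(cNvPr):
    """Return True if a descendant <adec:decorative> has val="1"."""
    # The [@val='1'] predicate keeps only elements whose val attribute is "1"
    return cNvPr.find(".//adec:decorative[@val='1']", NS) is not None


print(is_decorative(ET.fromstring(make_cNvPr_xml("1"))))
print(is_decorative(ET.fromstring(make_cNvPr_xml("0"))))
```

The same predicate string should work unchanged in the `cNvPr.xpath(...)` call once the `adec` prefix is registered. Note that it only matches the literal value "1".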

But this is the direction I recommend. Maybe you can post your results once you've worked them out in case someone else faces the same problem later.

1
votes

The problem was a lot simpler after all! Thanks to @scanny, I was able to fix the issue and target the val=1 attribute in the adec:decorative element. The following function returns True if val=1 for that shape.

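def isDecorative(shape):
    # Non-visual properties element that carries the extension list
    cNvPr = shape._element._nvXxPr.cNvPr
    # Match only <adec:decorative val="1">; needs the "adec" prefix in _nsmap
    adec_decoratives = cNvPr.xpath(".//adec:decorative[@val='1']")
    # An empty result list means the shape is not marked decorative
    return bool(adec_decoratives)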

Here is the complete script so far for checking accessibility in a single specified .pptx. It prints the image name and slide # for each image that is not decorative and has no alt-text:

from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE
from pptx.enum.shapes import PP_PLACEHOLDER
from pptx.oxml.ns import _nsmap

_nsmap["adec"] = "http://schemas.microsoft.com/office/drawing/2017/decorative"

filePath = input("Specify PPT file path > ")
print()

def validShape(shape):
    # Pictures always count; placeholders count only if they are OBJECT placeholders
    if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
        return True
    if shape.shape_type == MSO_SHAPE_TYPE.PLACEHOLDER:
        return shape.placeholder_format.type == PP_PLACEHOLDER.OBJECT
    return False

def isDecorative(shape):
    cNvPr = shape._element._nvXxPr.cNvPr
    # Match only <adec:decorative val="1">; an empty list means not decorative
    adec_decoratives = cNvPr.xpath(".//adec:decorative[@val='1']")
    return bool(adec_decoratives)

# Note: References custom @property added to shared.py and base.py
def hasAltText(shape):
    # Empty or missing alt-text counts as no alt-text
    return bool(shape.alt_text)

def checkAccessibility(prs):
    # Yield (slide number, shape) for each shape that is neither decorative nor described
    for slideNumber, slide in enumerate(prs.slides, start=1):
        for shape in slide.shapes:
            if validShape(shape) and not isDecorative(shape) and not hasAltText(shape):
                yield slideNumber, shape

for slideNumber, picture in checkAccessibility(Presentation(filePath)):
    print("Slide #: %d" % slideNumber)
    print(picture.name)
    print()
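The `alt_text` property used by hasAltText is a custom addition to the library source, so the script won't run on a stock install. PowerPoint stores alt-text in the `descr` attribute of the same `cNvPr` element, so reading that attribute directly may be a way to avoid patching the package. Below is a minimal sketch; the helper name `alt_text_from_cNvPr` and the sample elements are made up for illustration, and it uses the standard library's ElementTree so it runs on its own.

```python
import xml.etree.ElementTree as ET

P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"


def alt_text_from_cNvPr(cNvPr):
    """Return the alt-text held in the descr attribute, or "" if there is none."""
    # A missing descr attribute and a whitespace-only one both count as no alt-text
    return (cNvPr.get("descr") or "").strip()


# Hypothetical sample elements: one with alt-text, one without
with_alt = ET.fromstring(
    '<p:cNvPr xmlns:p="%s" id="3" name="Picture 2" descr="Company logo"/>' % P_NS
)
without_alt = ET.fromstring(
    '<p:cNvPr xmlns:p="%s" id="4" name="Picture 3"/>' % P_NS
)

print(repr(alt_text_from_cNvPr(with_alt)))
print(repr(alt_text_from_cNvPr(without_alt)))
```

Inside the script, the equivalent call would presumably be `alt_text_from_cNvPr(shape._element._nvXxPr.cNvPr)`, since lxml elements support `.get()` as well. I'm assuming the custom property reads the same attribute.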