python pptx PowerPoint slide build — trouble understanding the pattern/build approach

Question

I would like to automate the build of PowerPoint slides from scratch, as the required output format for a lot of smarts being performed in Python.

However, I'm having trouble understanding the python-pptx documentation; it seems to follow a rather different pattern from what I would expect, perhaps because my expectations come from a lot of work I've done generating TikZ graphics. Most of the documentation is also more concerned with handling existing PowerPoint documents: extracting text and table data, adding slides, batch editing titles, etc.

Here's a somewhat pseudo-code version of how I'm trying to build the components of a PowerPoint slide, with the corresponding graphic version included below. In a sense it builds components (shapes and groups) independently of how they will be used later, and then adds them to the slide at the end.

def rectshape(dikt):
    shp = initiate_shape(MSO_SHAPE.RECTANGLE)
    # width, height, fill, border, text, font_color, etc:
    for k, v in dikt.items():
        setattr(shp, k, v)

    return shp

title_shape_attr = {
    'text': None
    , 'fill': 'dark blue'
    , 'font_color': 'white'
    , 'width': Mm(25)
    # etc
}
fish_shape_attr = {
    # same idea as above
}
text_shape_attr = {
    # same ideas as above
}

def build_group(t1, t2, t3):
    title_shape_attr['text'] = t1
    fish_shape_attr['text'] = t2
    text_shape_attr['text'] = t3

    s1 = rectshape(title_shape_attr)
    s2 = rectshape(fish_shape_attr)
    s3 = rectshape(text_shape_attr)      

    grp = Group()
    grp.add_shape(s1, left=Mm(0), top=Mm(0))
    grp.add_shape(s2, left=Mm(0), top=Mm(25))
    grp.add_shape(s3, left=Mm(0), top=Mm(50))

    return(grp)

g1 = build_group('Title A', 'One Fish', 'Text T')
g2 = build_group('Title B', 'Two Fish', 'Text E')
g3 = build_group('Title C', 'Red Fish', 'Text X')
g4 = build_group('Title D', 'Blue Fish', 'Text T')

prs = Presentation()
slide = prs.slides.add_slide(prs.slide_layouts[5])
slide.shapes.add_shape(g1, left = Mm(50), top = Mm(20))
slide.shapes.add_shape(g2, left = Mm(80), top = Mm(20))
slide.shapes.add_shape(g3, left = Mm(110), top = Mm(20))
slide.shapes.add_shape(g4, left = Mm(140), top = Mm(20))

HOWEVER, I think I'm catching on that this is backward to the way python-pptx builds a slide -- that I need to start with a slide, then add a group and define the location/size at the same time, then add shapes to the group again with locations and size, and only after all that is set, I go back through all the shapes and groups and alter the attributes and text?

For example, trying shp = Shape(MSO_SHAPE.RECTANGLE) in the rectshape function gives

TypeError: __init__() missing 1 required positional argument: 'parent'

-- suggesting that the shape needs to belong to something else before I can create it.

None of my attempts thus far are close enough to include as "what I've tried so far", and would only really confuse the issue.

Can anybody help sketch out the general pattern / logic of how to build a slide out of groups of shapes?

scanny · Accepted Answer · 2018-08-19T06:02:27

One way to understand it might be to think about using PowerPoint yourself, using the UI. You start by opening a presentation, then you add a slide, then you add shapes to the slide. The API follows this model.

You can't add a shape until you have a slide, you can't add a slide until you have a presentation, etc. Once you have a shape you can change its attributes.

So you need to organize your logic like that. The only thing you need to import from pptx is Presentation; okay, also enumerations and length specifier utilities and so on, but no other objects like Slide or Shape or Picture. You never have occasion to construct one of those, so you don't need the class. You get a slide from a presentation by calling slide = prs.slides.add_slide(layout). You get a new shape by calling shape = slide.shapes.add_shape(...), etc. There's no idea of a "loose" object other than Presentation. All other PowerPoint domain entities are lodged from creation time in the presentation object hierarchy (Presentation --< Slide --< Shape, roughly). The creation mechanism is invariably a method on the parent object.

If you need to do logic to work out what goes on which slide, how many items there are, etc. beforehand, perhaps to know where to locate the objects on the slide and how big to make them, you'll need an intermediate representation that you then either traverse to do the actual writing, or you enable your logical objects to write themselves on a call to say .render() or something, passing them a reference to their slide to write themselves on. I've done some complicated fiscal calendar layouts that worked this way and needed to do a lot of their own noodling about what color each block should be, how big they were, what their spacing from each other was, etc., all driven essentially from data pulled from a database.
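One way such an intermediate representation might look is sketched below. Block, layout_columns, and the spacing defaults are hypothetical names and values made up for illustration; they are not part of python-pptx. The layout is worked out on plain data first, and each object only touches python-pptx when asked to render itself:

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical plain-data description of one rectangle, in millimetres."""
    text: str
    left: float
    top: float
    width: float = 25.0
    height: float = 20.0

    def render(self, shapes):
        """Write this block onto a shape collection such as slide.shapes."""
        # Imported here so the layout logic does not depend on python-pptx.
        from pptx.enum.shapes import MSO_SHAPE
        from pptx.util import Mm

        shape = shapes.add_shape(
            MSO_SHAPE.RECTANGLE,
            Mm(self.left), Mm(self.top), Mm(self.width), Mm(self.height),
        )
        shape.text_frame.text = self.text
        return shape


def layout_columns(columns, left=50.0, top=20.0, col_step=30.0, row_step=25.0):
    """Turn tuples of text into positioned Blocks, one column per tuple."""
    blocks = []
    for col, texts in enumerate(columns):
        for row, text in enumerate(texts):
            blocks.append(
                Block(text, left + col * col_step, top + row * row_step)
            )
    return blocks


columns = [
    ("Title A", "One Fish", "Text T"),
    ("Title B", "Two Fish", "Text E"),
]
blocks = layout_columns(columns)
```

Once a slide exists, the write step is a single traversal, for example: for block in blocks: block.render(slide.shapes). Keeping the positions as plain numbers until that point means the layout can be inspected or adjusted before anything is written to the presentation.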

Regarding GroupShape objects, those are brand new in python-pptx (a month ago maybe) and how to work with them strikes me as a separate question. But it shouldn't be hard to work out from the documentation here: https://python-pptx.readthedocs.io/en/latest/api/shapes.html#pptx.shapes.shapetree.SlideShapes.add_group_shape

Basically you call group_shape = shapes.add_group_shape(). You can then add shapes to the group shape using the same methods you use on slide.shapes (e.g. .add_picture()) or you can pass a sequence of existing shapes as a parameter of .add_group_shape() to "group" those shapes into the returned group shape. Either way you add all of the shapes one by one; it's your choice whether you create then group or create the shapes inside the group.
