I am trying to extract text from .ppt and .pptx files. python-pptx handles the .pptx files fine, but according to its documentation, ".ppt files from PowerPoint 2003 and earlier won’t work."
When I create a Presentation object with this line of code:
`prs = Presentation("Filepath\\presentation.ppt")`
I receive the following error:
`Traceback (most recent call last):
...shortened for brevity....
KeyError: "no relationship of type 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument' in collection"`
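One way to confirm that the file format itself is the problem, rather than the path, is to look at the first bytes of the file: legacy .ppt files are OLE2 compound binary files, while .pptx files are ZIP archives, and the two start with different signatures. Below is a minimal sketch using only the standard library. `sniff_container` and the constant names are made up for illustration and are not part of python-pptx.

```python
# Leading bytes that identify the two container formats.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary .ppt/.doc/.xls
ZIP_MAGIC = b"PK\x03\x04"                        # .pptx is a ZIP archive


def sniff_container(path):
    """Return 'ole2', 'zip', or 'unknown' based on the file's first bytes.

    Hypothetical helper: it only inspects the signature and does not
    validate that the file is actually a PowerPoint presentation.
    """
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```

For a genuine .ppt file this should return `"ole2"` no matter what the extension says, which would also explain why renaming the file to .pptx does not help.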
I believe this error occurs because python-pptx cannot read .ppt files. I have tried to work around this in three ways:
- I wanted to use the `.save()` function associated with python-pptx, but I would have to make a presentation item to do that. I cannot do that because I'd have to make use of python-pptx, which cannot handle the .ppt file in the first place.
- I tried `os.rename(src, dst)`. This did not work. Renaming the file does not work the same as 'save as', so the resulting file is corrupt.
- I used `win32com` to open the PowerPoint application, open the .ppt file, save the file as .pptx, and close both the file and the application. This method worked, but it is really 'clunky.' (See code below.)
    import win32com.client

    # Start PowerPoint, open the legacy file, and re-save it as .pptx.
    # Lowercase names avoid shadowing python-pptx's Presentation class.
    app = win32com.client.Dispatch("PowerPoint.Application")
    app.Visible = True
    ppt = app.Presentations.Open("Filepath\\presentation.ppt")
    ppt.SaveAs("Filepath\\presentation.pptx")
    ppt.Close()
    app.Quit()
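For reference, the same win32com approach can be wrapped in a function so that PowerPoint is closed even if the conversion fails. This is only a sketch under assumptions: it needs Windows, an installed PowerPoint, and pywin32. The helper names `pptx_path_for` and `convert_ppt_to_pptx` are hypothetical, and the value 24 is assumed to be `ppSaveAsOpenXMLPresentation` from PowerPoint's `PpSaveAsFileType` enumeration.

```python
from pathlib import Path

# Assumed value of PpSaveAsFileType.ppSaveAsOpenXMLPresentation
PP_SAVE_AS_OPEN_XML_PRESENTATION = 24


def pptx_path_for(ppt_path):
    """Return the sibling .pptx path for a .ppt file (hypothetical helper)."""
    return Path(ppt_path).with_suffix(".pptx")


def convert_ppt_to_pptx(ppt_path):
    """Convert a .ppt to .pptx through PowerPoint COM automation."""
    # Imported lazily so this module still loads where pywin32 is missing.
    import win32com.client

    src = Path(ppt_path).resolve()  # PowerPoint expects absolute paths
    dst = pptx_path_for(src)
    app = win32com.client.Dispatch("PowerPoint.Application")
    try:
        # WithWindow=False opens the file without showing a window
        pres = app.Presentations.Open(str(src), WithWindow=False)
        try:
            pres.SaveAs(str(dst), PP_SAVE_AS_OPEN_XML_PRESENTATION)
        finally:
            pres.Close()
    finally:
        # Note: this quits PowerPoint, including any other open windows
        app.Quit()
    return dst
```

The returned path could then be passed to python-pptx, e.g. `Presentation(str(convert_ppt_to_pptx("Filepath\\presentation.ppt")))`. It still drives the PowerPoint application, so it does not remove the dependency that makes this approach feel clunky.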
My question to the community: is there a more elegant way to solve this? I want to be able to parse text from .ppt files, and python-pptx does not handle that file type.