
I have a .PPT (PowerPoint, transferable to ODP or PPTX) file with speaker notes on every slide. I want to extract the entire presentation into something dynamic so I can create a speaker cheat sheet to run on a phone or tablet while I talk (a thumbnail of each slide with its speaker notes). I do this just often enough to HATE doing it by hand.

This is almost easy enough with <cfpresentation format="html" showNotes="yes">, which splits the PPT into HTML pages and creates an image for every slide. However, cfpresentation does not transfer the speaker notes; they are lost in translation.

I have also tried <cfdocument>, which has no option for preserving slide notes when it converts to PDF.

Is there a way to get the notes out of the PowerPoint file from within ColdFusion?

Have you looked at the HSLF POI project? poi.apache.org/slideshow/index.html (I use the HSSF project extensively but haven't looked at the PPT one yet, other than to know it exists.) - Antony
Is it just me, or do the two answers given contradict each other? I suppose the Java solution is the most specific one, though. - Mark A Kruger
Well, I'll always endorse multiple approaches to a problem, and I liked seeing the Java approach. But I also won't miss the opportunity to highlight some of the less well-known attributes of native CF functions. - Sharondio

4 Answers

4
votes

The simplest solution:

Convert the PowerPoint presentation to OpenOffice ODP format. An ODP file is a ZIP archive, so CFML can unzip it (for example with <cfzip>). Inside is a content.xml file that contains both the slides and the notes, and CFML can parse that XML to extract the notes.

Given the CFDOCUMENT functionality, perhaps ColdFusion can even convert the PPT to ODP for you?
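For illustration, here is a rough sketch of the ODP route in Java using only the JDK (java.util.zip plus the built-in DOM parser; readAllBytes needs Java 9+). It assumes the deck has already been saved as .odp and relies on the ODF layout in which each slide is a draw:page element in content.xml, with its speaker notes in a nested presentation:notes element. The class and method names (OdpNotes, getNotes) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads speaker notes out of an .odp file's content.xml.
public class OdpNotes {
    // ODF namespace URIs for the elements used below.
    static final String DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0";
    static final String PRES_NS = "urn:oasis:names:tc:opendocument:xmlns:presentation:1.0";
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    /** Returns one notes string per slide (draw:page), in slide order; "" if a slide has no notes. */
    public static List<String> getNotes(InputStream odpStream) throws Exception {
        // An .odp file is a ZIP archive; find the content.xml entry inside it.
        byte[] contentXml = null;
        try (ZipInputStream zip = new ZipInputStream(odpStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("content.xml".equals(entry.getName())) {
                    contentXml = zip.readAllBytes(); // reads only the current entry
                    break;
                }
            }
        }
        if (contentXml == null) {
            throw new IOException("content.xml not found; is this an ODP file?");
        }

        // Parse with namespace awareness so the getElementsByTagNameNS lookups work.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(contentXml));

        List<String> result = new ArrayList<>();
        NodeList pages = doc.getElementsByTagNameNS(DRAW_NS, "page");
        for (int i = 0; i < pages.getLength(); i++) {
            Element page = (Element) pages.item(i);
            NodeList notes = page.getElementsByTagNameNS(PRES_NS, "notes");
            List<String> paragraphs = new ArrayList<>();
            if (notes.getLength() > 0) {
                // Each text:p inside presentation:notes is one paragraph of the notes.
                // Simplification: <text:s/>, <text:tab/> and <text:line-break/> are not expanded.
                NodeList ps = ((Element) notes.item(0)).getElementsByTagNameNS(TEXT_NS, "p");
                for (int j = 0; j < ps.getLength(); j++) {
                    paragraphs.add(ps.item(j).getTextContent());
                }
            }
            result.add(String.join("\n", paragraphs));
        }
        return result;
    }
}
```

Called with a stream over the .odp file (for example new FileInputStream("talk.odp"), where the file name is just an example), it returns one string per slide with the notes paragraphs separated by newlines. Slide body text is ignored because only the contents of presentation:notes are read.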

3
votes

My original answer said there was no way to do this directly in CF and that you'd have to drop down to the underlying Java. I stand corrected: using the showNotes attribute on the <cfpresentation> tag should add the notes to the HTML.

As an alternative, or if that doesn't work for some reason, you should be able to use Apache POI to do this, although you may need a more recent version of POI than the one shipped with your version of ColdFusion, which may require some additional work.

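import java.io.FileInputStream;
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;

// Classes from the older HSLF API (POI 3.x); later POI releases renamed them
// (HSLFSlideShow, HSLFSlide, ...), so adjust the imports to the POI version you use.
import org.apache.poi.hslf.model.Notes;
import org.apache.poi.hslf.model.Slide;
import org.apache.poi.hslf.model.TextRun;
import org.apache.poi.hslf.usermodel.SlideShow;

public class PptNotes {

   public static List<String> getNotes(String filePath) throws IOException {
      List<String> results = new LinkedList<String>();

      // read the powerpoint (try-with-resources closes the stream even on error)
      SlideShow slideShow;
      try (FileInputStream fis = new FileInputStream(filePath)) {
         slideShow = new SlideShow(fis);
      }

      // loop over the slides
      for (Slide slide : slideShow.getSlides()) {

         // get the notes for this slide; null if the slide has no notes page
         Notes notes = slide.getNotesSheet();
         if (notes == null) {
            results.add("");
            continue;
         }

         // build a string with the text from all the "text runs" in the notes,
         // putting each run on its own line
         StringBuilder sb = new StringBuilder();
         for (TextRun textRun : notes.getTextRuns()) {
            if (sb.length() > 0) {
               sb.append('\n');
            }
            sb.append(textRun.getRawText());
         }

         // add the resulting string to the results.
         results.add(sb.toString());
      }

      return results;
   }
}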

Carrying over complex formatting (bulleted lists, bold, italics, links, colors, etc.) may be a challenge, as you'll have to dig much deeper into TextRuns and the related APIs and figure out how to generate HTML.
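As a starting point for that last step, here is a sketch of just the HTML-generation half in Java. It deliberately does not touch POI: the Run class is a hypothetical stand-in that you would populate from each rich text run (in the POI 3.x HSLF API, RichTextRun offers getText(), isBold() and isItalic(); check them against the POI version you load). Only bold, italic and line breaks are handled; bullets, links and colors are left out.

```java
import java.util.List;

// Hypothetical helper: turns formatted text runs into a small HTML fragment.
public class RunsToHtml {

    /** Hypothetical stand-in for one formatted run; fill it from POI's rich text runs. */
    public static final class Run {
        final String text;
        final boolean bold;
        final boolean italic;

        public Run(String text, boolean bold, boolean italic) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
        }
    }

    /** Escapes the characters that are significant in HTML text ('&' first). */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Wraps bold runs in <b>, italic runs in <i>, and turns line breaks into <br>. */
    public static String toHtml(List<Run> runs) {
        StringBuilder html = new StringBuilder();
        for (Run r : runs) {
            String t = escape(r.text)
                    .replace("\r\n", "<br>")
                    .replace("\r", "<br>")
                    .replace("\n", "<br>");
            if (r.italic) {
                t = "<i>" + t + "</i>";
            }
            if (r.bold) {
                t = "<b>" + t + "</b>";
            }
            html.append(t);
        }
        return html.toString();
    }
}
```

Keeping the HTML step separate from the POI calls means it can be tested without a PowerPoint file and reused if you switch to the ODP approach later.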

2
votes

CFPRESENTATION (at least as of version 9) does have a showNotes attribute, but you'd still have to parse the output. Depending on the markup of the output, jQuery would make short work of grabbing what you want.

0
votes

I felt bad that my answer above didn't work out, so I dug a little. It's a little dated, but it works: PPTUtils, which is based on the Apache library that @Antony suggested. I updated one function to do what you want. You may have to tweak it a bit to get exactly what you need, but I like that this utility returns the data to you as data rather than as HTML that you'd have to parse.

And just in case, here is the POI API reference I used to find the "getNotes()" function.

 <cffunction name="extractText" access="public" returntype="array" output="false" hint="i extract text from a PPT, returning an array with one struct (slideTitle, slideText, notes) for each slide in the PowerPoint">
      <cfargument name="pathToPPT" required="true" hint="the full path to the powerpoint to convert" />
      <cfset var hslf = instance.loader.create("org.apache.poi.hslf.HSLFSlideShow").init(arguments.pathToPPT) />
      <cfset var slideshow = instance.loader.create("org.apache.poi.hslf.usermodel.SlideShow").init(hslf) />
      <cfset var slides = slideshow.getSlides() />
      <cfset var retArr = arrayNew(1) />
      <cfset var slide = "" />
      <cfset var i = "" />
      <cfset var j = "" />
      <cfset var k = "" />
      <cfset var thisSlide = "" />
      <cfset var noteRuns = "" />
      <cfset var notesText = "" />
      <cfset var thisSlideText = "" />
      <cfset var thisSlideRichText = "" />
      <cfset var slideText = "" />

      <cfloop from="1" to="#arrayLen(slides)#" index="i">
           <!--- a fresh struct per slide, so no duplicate() is needed when appending --->
           <cfset slide = structNew() />
           <cfset thisSlide = slides[i] />
           <cfset slide.slideTitle = thisSlide.getTitle() />

           <!--- read the notes from this slide's own notes sheet instead of indexing
                 slideshow.getNotes(), which need not line up one-to-one with the slides;
                 getNotesSheet() returns null when a slide has no notes (isNull needs CF9+) --->
           <cfset notesText = "" />
           <cfif NOT isNull(thisSlide.getNotesSheet())>
                <cfset noteRuns = thisSlide.getNotesSheet().getTextRuns() />
                <cfloop from="1" to="#arrayLen(noteRuns)#" index="j">
                     <cfset notesText = notesText & noteRuns[j].getRawText() & chr(10) />
                </cfloop>
           </cfif>
           <cfset slide.notes = trim(notesText) />

           <!--- concatenate the text of every rich text run on the slide --->
           <cfset thisSlideText = thisSlide.getTextRuns() />
           <cfset slideText = "" />
           <cfloop from="1" to="#arrayLen(thisSlideText)#" index="j">
                <cfset thisSlideRichText = thisSlideText[j].getRichTextRuns() />
                <cfloop from="1" to="#arrayLen(thisSlideRichText)#" index="k">
                     <cfset slideText = slideText & thisSlideRichText[k].getText() />
                </cfloop>
           </cfloop>

           <cfset slide.slideText = slideText />
           <cfset arrayAppend(retArr, slide) />
      </cfloop>

      <cfreturn retArr />
 </cffunction>
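Because the function returns data rather than HTML, building the phone-friendly cheat sheet from the question is then mostly string assembly. A minimal Java sketch, assuming you already have one image URL per slide (for example the images cfpresentation generates) and one notes string per slide from any of the approaches above; the class name CheatSheet, its build() method and the page layout are all hypothetical.

```java
import java.util.List;

// Hypothetical helper: one HTML page with a thumbnail and the notes for each slide.
public class CheatSheet {

    // Minimal escaping for HTML text content and double-quoted attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** imageUrls and notes are parallel lists with one entry per slide. */
    public static String build(String title, List<String> imageUrls, List<String> notes) {
        if (imageUrls.size() != notes.size()) {
            throw new IllegalArgumentException("need exactly one image per slide's notes");
        }
        StringBuilder html = new StringBuilder();
        html.append("<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">\n")
            // viewport meta tag so the page scales sensibly on a phone or tablet
            .append("<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\n")
            .append("<title>").append(escape(title)).append("</title>\n")
            // pre-wrap keeps the line breaks from the notes text
            .append("<style>img{max-width:100%} .notes{white-space:pre-wrap}</style>\n")
            .append("</head><body>\n");
        for (int i = 0; i < notes.size(); i++) {
            int n = i + 1;
            html.append("<section><h2>Slide ").append(n).append("</h2>")
                .append("<img src=\"").append(escape(imageUrls.get(i)))
                .append("\" alt=\"Slide ").append(n).append("\">")
                .append("<div class=\"notes\">").append(escape(notes.get(i)))
                .append("</div></section>\n");
        }
        html.append("</body></html>\n");
        return html.toString();
    }
}
```

Writing the result to a single static .html file keeps the cheat sheet usable offline on the phone; the notes are escaped, so stray < or & characters in them cannot break the page.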