
Extract Text from PowerPoint using Perl – convert ppt to text

February 3, 2007

I need to unleash this on the world. I’ve been working on a project where the goal is to integrate PowerPoint slides into a Flash presentation. All the PPT-to-SWF products seem to convert each slide into an image (maybe vector based, I’m not sure), but the end result is a SWF in which the text is not selectable.

The point is to automate grabbing the text out of the original PowerPoint deck so it can be used for other purposes, such as search engine optimization and text search. I’m sure there’s an easy way to do this in VB or C#, but I’m a Perl native and have been obsessed with Win32::OLE for the last 3 hours.

Here’s the breakdown:
The Problem: How do I get the text out of the text boxes in a PowerPoint presentation and into a text file?

1. First, get yourself Win32::OLE for Perl.

2. Take a look at this Roth Consulting presentation. It took me a few reads to “get it”, and I think some of the examples are inconsistent with each other.

3. Make yourself a PowerPoint presentation. Just use the default text boxes. This script is highly experimental and I’ve only tried it on simple slides.

4. After several attempts at pasting code into this blog… I’ve given up. You can download the sample file: ppt-parse.txt (PPT Parser)

Basically, this is an exercise in Win32::OLE.
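
For anyone who just wants the general shape of it, here’s a minimal sketch of the approach. This is not the contents of ppt-parse.txt, just an illustration under some assumptions: PowerPoint is installed on a Windows machine, the slides use ordinary text boxes, and the names clean_text and extract_slide_text (and the ppt-text.pl file name in the usage message) are made up for this example. It walks Slides, then Shapes, and reads TextFrame->TextRange->Text from each shape that has text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# As far as I know, TextRange->Text marks paragraph breaks with \r and
# manual line breaks with \x0B (vertical tab); turn both into newlines.
sub clean_text {
    my ($text) = @_;
    $text =~ s/\r\n?|\x0B/\n/g;
    $text =~ s/\s+\z//;    # drop trailing whitespace
    return $text;
}

# Returns one "Slide N: text" string per shape that contains text.
sub extract_slide_text {
    my ($path) = @_;

    # Loaded at run time so this file still compiles on non-Windows machines.
    require Win32::OLE;

    # 'Quit' asks Win32::OLE to call Quit on PowerPoint when $ppt goes away.
    my $ppt = Win32::OLE->new('PowerPoint.Application', 'Quit')
        or die "Can't start PowerPoint: " . Win32::OLE->LastError() . "\n";

    # PowerPoint wants an absolute path. Arguments are FileName, ReadOnly,
    # Untitled, WithWindow (msoTrue is -1, msoFalse is 0).
    my $pres = $ppt->Presentations->Open(File::Spec->rel2abs($path), -1, 0, 0)
        or die "Can't open $path: " . Win32::OLE->LastError() . "\n";

    my @lines;
    my $slides = $pres->Slides;
    for my $i (1 .. $slides->Count) {    # OLE collections are 1-based
        my $shapes = $slides->Item($i)->Shapes;
        for my $j (1 .. $shapes->Count) {
            my $shape = $shapes->Item($j);
            next unless $shape->HasTextFrame;          # skip pictures, lines, etc.
            next unless $shape->TextFrame->HasText;    # skip empty placeholders
            my $text = clean_text($shape->TextFrame->TextRange->Text);
            push @lines, "Slide $i: $text";
        }
    }

    $pres->Close;
    return @lines;
}

# Only touch PowerPoint when a file name was given on the command line.
if (my $path = shift @ARGV) {
    print "$_\n" for extract_slide_text($path);
}
else {
    warn "usage: perl ppt-text.pl deck.ppt > deck.txt\n";
}
```

Redirecting the output to a file gives you the text file from The Problem above. I went with Count and Item instead of anything fancier because they map directly onto the PowerPoint object model, which makes it easier to follow along with the VBA documentation.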

Categories: perl, powerpoint