ocr and powerpoint

One of the projects I’m working on requires reading text off a PowerPoint slide, or any type of presentation material for that matter. In a previous post I released a Perl script that parses the text out of a native .ppt file using some clunky OLE automation. But in this new scenario, the slide is recorded as a JPEG image. I’m probably the last one on the planet to find GOCR, the open source OCR program. There is even a Windows binary that you can download. To replicate the problem I’m trying to solve, I do the following:

  1. Save the .ppt deck as .jpg. This feature can save either all the slides or just the current slide as .jpg files.
  2. Next, convert the image to a greyscale .pnm file. GOCR has two constraints here: it works best with greyscale images, and it wants them in the .pnm format. For this step you’ll need to download djpeg.exe.
  3. Then run the following:
    > djpeg -grey -pnm test.jpg test.pnm
    > gocr test.pnm
    and that’s it!
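
When the deck is saved as one JPEG per slide, the two commands above have to be repeated for every file. Here is a minimal Python sketch of how that could be batched. It is not part of my original workflow, and it makes a few assumptions: djpeg and gocr are both on the PATH, the slides sit in one folder with a .jpg extension, and djpeg's -outfile switch is used to name the output file instead of passing the output name as a second argument (some builds only accept one form or the other). The function names build_commands and ocr_slides are made up for this example.

```python
import subprocess
from pathlib import Path


def build_commands(jpeg_path):
    """Return the djpeg and gocr command lines for one slide image."""
    jpeg = Path(jpeg_path)
    pnm = jpeg.with_suffix(".pnm")
    # -grey and -pnm are the switches from the steps above.
    # Assumption: -outfile names the output file on this djpeg build.
    djpeg_cmd = ["djpeg", "-grey", "-pnm", "-outfile", str(pnm), str(jpeg)]
    gocr_cmd = ["gocr", str(pnm)]
    return djpeg_cmd, gocr_cmd


def ocr_slides(folder):
    """Run djpeg then gocr on every .jpg in folder; map file name to text."""
    results = {}
    for jpeg in sorted(Path(folder).glob("*.jpg")):
        djpeg_cmd, gocr_cmd = build_commands(jpeg)
        # Step 1: JPEG to greyscale PNM.
        subprocess.run(djpeg_cmd, check=True)
        # Step 2: OCR the PNM; gocr writes the recognised text to stdout.
        done = subprocess.run(gocr_cmd, check=True,
                              capture_output=True, text=True)
        results[jpeg.name] = done.stdout
    return results
```

Calling ocr_slides("slides") would then return a dictionary keyed by slide file name, assuming a folder named slides holds the exported images. With check=True, a failure in either tool raises an exception instead of silently producing empty text.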

Have a look at this site: http://www.seeingwithsound.com/ocr.htm

It is a great example of gocr and djpeg being used together.

Categories: ocr, powerpoint
  1. lee
    February 26, 2008 at 8:58 am

    hi…need to pull text out of powerpoint and into XML file for flash parsing. Any ideas or points in the right direction, many thanks…

  2. February 26, 2008 at 8:07 pm

    I have an earlier post on using Perl to manipulate the OLE controls for extracting text from a powerpoint file:

    https://techdad.wordpress.com/2007/02/03/extract-text-from-powerpoint-using-perl/

  3. lee
    February 27, 2008 at 5:00 am

    yeah seen that… thanks anyway.
    am working through it just have to learn VB and the powerpoint arch.
    p.s. if you know the proper way to search and replace all unXML char’s like the ‘ in you’re, using the powerpoint vb editor lingo… last post on this i promise :-)

  4. February 9, 2013 at 10:48 pm

    Do you mind if I quote a couple of your posts as long as I provide credit and sources back to your site? My website is in the very same area of interest as yours and my visitors would genuinely benefit from some of the information you provide here.

    Please let me know if this is alright with you. Many thanks!
