
Re: Reading math formulas in PDF files



On Wed, Apr 18, 2007 at 10:15:02AM +0200, Lukas Loehrer wrote:
> 
> At least for some pdf files, google does an excellent job at
> preserving math formulas in their "View as HTML" view. 

Interesting. Some PDF files contain nothing but scanned images of text. To
read these you need OCR software, and good, free-as-in-freedom OCR solutions
now appear to be in the pipeline:

http://code.google.com/p/ocropus/

It shouldn't be difficult for the Emacs Lisp enthusiasts on the mailing list
to write a function that runs OCRopus on a set of image files, or even scans
a page, and then reads the output into an Emacs buffer. Ideally this would be
an Emacs mode that lets you set scanning parameters.

The OCR software itself isn't expected to be ready for release until late next
year, but I'm sure members of this list will be helping with the beta testing
along the way. In the meantime, the pdfimages utility that ships with Xpdf can
extract the image files from a PDF document, and these could then be converted
to whatever format the OCR software accepts.
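
For what it's worth, here is a rough sketch of that pipeline, written in
Python rather than Emacs Lisp just to show the shape of it. The only real
command it relies on is Xpdf's "pdfimages PDF-file image-root". Everything
about the OCR step is an assumption: ocr_program is a placeholder for
whatever command line the OCR software ends up with, and I am guessing that
it takes an image file as its last argument and prints the recognized text
on standard output. The function names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def pdfimages_command(pdf_path, image_root):
    """Build the Xpdf call: pdfimages PDF-file image-root."""
    return ["pdfimages", str(pdf_path), str(image_root)]


def ocr_commands(image_paths, ocr_program):
    """Build one OCR command line per image, in sorted (page) order.

    ocr_program is a hypothetical placeholder, e.g. ["some-ocr-tool"];
    the real tool's name and arguments must be supplied by the caller.
    """
    return [[*ocr_program, str(p)] for p in sorted(image_paths)]


def ocr_pdf(pdf_path, work_dir, ocr_program):
    """Extract the images from a PDF, OCR each one, return the joined text."""
    if shutil.which("pdfimages") is None:
        raise RuntimeError("pdfimages (from Xpdf) was not found on PATH")
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)

    # pdfimages writes files named <image-root>-NNN.<ext> (PPM/PBM by default).
    subprocess.run(pdfimages_command(pdf_path, work_dir / "page"), check=True)
    images = [p for p in work_dir.iterdir() if p.name.startswith("page-")]

    texts = []
    for cmd in ocr_commands(images, ocr_program):
        # Assumption: the OCR tool prints its result on standard output.
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        texts.append(result.stdout)
    return "\n".join(texts)
```

Calling ocr_pdf("paper.pdf", "/tmp/ocr", ["some-ocr-tool"]) would then hand
back one string that could be dropped into an Emacs buffer. If the OCR
software wants a format other than PPM, a conversion step would go between
the two subprocess calls.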
