Re: Reading math formulas in PDF files

On Wed, Apr 18, 2007 at 10:15:02AM +0200, Lukas Loehrer wrote:
> At least for some pdf files, google does an excellent job at
> preserving math formulas in their "View as HTML" view. 

Interesting. There are also PDF files that contain only scanned images of
text. To read these you need OCR software, and it now appears that
good-quality OCR solutions, free as in freedom, are on the way:


and it shouldn't be difficult for the Emacs Lisp enthusiasts on the mailing
list to write a function that will run OCR Opus on a set of image files, or
even scan a page, and then read the output into an Emacs buffer. Ideally this
would be an Emacs mode that lets you set scanning parameters.

The OCR software itself isn't expected to be ready for release until late next
year, but I'm sure members of this list will be helping with the beta testing
along the way. The pdfimages utility that ships with Xpdf can extract the
images embedded in a PDF document, and these could then be converted to
whatever format the OCR software accepts.
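
To make the pipeline concrete, here is a rough sketch in Python rather than
Emacs Lisp, just to show the steps. It assumes the pdfimages program from
Xpdf is on your PATH (invoked as "pdfimages PDF-file image-root"). The OCR
command is purely hypothetical: since the OCR software isn't released yet, I
am assuming a program that takes one image file as its last argument and
prints the recognized text on standard output. The function names are my own
invention.

```python
import subprocess
import tempfile
from pathlib import Path


def pdfimages_command(pdf_path, image_root):
    """Build the Xpdf command line: pdfimages PDF-file image-root."""
    return ["pdfimages", str(pdf_path), str(image_root)]


def ocr_command(ocr_program, image_path):
    """Build the OCR command line.

    ASSUMPTION: ocr_program is a list such as ["my-ocr"] naming a program
    that accepts an image file as its last argument and writes the
    recognized text to standard output. This is a placeholder, not the
    interface of any real OCR package.
    """
    return [*ocr_program, str(image_path)]


def image_number(path):
    """Return the NNN in a pdfimages output name like page-NNN.ppm."""
    return int(path.stem.rsplit("-", 1)[1])


def ocr_pdf(pdf_path, ocr_program, run=subprocess.run):
    """Extract the images from a PDF, OCR each one, and return the text."""
    with tempfile.TemporaryDirectory() as workdir:
        root = Path(workdir) / "page"
        run(pdfimages_command(pdf_path, root), check=True)
        texts = []
        # pdfimages writes page-000.ppm, page-001.pbm, and so on; sorting
        # on the number keeps the images in document order.
        for image in sorted(Path(workdir).glob("page-*"), key=image_number):
            result = run(ocr_command(ocr_program, image),
                         check=True, capture_output=True, text=True)
            texts.append(result.stdout)
        return "\n".join(texts)
```

If the OCR program needs a different image format, a conversion step would
go inside the loop before the OCR call. A script built around ocr_pdf could
then be run from Emacs with C-u M-! (shell-command with a prefix argument),
which inserts the command's output into the current buffer.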

