Re: Reading math formulas in PDF files

At least for some PDF files, Google does an excellent job of
preserving math formulas in its "View as HTML" view. It appears to
replace each math symbol with the corresponding Unicode
character, so ideally most of the information is preserved. Emacspeak
currently has some trouble reading such characters, but Emacs 22 has
promising features that should remedy this problem some time in the
(hopefully near) future. In the meantime, I use the following command
to read the name of the character at point:

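(require 'descr-text)

(defun unicode-name-at (pos)
  "Display the Unicode name of the character at POS (point by default)."
  (interactive "d")
  (let* ((char (char-after pos))
         ;; Use the code point recorded in the `untranslated-utf-8'
         ;; property when present; otherwise convert CHAR to UCS.
         (unicode (or (get-char-property pos 'untranslated-utf-8)
                      (encode-char char 'ucs))))
    ;; Each entry of the returned list is (LABEL VALUE), so `cadr'
    ;; extracts the name string.
    (message "%s" (downcase (or
                             (cadr (assoc "Name"
                                          (describe-char-unicode-data unicode)))
                             "Unknown character")))))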

This works in Emacs 22 only. Make sure you read the documentation of
describe-char-unicodedata-file, which names the local copy of
UnicodeData.txt that the lookup reads. Naturally, this can only work in multibyte
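
For readers on Emacs 23 or later, the same idea can probably be sketched
without a separate data file, since characters are stored as Unicode
internally and get-char-code-property is built in. This is only a sketch
under that assumption; the names unicode-name-at-point-sketch and
unicode-names-in-region-sketch are made up for illustration and are not
part of Emacs or Emacspeak:

```elisp
;; Sketch only: assumes Emacs 23 or later, where `get-char-code-property'
;; is available and no UnicodeData.txt setup is needed.

(defun unicode-name-at-point-sketch (pos)
  "Echo the Unicode name of the character at POS (hypothetical helper)."
  (interactive "d")
  (let* ((char (char-after pos))        ; nil at the end of the buffer
         (name (and char (get-char-code-property char 'name))))
    (message "%s" (downcase (or name "Unknown character")))))

(defun unicode-names-in-region-sketch (beg end)
  "Echo the names of all non-ASCII characters between BEG and END.
Hypothetical helper, meant for a short formula rather than a whole page."
  (interactive "r")
  (let (names)
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        (let ((char (char-after)))
          ;; Skip plain ASCII; only the converted math symbols need names.
          (when (> char 127)
            (push (downcase (or (get-char-code-property char 'name)
                                "unknown character"))
                  names)))
        (forward-char 1)))
    (message "%s" (mapconcat 'identity (nreverse names) ", "))))
```

With the region set around a formula, M-x unicode-names-in-region-sketch
echoes the symbol names in order, so Emacspeak can speak them like any
other message.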

Of course, the above only helps with PDF files that have been indexed by
Google. It would be interesting to know exactly how a PDF must be structured
for this conversion to work, and what kind of PDF-to-HTML converter they use.

Best regards, Lukas

Kalyan Mukherjea writes ("Re: This is off-topic? perhaps."):
> The only formula in Mannin.txt (the text file produced by pdftotxt)
> caught my attention when it was read out:
> I heard:
> 	32 + 42 = 52!!!
> Naturally I "woke up" paid attention and realized that this was the
> rendition of the Pythagorean identity:
> $3^2+ 4^2= 5^2$. 
