
Re: This is off-topic? perhaps.

Try the following:

pdftotext -layout file.pdf

If the PDF document is formatted in multiple columns, use the -raw option
instead, which usually works:

pdftotext -raw file.pdf
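
If you end up switching between the two modes often, a small shell function
can save some typing. What follows is only a sketch: the name pdf_to_text and
the PDFTOTEXT override variable are my own inventions for illustration, not
part of XPDF. It passes either -layout or -raw to pdftotext and names the
output file explicitly.

```shell
# Hypothetical helper: run pdftotext in one of the two modes discussed above.
# Usage: pdf_to_text layout|raw file.pdf
# PDFTOTEXT is an assumption of this sketch (pdftotext itself does not read
# it); it only lets you point the function at a different binary.
pdf_to_text() {
    mode=$1
    pdf=$2
    case $mode in
        layout|raw) ;;
        *)
            echo "usage: pdf_to_text layout|raw file.pdf" >&2
            return 2
            ;;
    esac
    # Name the output explicitly: file.pdf -> file.txt
    out="${pdf%.pdf}.txt"
    "${PDFTOTEXT:-pdftotext}" "-$mode" "$pdf" "$out"
}
```

For example, "pdf_to_text raw file.pdf" writes file.txt next to the PDF; if
the columns still come out scrambled, try "pdf_to_text layout file.pdf" and
compare the two results.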

Thanks to T.V. Raman, the PDF format was improved a number of years ago to
allow the entire logical structure of a document to be represented
independently of its presentation. Unfortunately, XPDF doesn't support this
feature (when it does, it should be easy to write a PDF to XML/XHTML
conversion tool). I don't know whether there are standard conventions for
representing the structure of mathematical expressions in PDF, but a solution
based on MathML should be possible. Here, the problem is that software which
generates PDF files would need to be adapted to include the necessary
structures in the output document.

If you happen to know anyone who is looking for an interesting
accessibility-related computer science project, then collaborating with the
author of XPDF to add support for "tagged PDF", as specified in the latest
edition of the PDF Reference, would be a good suggestion. Background in C++
would be required, and I expect that substantial expertise in computer science
would also be a prerequisite.
