
xml, particularly mathml



One of my most important (and now much beloved) families of scientific journals has just
added full-text XML alongside PDF as a public format. A recent example
can be found at
<http://www.geosci-model-dev.net/7/2867/2014/gmd-7-2867-2014.xml>.
This is great, but it gets better: they're also using MathML for all the
inline and displayed mathematics. At this point I became slightly
lightheaded :-) So, what's the smoothest way to access such content in Emacspeak?
Running (shr-insert-document (libxml-parse-xml-region (point-min)
(point-max))) does a half-decent job on the inline mathematics, I
suspect largely by ignoring all the formatting. It's
ignoring other things too, probably because it didn't find the DTD.
Still, it's quite usable after 5 minutes' work.
Now the hard bit.
I would like to serialize all the mml constructs and include them in
the resulting parse tree as text. The serialization seems doable; the Python module
mathDOM looks like it will do the job. I'd rather not replicate all
the functionality of libxml-parse-xml-region, so is there a way I can
intervene in the process to handle the parsing of certain elements
externally? Am I going about this all the wrong way?
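To make the idea concrete, here is a minimal sketch of the pre-processing I have in mind, done outside Emacs before shr ever sees the file. It uses only Python's standard xml.etree.ElementTree, not mathDOM. The speak() serializer, the flatten_math() helper and the "span" replacement tag are all hypothetical names of my own, and the spoken wording covers only a handful of MathML elements.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def speak(node):
    """Hypothetical, very partial MathML-to-text serializer."""
    name = local(node.tag)
    kids = [speak(k) for k in node]
    if name in ("mi", "mn", "mo", "mtext"):
        # Token elements: their text is the content.
        return (node.text or "").strip()
    if name == "mfrac" and len(kids) == 2:
        return f"fraction {kids[0]} over {kids[1]} end fraction"
    if name == "msup" and len(kids) == 2:
        return f"{kids[0]} to the power {kids[1]}"
    if name == "msub" and len(kids) == 2:
        return f"{kids[0]} sub {kids[1]}"
    if name == "msqrt":
        return "square root of " + " ".join(kids) + " end root"
    # math, mrow and anything unrecognised: just join the children.
    return " ".join(k for k in kids if k)


def flatten_math(root):
    """Replace each <math> element, in place, by a childless element
    whose text is the serialized mathematics."""
    for parent in list(root.iter()):
        for i, child in enumerate(list(parent)):
            if local(child.tag) == "math":
                flat = ET.Element("span")  # assumed neutral tag name
                flat.text = speak(child)
                flat.tail = child.tail     # keep the prose that follows
                parent[i] = flat
    return root
```

The flattened tree could then be written back out with ET.tostring(root, encoding="unicode") and handed to libxml-parse-xml-region as before, so that the only thing the renderer meets where the mathematics used to be is ordinary text.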
You'll have to forgive me some excitement: after 30 years in research
this is the first time I've gone to a public site and been guaranteed
I can download material with the mathematical content intact. Now I
just need to extract it.


-- 
Peter Rayner
room 343 
School of Earth Sciences, University of Melbourne, 3010, Vic, Australia
tel: work: +61 (0)3 8344 9708; fax: +61 (0)3 8344 7761 
mobile +61 402 752 379, skype: petermorag 
mail-to: prayner@xxxxxxxxxxx
google scholar profile <http://scholar.google.com.au/citations?user=H3up71wAAAAJ&hl=en>


