
Unicode HTML character entities in w3m



Hi all,

here is a problem that has bothered me for some time: some CMSs, most
notably WordPress, seem to replace certain characters with a somewhat
more expressive Unicode equivalent. For example, instead of "
(double quotes), the HTML source code will contain the character
entities for “ and ”, which are meant to be rendered as left double
quote and right double quote respectively. I mostly use w3m, which as
far as I can tell should be able to display these characters, at least
in theory. The problem is that it does not work, which means I get
things like ü and ý for the above two characters. This is very
distracting when reading text that contains many such characters.

An example site is:

http://dvorak.org/blog/

What do other w3m users get when reading these sites? 

I run Emacs on the Linux console. I use the "latin-1" language
environment. I tried switching to utf-8 and enabling multibyte support
in Emacs, but it did not really solve the problem. However, things
improved insofar as the characters were now correctly identified by
"describe-char". I use ViaVoice as TTS. The fact that "describe-char"
identifies the characters correctly might be a hint that the problem
occurs on the way from the buffer to the TTS.

Today, I finally had enough of this problem and hacked together a fix
that just replaces all those Unicode characters with their ASCII
equivalents just before the HTML is rendered by w3m. This is not a
clean solution, but it fixes the problem temporarily.
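
For illustration, a replacement of this kind could look roughly like
the Python sketch below. It is not the fix referred to above, which is
not included in this message. The mapping table, the function name
asciify_html, and the decision to handle only numeric entities and
literal characters are all assumptions made for the example.

```python
import re

# Hypothetical mapping from Unicode code points to ASCII stand-ins.
# The selection is an assumption; extend it as needed.
ASCII_EQUIVALENTS = {
    0x2018: "'",    # left single quotation mark
    0x2019: "'",    # right single quotation mark
    0x201C: '"',    # left double quotation mark
    0x201D: '"',    # right double quotation mark
    0x2013: "-",    # en dash
    0x2014: "--",   # em dash
    0x2026: "...",  # horizontal ellipsis
}

# Matches decimal (&#8220;) and hexadecimal (&#x201C;) character references.
_NUMERIC_ENTITY = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def asciify_html(source: str) -> str:
    """Replace selected Unicode punctuation in HTML text with ASCII.

    Handles both numeric character references and literal characters.
    References that are not in the mapping are left untouched.
    """
    def replace_entity(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Fall back to the original reference when there is no mapping.
        return ASCII_EQUIVALENTS.get(code_point, match.group(0))

    source = _NUMERIC_ENTITY.sub(replace_entity, source)
    # str.translate accepts a dict keyed by code point.
    return source.translate(ASCII_EQUIVALENTS)
```

Such a function would be applied to the page source before it is
handed to the renderer. Note that it substitutes blindly, so a
replaced double quote inside an HTML attribute value could break the
markup; for plain reading of article text this is usually acceptable.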

I noticed that w3 shows the problematic pages fine, i.e. the above two
characters both appear as regular double quotes. I could not determine
whether this replacement is performed by w3 or whether it is a special
Emacspeak feature.

If other people experience the same problem, I can post the pseudo-fix
mentioned above.

Best regards, Lukas



