Unicode HTML character entities in w3m

Hi all,

here is a problem that has bothered me for some time: Some CMSs, most
notably WordPress, seem to replace certain characters with a somewhat
more expressive Unicode equivalent. For example, instead of "
(double quote), the HTML source code will contain character entities
for “ and ”, which should be rendered as left and right double
quotation marks respectively. I mostly use w3m, which as far as I can
tell should be able to display these characters, at least in theory.
The problem is that it does not work: I get things like ü and ý
for the above two characters. This is very distracting when reading
text that contains many such characters.

An example site is:


What do other w3m users get when reading these sites? 

I run Emacs on the Linux console. I use the "latin-1" language
environment. I tried switching to utf-8 and enabling multibyte support
in Emacs, but it did not really solve the problem. However, things
improved insofar as the characters were now correctly identified by
"describe-char". I use ViaVoice as TTS. The fact that "describe-char"
identifies the characters correctly might be a hint that the problem
occurs on the way from the buffer to the TTS.

Today, I finally had enough of this problem and hacked together a fix
that just replaces all those Unicode characters with their ASCII
equivalents just before the HTML is rendered by w3m. This is not a
clean solution, but it fixes the problem temporarily.
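
To illustrate the idea, a replacement step of this kind might look
roughly like the sketch below. It is written in Python purely for
illustration and is not the actual hack mentioned above; the function
name asciify_html and the replacement table are assumptions and would
need to be extended for other characters. The replacements are ASCII
entities (&quot; and &#39;) rather than bare quote characters, so that
the HTML stays valid even if a typographic quote occurs inside an
attribute value.

```python
import re

# Hypothetical replacement table: Unicode code point -> ASCII-only text.
# Quotes map to ASCII entities so attribute values are not broken.
ASCII_EQUIVALENTS = {
    0x2018: "&#39;",   # left single quotation mark
    0x2019: "&#39;",   # right single quotation mark
    0x201C: "&quot;",  # left double quotation mark
    0x201D: "&quot;",  # right double quotation mark
    0x2013: "-",       # en dash
    0x2014: "--",      # em dash
    0x2026: "...",     # horizontal ellipsis
}

# Matches decimal (&#8220;) and hexadecimal (&#x201C;) character references.
_NUMERIC_ENTITY = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def asciify_html(html):
    """Replace known typographic characters, whether written as numeric
    character references or as literal characters, with ASCII text."""

    def replace_entity(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Leave references we have no replacement for untouched.
        return ASCII_EQUIVALENTS.get(code, match.group(0))

    html = _NUMERIC_ENTITY.sub(replace_entity, html)
    # Handle the same characters when they appear literally in the source.
    return html.translate(ASCII_EQUIVALENTS)
```

The function would be applied to the page source before it is handed to
the renderer. Named references such as &ldquo; are not covered by this
sketch.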

I noticed that w3 shows the problematic pages fine, i.e. the above two
characters both appear as regular double-quotes. I could not determine
if this replacement is performed by w3 or if this is a special emacspeak

If other people experience the same problem, I can post the pseudo-fix
mentioned above.

Best regards, Lukas

