Aural CSS Settings Explained was Re: [emacspeak The Complete Audio Desktop] Emacspeak And Voice Locking Using Aural CSS

Aural CSS Settings Explained:

Here is how the four ACSS dimensions (average-pitch, pitch-range,
stress and richness) work, or at least how they are supposed to work.

First a bit about voices:

A speaking voice has a default pitch (its fundamental frequency),
and this changes over the course of a sentence due to
inflection. Speakers can also "project" their voice, or
alternatively pitch it lower; this is similar to volume but not
quite the same.

The ACSS Dimensions:

average-pitch: Basic voice pitch.
               In practice, speakers with smaller heads have
               higher-pitched voices, so on formant TTS engines,
               you need to vary the head-size inversely with the
               fundamental frequency; see dectalk-voices.el and
               outloud-voices.el, which both drive formant engines.
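
The inverse relationship can be sketched roughly as follows. This is
only an illustration in Python, not code from dectalk-voices.el or
outloud-voices.el: the function name, the 0-9 setting scale, and every
numeric range are assumptions made up for the example.

```python
def formant_voice_params(average_pitch, lo_hz=80.0, hi_hz=240.0,
                         big_head=1.3, small_head=0.7):
    """Map a hypothetical 0-9 average-pitch setting to (pitch_hz, head_size).

    All ranges are invented for illustration; real engines use their
    own units and tables.
    """
    if not 0 <= average_pitch <= 9:
        raise ValueError("average_pitch must be between 0 and 9")
    t = average_pitch / 9.0                  # 0.0 = lowest, 1.0 = highest
    pitch_hz = lo_hz + t * (hi_hz - lo_hz)   # pitch rises with the setting
    head_size = big_head - t * (big_head - small_head)  # head size shrinks
    return pitch_hz, head_size
```

The point is only the direction of the two mappings: as the setting
goes up, the pitch goes up and the head size comes down.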

Pitch-range: Determines "how excited" the speaker sounds.
If you look at the overall intonation contour, pitch-range
determines how high the peaks get and how deep the valleys get.

Stress: This is indeed subtle.
Basically, pitch-range shapes the overall intonation contour; stress
controls the individual peaks, such as primary and secondary
stress. Increasing pitch-range alone produces a very sing-song
effect; raising stress and pitch-range together often works better.
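
As a rough illustration of the difference, the Python sketch below
treats an intonation contour as one pitch value per syllable. It is a
toy model, not what any TTS engine actually does: the function name,
its parameters, and the sample numbers are all made up for the
example, and stress is reduced to extra pitch on the stressed
syllables.

```python
def shape_contour(contour_hz, stressed, pitch_range=1.0, stress=0.0):
    """Return a reshaped copy of an intonation contour.

    contour_hz:  one pitch value (Hz) per syllable
    stressed:    indices of syllables carrying primary/secondary stress
    pitch_range: scales every deviation from the mean (whole contour)
    stress:      extra Hz added only to the stressed syllables
    """
    mean = sum(contour_hz) / len(contour_hz)
    # pitch_range stretches peaks and valleys alike around the mean
    shaped = [mean + pitch_range * (f - mean) for f in contour_hz]
    # stress touches only the individual stressed peaks
    for i in stressed:
        shaped[i] += stress
    return shaped


contour = [110.0, 130.0, 100.0, 120.0]
wide = shape_contour(contour, stressed=[1], pitch_range=2.0)
accented = shape_contour(contour, stressed=[1], stress=15.0)
```

Here pitch_range=2.0 pushes every peak and valley away from the mean,
while stress=15.0 lifts only the stressed second syllable and leaves
the rest of the contour alone.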

Richness: This is the "project your voice" setting. Its inverse
is "smoothness", which is why overlays like voice-smoothen set
richness to be low. With low richness the voice is perceived as
softer; with higher values of richness, the voice gets
"brighter". If you look at the spectrogram, the "saw-tooth"
patterns you see are much sharper for higher richness values.
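
To picture why a sharper saw-tooth goes with a brighter voice, here is
a toy Python model. An ideal sawtooth contains harmonic n at amplitude
1/n; damping the upper harmonics faster rounds the wave off. The
0.0-1.0 richness scale, the function names, and the mapping from
richness to rolloff are assumptions for illustration only, not taken
from any engine.

```python
import math


def source_wave(richness, n_samples=64, n_harmonics=20):
    """One period of a toy voice-source waveform.

    richness is a hypothetical 0.0-1.0 value. Harmonic n gets amplitude
    1 / n**rolloff: rolloff 1 approximates a sawtooth, larger rolloff
    damps the upper harmonics and smooths the wave.
    """
    rolloff = 3.0 - 2.0 * richness   # 0.0 -> 3 (smooth), 1.0 -> 1 (sawtooth)
    wave = []
    for k in range(n_samples):
        phase = 2.0 * math.pi * k / n_samples
        wave.append(sum(math.sin(n * phase) / n ** rolloff
                        for n in range(1, n_harmonics + 1)))
    return wave


def steepest_step(wave):
    """Largest jump between neighbouring samples: a crude sharpness measure."""
    return max(abs(b - a) for a, b in zip(wave, wave[1:] + wave[:1]))
```

In this model, steepest_step(source_wave(1.0)) comes out larger than
steepest_step(source_wave(0.0)): the high-richness wave keeps its
sharp edge, the low-richness one is rounded off.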

