
Re: dtk-soft

I'll only post once on this list on this topic; there have been several
good replies already.

I wouldn't characterize the process used by DECtalk and Eloquence as
"generating a series of clicks," but rather as generating a series of
samples.  One way or another, they produce frames of data 10 or 20
milliseconds long, each frame synthesized by linear predictive coding or
another method that uses vocal tract parameters to synthesize a signal.  A
phoneme, or a transition between phonemes, occupies several frames.  The
data used by a predictive coder includes weighted data from previous
frames, so that less information needs to be transmitted to control the
process.  DECtalk, having an internal formant representation which is
accessible through command parameters, can create a wide range of voice
characteristics.  Raman uses this ability to morph the voice to good effect
in Emacspeak.  DECtalk was modeled on Klatt's voice; even the female voices
are derived from it by adjustment of parameters.
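
As a rough illustration of the frame idea, here is a minimal Python
sketch.  It is not how DECtalk or Eloquence is actually implemented; the
8 kHz sample rate, the frame dictionary keys ('coeffs', 'period', 'gain'),
and the helper names resonator() and synthesize() are all assumptions made
for the example.  Each 20 ms frame carries a handful of parameters, and
every output sample is an excitation value (an impulse train for voiced
sounds, noise for unvoiced ones) plus a weighted sum of previous output
samples.

```python
import math
import random
from collections import deque

SAMPLE_RATE = 8000   # assumed rate; 160 samples is then one 20 ms frame


def resonator(freq_hz, bandwidth_hz, sample_rate=SAMPLE_RATE):
    """Predictor weights [a1, a2] for a single two-pole resonance."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    theta = 2.0 * math.pi * freq_hz / sample_rate
    return [2.0 * r * math.cos(theta), -r * r]


def synthesize(frames, frame_len=160, seed=0):
    """Frame-by-frame all-pole synthesis (hypothetical frame layout).

    frame['coeffs']: weights applied to previous output samples
    frame['period']: pitch period in samples; 0 means unvoiced (noise)
    frame['gain']:   excitation amplitude
    """
    rng = random.Random(seed)
    order = max((len(f["coeffs"]) for f in frames), default=0)
    history = deque(maxlen=order)   # previous output samples, newest first
    out = []
    next_pulse = 0                  # sample index of the next impulse
    for frame in frames:
        for _ in range(frame_len):
            n = len(out)
            if frame["period"] > 0:             # voiced: impulse train
                if n >= next_pulse:
                    excitation = frame["gain"]
                    next_pulse = n + frame["period"]
                else:
                    excitation = 0.0
            else:                               # unvoiced: white noise
                excitation = frame["gain"] * rng.uniform(-1.0, 1.0)
            # Filter state carries over frame boundaries, so only the
            # few parameters per frame have to change.
            predicted = sum(a * y for a, y in zip(frame["coeffs"], history))
            sample = excitation + predicted
            out.append(sample)
            history.appendleft(sample)
    return out


# Three 20 ms frames: two voiced at 100 Hz pitch, then one noisy frame.
frames = [
    {"coeffs": resonator(500, 100), "period": 80, "gain": 1.0},
    {"coeffs": resonator(1500, 200), "period": 80, "gain": 1.0},
    {"coeffs": resonator(2500, 400), "period": 0, "gain": 0.3},
]
samples = synthesize(frames)
```

A real synthesizer would use several resonances per frame and interpolate
the parameters between frames; a single resonance is used here only to
keep the sketch short.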

Festival, Flite, Cepstral, AT&T Natural Voices, and the Microsoft
text-to-speech engine are all classified as concatenative synthesizers.
Instead of a set of digital filters excited by impulses and noise,
approximating speech, they work with samples of "speech units" recorded
from a real human.  A person whose voice is to be synthesized must spend
many hours in a recording studio, reading a set of sentences, using various
amounts of inflection, level of excitement, etc.  The database for one
voice could contain a few thousand to 50,000 snippets.  A concatenative
synthesizer must still apply the rules for text normalization,
text-to-phoneme translation, and assignment of stress and pitch, and use
any other information provided in the input data stream.  It must then
find the best match between successive speech units in its database to
synthesize the speech.  At the boundaries between speech units, the
waveforms must be morphed again so that the pitch matches, and perhaps to
keep the formants from jumping in frequency too abruptly.  Some of this
processing is similar to what's required for high-quality time-scale
modification of speech, because you usually don't have the exact speech
unit you need for a given context.  I think that
concatenative synthesizers are the future of speech synthesis, especially
for reading, because a synthesizer can be modeled on whoever's voice you
are willing to pay for, and a synthesizer can more naturally reproduce a
particular regional accent or dialect.
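
The "best match between successive speech units" step can be sketched as
a small dynamic-programming search.  This is only an illustration under
stated assumptions, not the algorithm of any of the products named above:
the Unit record, the pitch-only cost functions, and the function name
select_units() are hypothetical.  For each target position there are
several recorded candidates; the search minimizes the sum of a target
cost (how well a unit fits what was asked for) and a join cost (how badly
two neighbouring units mismatch at their boundary).

```python
from collections import namedtuple

# Hypothetical database record: pitch in Hz at the unit's start and end.
Unit = namedtuple("Unit", "name phone pitch_start pitch_end")


def target_cost(target, unit):
    """Distance between the requested pitch and the unit's mean pitch."""
    _phone, wanted_pitch = target
    return abs((unit.pitch_start + unit.pitch_end) / 2.0 - wanted_pitch)


def join_cost(left, right):
    """Pitch jump at the boundary between two consecutive units."""
    return abs(left.pitch_end - right.pitch_start)


def select_units(targets, candidates):
    """Pick one candidate per target, minimizing target + join costs."""
    if not targets:
        return []
    # best[i][j] = (cheapest total cost ending in candidates[i][j],
    #               index of the predecessor in candidates[i - 1])
    best = [[(target_cost(targets[0], c), None) for c in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for c in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(p, c), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(targets[i], c), back))
        best.append(row)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path


targets = [("h", 110), ("e", 120)]
candidates = [
    [Unit("h_1", "h", 100, 110), Unit("h_2", "h", 80, 140)],
    [Unit("e_1", "e", 112, 124), Unit("e_2", "e", 150, 100)],
]
chosen = select_units(targets, candidates)
```

In this toy data h_2 matches the requested pitch exactly but ends at
140 Hz, so it joins badly with either following unit; the search prefers
h_1 followed by e_1.  Whatever pitch mismatch remains at each join is
what the boundary smoothing described above still has to hide.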

But this leads me to an Emacspeak question: Does Flite allow you to tweak
the formants of the speech, or only to switch among a fixed set of voices?
I'm sure the pitch, volume and speed can be programmatically modified,
pauses inserted, and earcons played, but do these synthesizers have enough
handles to produce a rich auditory desktop environment?

At 09:04 AM 2/3/03 +1100, you wrote:
>If viavoice/elaquence create speech simply by using collections of
>clicks, you have to wonder why anybody attempts any other technique -
>such as that used by festival, mbrolla or dectalk. 
>I'd be very interested in any pointers to documentation on speech
>synthesis and tts technology as its an area I am interested in looking
>>>>>> "Doug" == Doug Smith <bdsmith@buncombe.main.nc.us> writes:
> Doug> You were wondering why the voices in these other multi-voice
> Doug> synths sound so much worse than what outloud did. The answer is
> Doug> far from proprietary information. I have read about some of
> Doug> these synthesizers because of a project I am working on and I
> Doug> have found the cause.
> Doug> The voices in those other programs are just c programs that
> Doug> create audio recordings on the fly and attempt to play them
> Doug> back as fast as possible. This is difficult, even with the
> Doug> greatest processors. What viavoice does is just to send a
> Doug> fast-paced sequence of clicking sounds to the card in order to
> Doug> produce speech. It works just like the synths of old which had
> Doug> to have their own dedicated cards. It uses the sound card in
> Doug> the same way as, for example, the echo II for the Apple II
> Doug> series did. It just sends the sounds through to be played,
> Doug> speeding them up and slowing them down and varying the rate
> Doug> just enough to make it sound like intelligible words.
> Doug> The algorithm? No, I don't know exactly, but above is stated
> Doug> the general idea. I have an idea for a new open source synth,
> Doug> but I don't have the time right now to conduct research,
> Doug> experiments and to determine how to get it to work at an
> Doug> optimal level. i hope to begin working on it this summer, and,
> Doug> if possible, have it out soon.
> Doug> I hope this helps you understand the quality issues.
> Doug> -- Doug Smith: C.S.F.C.  Computer Scientist For CHRIST!
> Doug>
> Doug> To unsubscribe from the emacspeak list or change your address
> Doug> on the emacspeak list send mail to
> Doug> "emacspeak-request@cs.vassar.edu" with a subject of
> Doug> "unsubscribe" or "help"
>Tim Cross
>Senior Analyst/Programmer
>Applications Group - Information Technology
>University of New England
> Phone: +61 2 6773 3210
>   Fax: +61 2 6773 3424
>E-Mail: tcross@pobox.une.edu.au
>   Web: http://www.une.edu.au/itd/systems/systems.html
>Who's General Failure and why's he reading my disk?"
Braille is the solution to the digital divide.
Lloyd Rasmussen, Senior Staff Engineer
National Library Service f/t Blind and Physically Handicapped
Library of Congress    (202) 707-0535  <lras@loc.gov>
HOME:  <lras@sprynet.com>       <http://lras.home.sprynet.com>
The opinions expressed here are my own and do not necessarily represent
those of NLS.

