
Re: Current status of speech servers, and ideas wanted

Just a few comments.....

* There is considerable complexity in Linux sound architectures at this time -
  especially with distros now installing pulseaudio by default. 

* Many users are able to get the current servers working well. I currently
  have three different systems working with three different sound cards on
  three different Linux distributions:

    i386 Debian Testing/unstable: SB Audigy 4 sound card, pulseaudio, alsa, 
    outloud, espeak, emacs23 and latest svn emacspeak.

    i386 Ubuntu Karmic: Intel HDA soundcard, pulseaudio, alsa, outloud,
    espeak, emacs23, svn emacspeak

    ia_64 Ubuntu Karmic: SB Audigy SE (CA0106) soundcard and Intel HDA
    soundcard, pulseaudio, alsa, espeak, emacs24, svn emacspeak. 

On all of these systems, I've created a system-specific .asoundrc file. On the
Debian box, I've actually created a dedicated outloud pcm that points to one
channel and modified the outloud source to use that pcm rather than the
default. I did this because I found that, with the Audigy 4 card, I couldn't
get a dmix config that worked well with outloud and didn't cause issues with
other apps. Essentially, this keeps them separate and prevents issues with
alsa outloud and pulseaudio. Both outloud and espeak work well on this system.

On the i386 Ubuntu system, I have a .asoundrc file so that I can have outloud
and auditory icons (the HDA soundcard is not multi-channel capable). I do
still get contention with pulseaudio at times and haven't quite got that
working correctly. Espeak has some issues, but I've not looked into them yet.

On the ia_64 ubuntu karmic system, I have espeak running. This system is only
a week old and I'm still getting things configured correctly. I've had some
issues with getting espeak to work well. Initially, it was cutting off the
last part of the output or only reading a small bit of the line. I fixed this by
installing a .asoundrc file with the dmix plugin and setting rate, buffer
size, period time etc. There is some contention with pulse still and I'm
having some problems getting it to work consistently i.e. all sound programs
working well. This is partly due to pulseaudio getting confused over the two
sound cards, partly me getting to understand pulse and partly just getting
everything tweaked correctly. However, I do have emacspeak working well with
espeak, including auditory icons. I suspect once I get pulseaudio sorted out,
it will be fine. 
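
For anyone who has not written a dmix config before, here is a rough sketch of
the general shape of such a file, generated from a small Python helper so the
values are easy to vary while experimenting. The function name, the card and
device numbers and every numeric value are placeholders for illustration, not
the settings used on the systems described above. period_time is set to 0 here
on the assumption that period_size is then used instead, as in the commonly
cited ALSA dmix example.

```python
# Sketch only: render a dmix-based ~/.asoundrc as text.
# All numeric values are placeholders, not the settings from this post.

def render_dmix_asoundrc(card=0, device=0, rate=44100,
                         period_size=1024, buffer_size=4096, ipc_key=1024):
    """Return the text of a minimal .asoundrc routing the default PCM via dmix."""
    return f"""\
pcm.!default {{
    type plug
    slave.pcm "dmixer"
}}

pcm.dmixer {{
    type dmix
    ipc_key {ipc_key}
    slave {{
        pcm "hw:{card},{device}"
        period_time 0
        period_size {period_size}
        buffer_size {buffer_size}
        rate {rate}
    }}
}}

ctl.dmixer {{
    type hw
    card {card}
}}
"""


if __name__ == "__main__":
    # Print to stdout; redirect to ~/.asoundrc only after reviewing the result.
    print(render_dmix_asoundrc())
```

Printing rather than writing the file directly is deliberate: it makes it easy
to compare a candidate config against the one already in place before
replacing it.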

All of this brings me to the following conclusions -

1. The speech servers for emacspeak work fine - at least espeak and outloud. 
2. The configuration is very much hardware dependent. 
3. Despite claims that alsa no longer requires a .asoundrc file, I've never
managed to get outloud to work correctly without one. Even on the system
running espeak, I needed a .asoundrc to get it to work correctly.
4. Espeak will work fine once you get the configuration tweaked correctly.
Latency, sluggish response, etc. are very much affected by how you have alsa
and pulseaudio set up and probably depend on the type of sound hardware.
5. At this time, I suspect it is very unlikely we can find a 'turn key'
solution for emacspeak that will work on all systems. It will remain a per
system config tweak problem and I don't believe there is much that can be done
until we have more mature pulseaudio, alsa and distro setups. 

I think an emacspeak to speech dispatcher interface would be an excellent
addition. However, getting that right is probably non-trivial and will require
a fair amount of effort. 

Your post provides no specific detail that would allow anyone to give you more
help. You fail to mention anything about distribution, sound card hardware,
etc. The problems you experience with espeak are almost certainly due to your
local configuration, but it's unlikely anyone can give you specific answers or
solutions. You will need to experiment. The right solution will likely depend
on what soundcard you have and what distribution you are running. However, the
interactions between the various layers are quite complex, so you need to take
a very methodical and slow approach. This is particularly the case with
pulseaudio. I had to blow my config away and start from scratch a number of
times. Access to a hardware synth can be very useful when doing this. Luckily,
I still have my old dectalk express.
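
To make a report more useful, something along these lines could gather the
basics before posting. This is only a sketch, assuming Python 3 on a Linux
system; the function name and the list of programs checked are illustrative,
and the distribution name still has to be added by hand because the platform
string mostly describes the kernel.

```python
import os
import platform
import shutil


def collect_sound_report(home=None, cards_path="/proc/asound/cards"):
    """Gather basic facts worth including when asking the list for help."""
    home = home or os.path.expanduser("~")
    report = {
        "system": platform.platform(),
        "asoundrc_present": os.path.isfile(os.path.join(home, ".asoundrc")),
        # shutil.which returns None when the program is not on PATH.
        "pulseaudio": shutil.which("pulseaudio"),
        "espeak": shutil.which("espeak"),
        "aplay": shutil.which("aplay"),
    }
    try:
        # On systems with ALSA loaded, this file lists the detected sound cards.
        with open(cards_path, encoding="utf-8", errors="replace") as fh:
            report["sound_cards"] = fh.read().strip()
    except OSError:
        report["sound_cards"] = None  # file missing or unreadable
    return report


if __name__ == "__main__":
    for key, value in collect_sound_report().items():
        print(f"{key}: {value}")
```

Pasting that output into a message, together with the distribution and the
speech server in use, gives people something concrete to work from.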

I've seen numerous reports of people having to remove pulseaudio to get things
working. This might be a good starting point. Remove pulseaudio and get as
many things working as possible without it. Then install pulseaudio and get it
working and then move things over to use pulse.


Tyler Spivey writes:
 > I've been trying to get Emacspeak going again with an acceptable
 > level of performance for the past few days, without much luck.
 > Here's a summary of my findings on the current state of the various
 > speech servers that I have access to, and a request for ideas for
 > improving the situation:
 > espeak: The server is very unresponsive, and reads "capital" before
 > every uppercase word unless dtk-split-caps is off.
 > Sometimes, while moving quickly through a document, it will randomly
 > say various punctuation characters at the beginning of the line. I
 > think these are from the previous line.
 > software dectalk: I managed to get both 4.64 and 5.0 working, though
 > they did crash a few times in the few minutes I played with them.
 > They seem very responsive though, although software-dtk uses oss and
 > not alsa, and I don't think you can buy it anymore - the purchase
 > page just hangs.
 > outloud: the server can speak, but silencing speech doesn't work. I
 > don't know why.
 > Multispeech: I managed to get this to work after running
 > sed -i -e 's/Russian-spelling/russian-spelling/' multispeech-voices.el.
 > Freephone/mbrola still works, and espeak does too. I think an
 > interaction between portaudio and my alsa is causing it to drop the
 > last few ms of whatever it says, though.
 > Eflite: This works, including the alsa binary available at
 > http://homepage.hispeed.ch/loehrer/flite_alsa.html
 > The alsa binary has an echo problem - if another sound is playing
 > such as an mp3, the currently playing chunk of audio doesn't silence
 > very fast. This works on multispeech, so I'm not sure why it doesn't
 > work here.
 > What can be done to improve this situation? I could most likely
 > write a quick hack that spawned a new process for everything it
 > wanted to say, and it would work better than some of these servers.
 > If anyone else is interested, I think we should just focus on one
 > synth, and make a good, responsive server. I vote for espeak, since
 > it seems to be the only thing that's still under active development
 > and is easy to install. We could also do a bridge between emacspeak
 > and speech-dispatcher. Thoughts?
 > --
 > Tyler Spivey - PGP Key ID: 0xae742aaf
 > -----------------------------------------------------------------------------
 > To unsubscribe from the emacspeak list or change your address on the
 > emacspeak list send mail to "emacspeak-request@cs.vassar.edu" with a
 > subject of "unsubscribe" or "help".

Tim Cross

There are two types of people in IT - those who do not manage what they 
understand and those who do not understand what they manage.

