espeak, pulseAudio and ALSA. Success!



I have seen some posts to this list and other lists, such as the
speech-dispatcher list, regarding problems with getting things to work well
with pulseAudio. I have also found lots of other posts, blogs and web pages
where users have had issues with pulseAudio. However, I have found that with
some effort, you can get pulseAudio to work well and you can take advantage of
some of its advanced features. Many of these features are particularly useful
to blind and VI users as they make the use of your sound hardware much more
flexible. 

The following is an outline of my experiences and what I did to get things
working. I'm posting it as I thought it might make a useful addition to the
archives and hopefully, assist others in getting things to work well. It
probably contains some errors and others with more knowledge or experience may
feel I've got some things completely wrong. Corrections and suggestions for
improvement are both welcomed and encouraged. 

Tim

* Background

I've recently been trying to get a new x86_64 based ubuntu 9.10 system working
with emacspeak. This has required a switch from using the IBM ViaVoice Outloud
TTS engine to using the eSpeak TTS engine as I wasn't keen to go through the
hassle of installing a 32 bit version of tcl. 

Getting things working was challenging to say the least. However, the good
news is I now appear to have things working well. In fact, there are only two
issues I'd still like to resolve, very slow character echo and word echo
echoing words twice. Character echo is so slow, I've had to turn it off.
However, I can live with that. Word echo echoing things twice seems to come
and go and I suspect I can track down the issue given some time. I have also
noticed that at times, voice locking seems to get out of synch. I suspect this
is due to mismatched SSML tags. It is quickly resolved by restarting the TTS
or resetting things to factory defaults. 

As there have been a number of posts in the past regarding issues with getting
espeak working and issues with things like pulseaudio etc, I thought I'd
document some of what I've found in case it might be useful to others. 

* Hardware and platform 

The hardware is an Intel i7 CPU with 8 cores and 4GB of memory. The system has
an nvidia graphics card and two sound cards, an on-board Intel HDA and a
SoundBlaster Audigy SE. 

I'm running Ubuntu Karmic (9.10). The install is pretty standard, except I've
added the ubuntu sound developers PPA to get the latest packaged versions of
pulseaudio. I also downloaded espeak from the sourceforge homepage and built
it from sources. This turned out to be a critically important step.

* Initial Situation 

After a fresh install of Ubuntu Karmic, I logged in and installed emacspeak. I
installed the necessary dev libs to build the tclespeak.so library and then
tried to use it. 

While I managed to get some speech out, it was of poor quality with lots of
distortion and crackling. When the system was under load, latency became an
issue and speech would just stop working at random intervals. Possibly the
worst problem was the truncation of text sent to the TTS. Sometimes multiple
words would be lost; at other times, it would just be the last few letters. I also
observed a problem with the voice lock settings. Some text just wouldn't get
spoken, at least you wouldn't hear it. Turning off font-lock mode would
resolve this, but that just isn't good enough. I want voice locking!

I then experimented with playing other audio sources, such as mp3, wav and ogg
files. I also experimented with playing them all at once and kicked off a few
different programs to try and put my system under a bit of load. 

The sound was not good. At times it would break up, sometimes it would just
stop and the quality was pretty poor. I was fairly sure it was an issue with
pulseaudio and decided to dig a little deeper. 

* pulseAudio

Things seem to be very clearly divided when it comes to pulseAudio. People
seem to either love it or hate it. In searching for solutions, I found many
pages on the web with titles like "How to fix pulseAudio on Ubuntu", which
would turn out to be instructions on how to remove pulseaudio from your
system. I also found numerous posts on lists such as the speech-dispatcher
list where people just couldn't get pulseAudio to work. I began to seriously
consider removing it from my system. However, before doing that, I wanted to
investigate further. 

In reading about pulseAudio, I soon began to realise why many people were so
keen on it. If the hype is to be believed, it will solve many of the problems
that I've often been frustrated by on Linux. Some of the things which got my
attention included 

- Ability to play multiple sound sources at once and control their individual
  volume. How good it would be to play some music in the background without
  drowning out my speech! 

- Send different sound sources to different speakers, including ones connected
  to other computers on my local network. I could stream speech to my system
  in the lounge room and listen to it sitting in a comfortable chair. I can
  also send my music to my stereo. 

- Ability to play multiple sound sources on the one cheap sound card or route
  sounds to different sound cards on a system with multiple cards.

Plus many other potentially useful features for anyone who relies on sound on a
day-to-day basis. I decided to jump in with both boots and see where I ended
up.

* Configuring Pulse

The first thing I did was make sure I restored everything to the vendor
installed configuration. I ensured the pulse default config files in
/etc/pulse were the package versions and removed any personal .asoundrc file I
had in my home directory. I also deleted the .pulse directory in my home
directory. I wanted to start with a vanilla configuration. 

* Hardware Synthesizer 

It was obvious that getting things working was going to take a bit of effort.
I also had the old catch 22 that tends to come up for blind users on Linux all
too often. I want to fix sound so that I get text feedback and a usable
interface, but I need text feedback and a usable interface to do that. 

My solution was to drag out my trusty old hardware dectalk express. I think it
is very useful to keep a good old hardware synth about for exactly this type of
situation. Luckily, I had insisted on a serial port in this new computer.
These days, with the growth in popularity for USB, serial ports are no longer
standard on most new systems. They have gone the way of the Dodo and floppy
drive. You will only get one if you ask for it. Even when you do, the local
sales person is likely to be like mine and consider you mad for asking. Don't
let yourself be talked out of it. If you have serial devices, you want a
serial port. There are serial to USB converters out there, but results can
vary greatly. A serial port is often very useful for many tasks. For one
thing, it's easier to hack with than USB because you have easy access to all
the pins. I connected my Dectalk and soon had emacs and emacspeak running with
the dtk-exp driver. 

* Logging 

The first thing I did was increase the logging level for pulseaudio by editing
the /etc/pulse/daemon.conf file and changing the logging level from notice to
info. This creates a lot of useful data in /var/log/messages. I highly
recommend making copies of all config files prior to editing them. It's very
easy to spiral down into a confused mess when playing with this stuff because
you are dealing with so many different levels of interaction. I had a number
of points where things just seemed to fall apart and I'd confused myself
beyond recovery. At those points, I would copy the original files back to wipe
out my changes and get back to a known 'vanilla' state. 

I also use this technique to confirm I've fixed the problem by intention and
not by accident. Once I believe I know what the solution is and have
documented it, I copy the original files back into place, reboot the system and
follow my documented changes to verify it really does fix things. This may
take more time, but at least I end up with greater confidence I really do know
how to fix the problem. Usually, this will save time in the future when things get
screwed up due to a distro update. 

I then started going through the log messages, looking for anything that might
indicate an issue. Changing logging from notice to info creates a fair amount
of data. To reduce this and help extract only the most recent, I used emacs
grep mode and searched for pulseaudio log messages with the same process ID as
the current pulseaudio process. 

You can find the process ID of the current pulseAudio process using the ps
command or by looking for the pulseAudio PID file in
/var/run/<user>/pulseaudio.pid. 
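
The filtering step above can be sketched in a few lines of Python. This is
only an illustration, not the emacs grep approach itself: it assumes log lines
carry the usual syslog tag of the form pulseaudio[PID]:, and the function name
and the sample lines are made up.

```python
import re


def lines_for_pid(lines, pid):
    """Return only the log lines tagged pulseaudio[pid]:."""
    tag = re.compile(r"\bpulseaudio\[%d\]:" % pid)
    return [line for line in lines if tag.search(line)]


# Hypothetical sample lines, not real log output.
sample = [
    "Jan  5 10:00:01 host pulseaudio[1201]: old daemon message",
    "Jan  5 10:05:17 host pulseaudio[2417]: current daemon message",
    "Jan  5 10:05:18 host kernel: unrelated message",
]

if __name__ == "__main__":
    # 2417 stands in for the PID of the running daemon.
    for line in lines_for_pid(sample, 2417):
        print(line)
```

In practice the lines would come from reading /var/log/messages and the PID
from ps or the PID file, rather than from a hard-coded list.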

* System Mode and User Mode

pulseAudio can run in two different modes, system mode and user mode. In
system mode, a single pulseAudio daemon is started at boot time and users
connect to that daemon to play audio. While this can be an easy setup in some
cases, it can raise some issues of conflict, especially on a multi-user
system. There are also added issues of access control, configuration and
module loading that become more complex when pulseAudio runs in system mode. 

The other mode of operation is user mode. Under this approach, a pulseAudio
daemon is started as part of your login process and is owned by you. This
eliminates some of the access control issues and gives you more control over
how things are configured without needing to use super user privileges. User
mode tends to be the default for most distributions and is probably the best
way to go.

* High Priority and Realtime Scheduling

One of the problems with pulseaudio is that it is still under heavy
development and the documentation is a bit behind. This has been made somewhat
worse because Ubuntu and other distros have been working towards improving
how pulseAudio works, plus there have been some new features introduced in the
latest stable Linux kernels that provide alternative ways to change the
scheduling privileges of non-root processes in a more secure manner. The end
result is
that much of the documentation you will find is not only out of date, it is
misleading. To make matters worse, the documentation on how this stuff should
now be managed simply doesn't seem to exist. Things are hinted at here and
there, but there is no single comprehensive explanation or howto. Much of what
I've done has been based on reading between the lines and scanning change logs
for hints etc. I've made some wild guesses and adopted a very scientific 'suck
and see' approach. Essentially, I made a change, restarted things and looked
to see if it made a difference. If it didn't, I changed things back and tried
the next option. This process was continued until I found things working. I
don't pretend to understand exactly why it worked, only verified that it does
appear to for me. Of course, your mileage may vary! 

The first thing I noticed in the pulse logs was that pulseAudio was failing to
obtain high priority and realtime scheduling privileges. This was where the
first of many confusing points was encountered. 

The pulseAudio developers recommend running the system with high priority
processes and realtime scheduling. However, this can raise some concerns for
some system administrators and distribution designers. There are two core
issues with setting up processes to have a higher than normal priority and
realtime scheduling. Firstly, traditionally, you needed to run as root to
alter these settings. The standard way to achieve this used to be to make the
binary owned by root and set its permissions so that it became a setuid
program and would execute with root permissions regardless of who ran it.
While this does resolve the immediate problem, it raises significant security
issues. While most of these are only relevant on multi-user systems, most
Linux distributions have been working very hard to eliminate the use of setuid
programs wherever possible. 

The other problem is a general problem associated with running any process with
high priority realtime scheduling. The problem relates to runaway processes
and how to kill them. If you have a high priority realtime scheduled process
go into an infinite loop and it starts consuming resources, it can be nearly
impossible to kill it. Basically, the problem is your user processes simply
don't have high enough priority to 'get in front' and kill the process.
Suddenly, your load goes through the roof, memory starts vanishing and your
system becomes unusable. Often, the only fix is to do a hard reset. 

While both of these concerns are very legitimate, they are not as serious on a
single user workstation as they can be on a multi-user system. If you have
your workstation behind a firewalled router, such as is common with many
DSL setups these days, and you are the only user on the system, you don't need
to be overly concerned. 

Another approach to allowing user processes to obtain high priority and
realtime scheduling is to configure the user running the process so that they
can set the nice and realtime priority privileges for processes they run. The
problem with this approach is that it enables the user to obtain these
privileges for any process they run, not just for a specific process. This is
also a potential issue for system administrators on multi-user systems. They
don't want their users to have the ability to modify priorities and scheduling
of resources as this would make it far too easy to cripple the system, either
accidentally or intentionally. 

The more modern approach to this problem is to implement a framework that will
enable users to run specific processes with high priority and realtime
scheduling that have been approved by the system administrator (or more
commonly these days, by the distribution designers) and enable this to be
achieved without the end user needing access to super user privileges. 

The 'standard' pulseAudio documentation still recommends setting the
pulseAudio binary to setuid and using a special group called pulse-rt to
control which users can run pulse with realtime scheduling privileges.
However, according to comments in the pulseaudio README.Debian file, you no
longer need to do this for Ubuntu as it is using the rtkit package.
Unfortunately, there is no concise documentation that clearly explains
how this now all works. 

While I've resolved the issue, I'm not 100% happy with how I achieved this.
Part of the problem here is that after 17 years of running Linux, I'm well and
truly over operating system and kernel tweaking. These days, I find such
things boring and frustrating. I just want my computer to work and view it as
a tool that enables me to achieve my other projects, which I'm far more
interested in. As a consequence, I've not really kept up in developments
relating to things like console-kit, policykit or dbus. From my reading, all
or some of these play a role in assigning special privileges, such as being
able to set processes to use realtime scheduling etc. 

It seems there are two ways that I could achieve this. The first way was
through the use of policykit and a package called rtkit. The second was
through the use of the PAM limits configuration file. The more correct and
modern solution is the policykit and rtkit route. I chose the easier PAM
solution. Essentially, I edited the /etc/security/limits.conf file and added
the appropriate rtprio and nice entries. I then logged out and logged back in
so that the new settings would take effect and checked the pulseaudio logs. I
was now successfully gaining the high priority and realtime scheduling
recommended for pulseaudio.
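
As a rough illustration of what such entries look like, the sketch below
parses limits.conf-style text and reports the rtprio and nice values set for
one user. The four-column layout (domain, type, item, value) is the standard
limits.conf format; the user name and the numbers are placeholders, not the
values actually used on this system.

```python
def limits_for(text, domain):
    """Collect rtprio and nice values set for `domain` in limits.conf text."""
    found = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        fields = line.split()
        if len(fields) != 4:
            continue
        who, kind, item, value = fields
        if who == domain and item in ("rtprio", "nice"):
            found[item] = (kind, int(value))
    return found


# Hypothetical entries: the user name and the values are placeholders.
example = """
# <domain> <type> <item> <value>
tim    -    rtprio    9
tim    -    nice      -11
"""

if __name__ == "__main__":
    print(limits_for(example, "tim"))
```

A type of "-" sets both the soft and hard limit. After logging back in,
"ulimit -r" in a shell should show the new realtime priority limit.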

The major issue with this solution is that my user account now has the ability
to set high priority realtime privileges on any process I create, not just
pulseAudio. However, as I am the only user of the system and as I already have
su privileges, this is not a big issue. 

If I had the time and the interest, the proper solution would be to read up
on dbus and policykit and its tools, like pklocalauthority and learn how to
grant the necessary authorisations to my user account. I do need to read up on
dbus as it is rapidly becoming the default message passing mechanism on Linux
and I really do need to learn about policy kit. However, as things are still
evolving and as the documentation is still somewhat scant, I'll wait a bit.
Besides, I still have to work on becoming more tolerant to the overly verbose
and frequently poorly designed use of XML that seems to be a plague in current
modern setups. I'm obviously an old dinosaur that misses the concise
s-expressions and key-value configurations of yesterday! However, I'm
confident that sanity will prevail in the end. Java will die the death it
deserves, XML won't be seen as the answer to every problem and we may even see
some sane standardisation in how system configurations are managed. Until then
...

Now that my pulseAudio process had high priority realtime scheduling, I found
sound was more reliable and less impacted by system load. Sound didn't break
up every time I started downloading large amounts of data over my network link
or started building the latest version of emacs etc. However, I still wasn't
happy with the quality of the sound and there were still log entries that
needed investigation. There was more work to be done.

* Sample rates and quality

In addition to looking at the log entries from pulseAudio, I also used the
pacmd program to query the system and find additional information regarding
the state of pulse. The pacmd program is extremely useful and unlike other
programs for manipulating pulseAudio, it is text based and runs fine within
emacs. Using this program, you can find out details about the system and set
various options or configure modules, sound sinks and sources. 

I noticed that there was a mismatch between the sample rate pulse was using
and the 'native' sample rate of my sound card. My Audigy card likes a sample
rate of 48000, but pulse was using 44100. I changed the pulse
configuration to match my sound card.

I also noticed that the native sample format for my Audigy card is s32_le. I
changed the pulse configuration from its default s16le to also match my sound
card.

I also noticed that pulseaudio has a setting for resample-method. This can be
set to use various different methods with differing performance and quality.
By default, it is set to be quite low. I experimented with different settings
and found that it did affect both the quality of the output as well as the
load put on your system. After a bit of experimentation, I selected
speex-float-5, which appears to give good quality output without an excessive
load on the system. The correct setting will depend a lot on your hardware
and what type of work and sound activity you have going on.
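
A minimal sketch of the three changes, written as a small Python helper that
rewrites daemon.conf-style text. The keys default-sample-rate,
default-sample-format and resample-method are the daemon.conf option names,
and s32le is the pulse spelling of the 32 bit little-endian format. The helper
names and the sample 'stock' text are hypothetical, and the sketch assumes
defaults appear as lines commented out with a semicolon.

```python
import re


def set_option(conf_text, key, value):
    """Set `key = value`, reusing an existing or commented-out line if any."""
    pattern = re.compile(r"^[;# \t]*%s\s*=.*$" % re.escape(key), re.MULTILINE)
    new_line = "%s = %s" % (key, value)
    if pattern.search(conf_text):
        return pattern.sub(lambda m: new_line, conf_text, count=1)
    return conf_text.rstrip("\n") + "\n" + new_line + "\n"


def apply_settings(conf_text, pairs):
    """Apply several (key, value) pairs in order."""
    for key, value in pairs:
        conf_text = set_option(conf_text, key, value)
    return conf_text


# Hypothetical excerpt in the style of a stock daemon.conf.
stock = """; resample-method = speex-float-1
; default-sample-format = s16le
; default-sample-rate = 44100
"""

# Values taken from the text above; they suit this Audigy card, not every card.
settings = [
    ("default-sample-rate", "48000"),
    ("default-sample-format", "s32le"),
    ("resample-method", "speex-float-5"),
]

if __name__ == "__main__":
    print(apply_settings(stock, settings), end="")
```

The same helper would cover the earlier log-level change. Work on a copy of
the file, as recommended in the Logging section, and restart pulse afterwards.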

                                                        
On restarting pulse, sound quality did sound better. However, it is worth
noting that I only heard the improved quality when my card's output was
connected to my external amplifier
and good quality stereo speakers. No real difference in quality was observed
with the cheap built-in speakers attached to my HP monitor. 

By this point, I found pulseaudio was performing a lot better. I could now
play multiple sound sources and even under quite heavy load, I did not
encounter drop out, distortion or high latency. I was quite
happy with my pulseaudio setup. 

I then proceeded to configure things to enable both my sound cards to work
together with pulse. I won't go through the issues I ran into there, but can
provide some useful pointers if anyone else runs into issues. In the end, I
ended up with a configuration whereby I could control which sound card was
used via the pacmd program and can send some clients to one sound card and
some to another. 

* .asoundrc 

According to the pulseaudio website, you should create a .asoundrc file with
entries for the pulse plugin. This will allow you to route any sound played
via alsa through pulse. They also suggest setting up your .asoundrc so that by
default, all alsa output goes through pulse. I did this with the following
.asoundrc file and it appears to work very well. 

pcm.pulse {
    type pulse
}

ctl.pulse {
    type pulse
}

pcm.!default {
    type pulse
}

ctl.!default {
    type pulse 
}

It is worth noting that there are some warnings regarding this setup if you
are not using the udev based auto-configuration modules to set up pulse. If you
are loading the pulse modules manually or statically in the config file, you
need to ensure they don't try to also bind to the default alsa device as you
will get a loop. Using the udev and hal modules, pulse binds to the soundcard
at a lower level and avoids this problem. By default, Ubuntu uses the udev and
hal configuration, so unless you have modified the default.pa or system.pa
files in the /etc/pulse directory, you can probably use an .asoundrc file such
as the one above. 

* Upgrading PulseAudio 

While I was now happy with my pulseAudio configuration, there were still a
couple of minor issues, such as random changes in speech volume. I therefore
decided to add the ubuntu sound developers PPA to my sources list for APT and
upgrade to the latest version they were working with. While I think this has
made my pulse setup more stable, I'm not sure if it has really made a huge
difference. However, as pulseAudio is under heavy development, it probably
makes sense to be at the leading edge. I do still have my old hardware Dectalk
express connected, so at least I'm not stranded if a pulse upgrade should
break things. However, it is important to recognise that the pulse packages in
the sound developers PPA may not be stable and using them does have risks. If
you must have a very stable setup, I would advise sticking with the standard
ubuntu packages. So far, the dev packages have worked well for me. 

* Espeak 

Despite getting pulse to work well, I still had problems with espeak. Text was
frequently truncated, some text just didn't appear to get spoken at all.
Sometimes, the speech would start at the middle of the sentence and then stop
just before the last word or halfway through it. Often, words were pronounced
badly and difficult to understand. I even had a couple of instances where text
speaking rates varied from very slow to extremely fast. 

At first I thought this was a problem with the tclespeak library. I wasn't able
to reproduce the problems with the stand-alone espeak program that comes as
part of the distribution. However, after a few cut and pastes and playing
around, putting debug statements in tclespeak.cpp and modifying some regsub
expressions in the tclsh espeak script, I began to realise that the server was
sending text correctly and it had valid SSML markup. The problem had to be
with the espeak library. 

On visiting the espeak homepage, I noticed a new version had recently been
released and decided to grab it and give it a go. I downloaded the sources and
checked the ReadMe file. There wasn't much to it and building the system seemed
pretty straight-forward. 

On checking the Makefile, I noticed that it had three different audio output
options. The default was to use the portaudio library. I checked what libs the
Ubuntu supplied version was built against and saw it was portaudio, so I
decided to go with the default. 

On building the libs and installing them, I found no noticeable improvement. I
still had problems with text being truncated and missing words in the text.
Looking into things further, I began to think that the correct thing to do was
to build the espeak library with pulse rather than portaudio. I re-built the
library after commenting out the portaudio option and enabling the pulse
option. I then installed the newly built library and fired up emacs and
emacspeak. 
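
One way to confirm which audio backend a built library actually links against
is to look at its shared library dependencies with ldd. The sketch below
classifies ldd-style output; the sample output, its paths and addresses, and
the function names are placeholders for illustration only.

```python
import subprocess


def audio_backends(ldd_output):
    """Report which of the two audio libraries appear in ldd-style output."""
    names = [line.split()[0] for line in ldd_output.splitlines() if line.split()]
    return {
        "pulse": any(name.startswith("libpulse") for name in names),
        "portaudio": any(name.startswith("libportaudio") for name in names),
    }


def linked_backends(library_path):
    """Run ldd on a shared library and classify its audio dependencies."""
    result = subprocess.run(["ldd", library_path], capture_output=True,
                            text=True, check=True)
    return audio_backends(result.stdout)


# Hypothetical ldd output; paths and addresses are placeholders.
sample = """\
    libpulse.so.0 => /usr/lib/libpulse.so.0 (0x00007f0000000000)
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f0000100000)
    libc.so.6 => /lib/libc.so.6 (0x00007f0000200000)
"""

if __name__ == "__main__":
    print(audio_backends(sample))
```

To check a real build, call linked_backends() with the path to wherever your
libespeak shared library was installed.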

Success! The problems with truncated speech, missing words and bad
pronunciations are all gone. The sound quality is good and the server appears
to be very stable. I now have a system that is working well enough for
day-to-day use. 

* tclespeak.so

During my debugging sessions, I modified tclespeak.cpp to add some debug
information and to log additional information. I found a few things which
didn't match the espeak docs or how libespeak was being used by the espeak
program that comes with the libespeak distribution. I also found some
potential inefficiencies in the text being sent for synthesis, which may or
may not impact on performance. This included things like multiple whitespace
characters, newlines etc. There also appears to be some redundant SSML tagging
going on. I'm now experimenting with some of this to see if I can improve the
situation further. If I find any of the changes I experiment with improve
things, I will provide patches.
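
The whitespace clean-up mentioned above could look something like the
following. This is a hypothetical sketch in Python, not the change being
tested in tclespeak.cpp, and whether it makes any audible difference is
untested.

```python
def collapse_whitespace(text):
    """Replace every run of spaces, tabs and newlines with one space."""
    return " ".join(text.split())


if __name__ == "__main__":
    # Made-up input with the kind of padding described above.
    print(collapse_whitespace("  hello \n\n  world\t!  "))
```

Any real fix would need to leave whitespace inside SSML attribute values
alone, which this sketch does not attempt.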

* Conclusions

- Getting pulseaudio working correctly is a non-trivial task. It is unlikely
  that distributions will get this working well 'out of the box' for some time
  as there are so many dependent variables to consider. Things vary
  considerably depending on sound card hardware, system CPU, memory etc.
  Finding a good general default configuration that will work well for
  everybody is going to be very difficult. 

- Many of the promises made by pulseAudio developers are realisable and once
  you get it working, it works well. I think pulseAudio is here to stay and we
  need to make it work. More importantly, the effort is worthwhile as it does
  offer a lot of benefits. 

- With respect to emacspeak and the tclespeak interface, things are complex
  because you have four different layers to consider. However, a careful and
  methodical approach seems to work and provide positive rewards. I suggest
  the following approach 

  1. Get sound working with ALSA 
  2. Get pulseAudio working. Make sure it is using high priority RT scheduling  
  3. Get multiple sound sources working with pulseAudio 
  4. Setup a .asoundrc file so that all ALSA sound goes via pulseAudio. This
     will eliminate the likelihood of contention between ALSA and pulseAudio
     in accessing sound hardware. 
  5. Make sure that the libespeak library has been built to use pulseAudio
     audio rather than portAudio for its output. 

- The espeak interface for emacspeak does not provide the same level of
  quality in either speech or responsiveness as ViaVoice Outloud. However, it
  is as good as the dectalk express in my opinion. In fact, getting used to
  using espeak is very similar to getting used to the hardware dectalk after
  having used outloud for years. There do not seem to be any plans to update
  outloud to work with modern libraries or to provide a 64 bit version. Like
  it or not, eventually, ViaVoice Outloud is likely to vanish from the scene.
  At this time, espeak appears to be the most viable alternative. 

- I am quite certain some of my assumptions are either misguided or completely
  wrong. In particular, I'd love to get more information on the correct way to
  grant authorisation to use high priority and realtime scheduling on a
  modern Linux system, especially Ubuntu. If you have information or
  pointers, please let me know. 

- I find it very strange that, given Ubuntu now ships with pulseAudio as the
  default configuration, a program like espeak and its library, libespeak,
  is not built to use native pulse access. I wonder if there is some issue
  with doing this that I'm not aware of. Is it simply that the espeak
  package maintainers haven't updated things, or do they continue with the
  default so that the package will continue to work on both pulse and
  non-pulse based configurations? Given what I have encountered, it may be
  time to release two versions - a libespeak-pulse and a libespeak-portaudio
  version. 


-- 
Tim Cross
tcross@xxxxxxxxxxx

There are two types of people in IT - those who do not manage what they 
understand and those who do not understand what they manage.
