Using emacspeak with speech-dispatcher
- To: email@example.com
- Subject: Using emacspeak with speech-dispatcher
- From: Tim Cross <firstname.lastname@example.org>
- Date: Sat, 7 Jan 2006 15:52:58 +1100
- Cc: email@example.com
Just a couple of points which may help.
With respect to ALSA, I've been using ALSA for some time now with the
software DECtalk and the ALSA OSS emulation layer. Raman has put up some
documentation on how to get direct ALSA support using ViaVoice
Outloud. ALSA is certainly the way to go, as it is pretty much the
official sound support layer from the 2.6.x kernels onwards.
I think support for Speech Dispatcher is a good idea. It would
certainly increase the options for speech servers within
emacspeak. Since Speech Dispatcher supports SSML (for synthesizers
which can understand it), I would suggest building an SSML interface
between emacspeak and Speech Dispatcher. A good generic SSML interface
would have the added advantage that other synthesizers which
understand SSML, but which are not supported by Speech Dispatcher,
could also be easily integrated into emacspeak. I also believe that
Speech Dispatcher will strip out SSML tags if the current backend
synthesizer does not support them.
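To make the idea concrete, here is a rough sketch of wrapping text in
SSML the way such an interface might (Python purely for illustration;
emacspeak itself is Emacs Lisp and Tcl, and the prosody values here are
invented for the example):

```python
# Illustrative sketch only: build an SSML fragment for a chunk of text,
# the kind of markup an SSML-based emacspeak interface might emit.

def ssml_wrap(text, pitch=None, rate=None):
    """Wrap text in an SSML <speak> document with optional prosody tags."""
    body = text
    if pitch is not None or rate is not None:
        attrs = []
        if pitch is not None:
            attrs.append('pitch="%s"' % pitch)
        if rate is not None:
            attrs.append('rate="%s"' % rate)
        # SSML is XML, so the prosody element must be properly closed.
        body = "<prosody %s>%s</prosody>" % (" ".join(attrs), body)
    return "<speak>%s</speak>" % body

print(ssml_wrap("Hello world", pitch="+10%"))
```

A synthesizer that understands SSML interprets the tags; one that does
not would rely on Speech Dispatcher stripping them out, as described
above.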
If you go to my website at http://www-personal.une.edu.au/~tcross, you
will find a tarball of a patched emacspeak I did to support the
Cepstral voices, which includes an SSML interface. Unfortunately, this
is for emacspeak 20, but you should be able to update it for emacspeak
23. The Cepstral interface is broken and I've not updated it, but the
SSML stuff will give you a good starting point for getting emacspeak
to generate SSML-based TTS commands.
Note that the reason I've not maintained the SSML and Cepstral stuff
is that Cepstral changed its C API and I've not had time to update it;
and as the SSML stuff has never been integrated into emacspeak, I've
just not had the time to re-patch each emacspeak version as it comes
out. Raman was going to have a look at what I've done, but it doesn't
look like he has had time. I've never had any feedback on it, so it
could all be completely wrong, but it did seem to work. Unfortunately,
the SSML support within Cepstral at the time I was using it wasn't
great: they only supported a subset of the SSML tags, and many of the
ones we really wanted for decent voice locking were not supported.
One of the issues with SSML is that, unlike the existing emacspeak TTS
protocols, SSML is XML-based and therefore requires properly formed
start and end tags, while the existing TTS interfaces just use single
start tags. This means patching the dtk-speak.el file so that TTS
commands are given both a start and an end tag. You will also need to
create an ssml-voices.el file (see outloud-voices.el as an example).
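The difference between the two tagging styles can be sketched like this
(Python used purely as illustration; the real change lives in
dtk-speak.el's Emacs Lisp, and the command strings below are invented,
not the actual protocol):

```python
# Sketch of the two tagging styles. Existing emacspeak servers mark a
# voice change with a lone inline command; SSML needs a matching close
# tag for every open tag.

def dtk_style(text, voice):
    # Existing interfaces: a single start command, no explicit end.
    return "[:np %s] %s" % (voice, text)

def ssml_style(text, voice):
    # SSML: well-formed XML, so every opening tag must be closed.
    return '<voice name="%s">%s</voice>' % (voice, text)

print(dtk_style("hello", "betty"))
print(ssml_style("hello", "betty"))
```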
You will also find within the tarball a generic-voices.el. This is a
'do nothing' voices file which can be used to get a quick and dirty
interface between emacspeak and any speech synthesizer. Essentially,
it just doesn't add voice-lock type commands to the text emacspeak
sends to the speech servers. So, instead of solutions which attempt to
create a basic interface by having a script which strips out dtk or
outloud commands, you can create text streams which contain just text
and eliminate the need to do any stripping. I was going to use this to
create new DoubleTalk and flite interfaces which provided just basic
speech.
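The idea behind such a 'do nothing' voices file can be sketched as
follows (Python for illustration; generic-voices.el itself is Emacs
Lisp, and the voice names here are just examples):

```python
# Sketch: a voice table whose every entry maps to the empty annotation,
# so the synthesizer receives plain text with no device-specific codes.

def make_generic_voice_table(voice_names):
    # Each logical voice (e.g. 'bold', 'italic') maps to no markup at all.
    return {name: "" for name in voice_names}

def annotate(text, voice, table):
    # With a generic table the annotation is empty: text passes through
    # untouched, so no downstream stripping script is needed.
    return table.get(voice, "") + text

table = make_generic_voice_table(["bold", "italic", "comment"])
print(annotate("hello world", "bold", table))
```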
Once you have emacspeak generating SSML-based TTS commands, all that
probably remains is to create a tcl script which connects to Speech
Dispatcher and passes the SSML-tagged text to it via a socket, plus
add support for commands such as changing punctuation and some of the
useful but not essential bonus options, like split caps, all caps
beep, etc. In fact, it wouldn't even need to be a tcl script - I only
mention tcl because all the other helper interface scripts are tcl. It
could really be any language you like. Alternatively, and possibly
better left as a later task, you could bypass the helper scripts
completely and create a direct interface to Speech Dispatcher from
within elisp - check out the speech-dispatcher.el file for clues on
doing this. However, if you go the direct interface route, you will
have to redo a fair amount of work which has already been done by
Raman in tts-lib.tcl, which is why I'd probably go the tcl helper
script route first.
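For a feel of what such a helper would send over the socket, here is a
rough sketch of the SSIP exchange (Python, purely illustrative; the
commands follow the general shape of SSIP, but check the Speech
Dispatcher documentation for the exact protocol before relying on
this):

```python
# Sketch: build the command lines a helper script might send to Speech
# Dispatcher over its socket. This only formats the messages; a real
# helper would open a TCP connection and read the numeric reply codes.

def ssip_session(client_name, ssml_text):
    lines = []
    # Identify ourselves to the dispatcher (user:client:component).
    lines.append("SET SELF CLIENT_NAME %s" % client_name)
    # Ask for SSML input so our tags are interpreted, not spoken aloud.
    lines.append("SET SELF SSML_MODE on")
    # Speech data follows SPEAK and is terminated by a lone dot.
    lines.append("SPEAK")
    lines.append(ssml_text)
    lines.append(".")
    # SSIP lines are CRLF-terminated on the wire.
    return "".join(l + "\r\n" for l in lines)

print(ssip_session("tim:emacspeak:main", "<speak>hello</speak>"))
```

A real helper would also translate emacspeak's punctuation, rate and
capitalization commands into the corresponding SSIP SET commands.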
With respect to getting emacspeak to support multiple languages, I
think this is a much more difficult task. Raman is the person to
provide the best guidance here, but as I understand it, quite a lot of
emacspeak would need to be changed. The source of the problem, I
think, is mainly that historically, many hardware synthesizers, like
the DECtalk, only supported single-byte character sets and only
handled 7-bit ASCII reliably. Therefore, Raman did considerable work
to incorporate character mapping and translation into emacspeak to
ensure that only 7-bit characters are ever sent to the speech
synthesizer.
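The kind of mapping involved can be sketched like this (Python for
illustration; the spoken forms below are made up for the example,
emacspeak's real tables live in its Lisp sources):

```python
# Sketch: map characters outside 7-bit ASCII to spoken representations,
# so only 7-bit text ever reaches the synthesizer. Example entries only.

SPOKEN = {
    "\u00e9": " e acute ",   # é
    "\u00fc": " u umlaut ",  # ü
    "\u00df": " sharp s ",   # ß
}

def to_seven_bit(text):
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            # Fall back to a generic description for unmapped characters.
            out.append(SPOKEN.get(ch, " character %d " % ord(ch)))
    return "".join(out)

print(to_seven_bit("caf\u00e9"))  # -> "caf e acute "
```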
This means that to get reliable support for multi-byte character sets,
and even 8-bit character sets, quite a bit of patching would be
required. To make matters more complex, although most new software
synthesizers (and even some hardware ones) will support at least the
full 8 bits, and some even multi-byte character sets, emacspeak would
need some way of knowing this in order to provide consistent and
reliable processing of characters and mapping of 'special' characters
to meaningful spoken representations. However, emacspeak currently
doesn't have any facility which allows it to dynamically change its
mapping of characters based on the capabilities of the speech
synthesizer. While Speech Dispatcher may be able to handle this to
some extent, we need to ensure support for existing synthesizers is
not broken.
Although I actually know very little about other character sets,
especially multi-byte ones, I'd also be a little concerned about how
emacs itself is evolving in this respect. From many of the posts on
the emacs newsgroups, I get the impression that this is still a rather
murky and inconsistent aspect of emacs. It would certainly be
important to check out emacs 22 and see what has changed there before
making a lot of changes to emacspeak. Definitely make sure you get
guidance from Raman: in addition to knowing more about emacspeak than
anyone else, he is also the person who probably has the most
experience in dealing with issues like this.
While my personal time (like everyone else's) is just too scarce at
present, especially due to some large work projects I am taking on, I
would certainly be prepared to try to provide some support in getting
emacspeak to work with Speech Dispatcher. I just don't know how much
time I will have and what my response times will be like - probably
pretty slow! However, don't hesitate to drop me an e-mail if you need
some help and I'll see what I can do - just no promises!
Lukas Loehrer writes:
> Hi all,
> using emacspeak with eflite, I aim for the following improvements:
> 1. Use alsa for speech playback.
> 2. Have languages other than English, especially German in my case.
> The first point is the more important one for me. I looked around and
> believe that speech-dispatcher is the most promising way to attack
> these goals. One advantage of this solution is that flite can be used
> for English and festival for other languages, so one can still benefit
> from the good performance of flite.
> What is the best way to connect emacspeak to speech-dispatcher? Are
> there existing solutions? I am considering writing something similar to
> eflite that implements the emacspeak speech server interface and talks
> via speech-dispatcher. Another way would be to make emacs connect to
> speech-dispatcher directly via SSIP.
> The advantage of the external solution would be that it does not
> require any changes to emacspeak to achieve the first of the above
> goals and that it could also be used with other programs like yasr
> that support the emacspeak speech interface.
> As far as I can tell, multi-language support would require extensions
> to emacspeak in both approaches.
> Does anyone have some thoughts or suggestions? Is speech-dispatcher
> the way to go?
> To unsubscribe from the emacspeak list or change your address on the
> emacspeak list send mail to "email@example.com" with a
> subject of "unsubscribe" or "help"