
Re: Using emacspeak with speech-dispatcher

Pierre Lorenzon writes:
 > Hi Tim,
 > > Hi Lukas,
 > > 
 > > Just a couple of points which may help.
 > > 
 > > I think support for speech dispatcher is a good idea. It would
 > > certainly increase options for speech servers within
 > > emacspeak. Since speech dispatcher supports SSML
 > > (for synthesizers which can understand it), I would suggest building an
 > > SSML interface between emacspeak and speech dispatcher. This would
 >   As I said in my previous mail, speechd-el is already
 >   an SSML client for speech-dispatcher. It provides
 >   a lisp library to connect to speech-dispatcher
 >   from emacs. So there is nothing major left to do
 >   beyond a certain amount of lisp code.
 >   My idea is not to use the low-level modules of emacspeak,
 >   i.e. all modules below emacspeak-speak (dtk-speak,
 >   dtk-interp etc.), but only the high-level modules, i.e.
 >   modules specific to certain applications (emacspeak-w3m
 >   for instance). Since I think I am not
 >   a bad lisp programmer, I will simply rewrite
 >   emacspeak-speak using the speechd-el library (and
 >   eventually enhancements of it). A startup interface
 >   which requires and loads the appropriate
 >   modules is easy to write as well.

The problem I have with this is that I cannot see how it can be done
in a clean way without breaking the other existing speech servers. While
I can understand the perspective you're coming from, I see this as
different from just adding additional speech server support. Your
proposal represents a more fundamental architectural change to
emacspeak. While I believe there is some merit to such a proposal, I
feel it needs far more analysis and thought. 

To a large extent, emacspeak, in my view, is a powerful package which
has evolved over time. There is a strong argument that we could
benefit from a refactoring and re-implementation of the system if we
had the time. There are things Raman did which were necessary at the
time due to limitations within emacs and the available speech
synthesizers. Because emacspeak was very much on the leading edge,
Raman was forced to make decisions in areas which were not
standardised (and often set the standard), such as ACSS and XSLT
transformations of web content. Some things were implemented in
emacspeak prior to being implemented in emacs - for example,
initially, emacspeak did all the voice locking independently of
emacs font locking because font lock only worked under X11. Later,
once emacs implemented font lock under the console, Raman changed
things to do voice locking based on font lock properties. There remains
some unused legacy code in some modules which could now be removed.

I feel that if there were going to be major changes to how the voice
interface worked, they would really need to be carefully considered and
proper analysis done. While speech-dispatcher does seem to be a good
model, it may not necessarily be a good model for emacspeak; maybe
emacspeak could have an even better model that benefits from the
experience of both. 

However, I see all of this as a rather different question from adding
speech-dispatcher support to emacspeak as another alternative to the
existing speech servers. I think we could add basic speech support via
speech-dispatcher very simply and very quickly by using the
generic-voices.el file I've already got and a simple tcl script (or
perl or python or whatever) which just communicates with the
speech-dispatcher daemon at a very basic level. While it won't give
the full power of speech-dispatcher, and won't even match the full
power of the outloud, dectalk etc. servers, it would provide functional
and usable speech feedback that adds to the options available for users.
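To make the shape of that helper concrete: speech-dispatcher clients talk SSIP (the Speech Synthesis Interface Protocol) over a socket. Below is a rough sketch, in python rather than tcl, of the strings such a helper would emit; the command set, default port and reply codes are taken from my reading of the SSIP documentation and should be verified against it before relying on them.

```python
# Sketch of the SSIP messages a minimal speech-dispatcher helper would
# send. This only builds the protocol strings; a real helper would write
# them to the daemon's socket (TCP port 6560 by default, per the SSIP
# docs) and read the numeric replies (e.g. "225 OK MESSAGE QUEUED").

CRLF = "\r\n"

def ssip_set(parameter: str, value: str) -> str:
    """Format a SET command for the current connection."""
    return f"SET self {parameter} {value}{CRLF}"

def ssip_speak(text: str) -> str:
    """Format a SPEAK command followed by the message body.

    Per SSIP's SMTP-like framing, the body is terminated by a line
    containing a single '.', and any body line that itself starts
    with '.' is escaped by doubling the dot.
    """
    lines = text.split("\n")
    escaped = [("." + ln) if ln.startswith(".") else ln for ln in lines]
    return "SPEAK" + CRLF + CRLF.join(escaped) + CRLF + "." + CRLF

# Example: the kind of exchange a basic emacspeak helper would produce.
commands = [
    ssip_set("PUNCTUATION", "all"),
    ssip_set("RATE", "20"),
    ssip_speak("Hello from emacspeak"),
]
```

A real helper would open the socket, write each string and check the daemon's numeric reply before sending the next command.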

 > > also have the advantage that if we had a good generic SSML interface,
 > > other synthesizers which understand SSML but which are not supported
 > > by speech dispatcher could also be easily integrated into emacspeak. I
 >   Sure, but the speech-dispatcher development team will probably
 >   extend the list of supported speech synthesizers.

While that's quite true, my experience has been that it's easier to add
support for a speech server in emacspeak than to add a new server to
speech dispatcher. I have looked at both and I do use both, and while
adding support at a very low-level generic layer is easy within speech
dispatcher, it's just as easy within emacspeak. (I actually use
cepstral's swift TTS with speech dispatcher using a generic interface
similar to the dtk-generic interface which currently comes with speech
dispatcher for the software dectalk.)

 > > also believe that speech dispatcher will strip out SSML tags if the
 > > current backend synthesizer does not support them. 
 > > 
 > > If you go to my website at http://www-personal.une.edu.au/~, you
 > > will find a tar ball of a patched emacspeak I did to support the
 > > Cepstral voices, which includes an SSML interface. Unfortunately, this
 > > is for emacspeak 20, but you should be able to update it for emacspeak
 > > 23. The cepstral interface is broken and I've not updated it, but the
 > > SSML stuff will give you a good starting point for getting emacspeak
 > > to generate TTS commands which are based on SSML. 
 >   Sure Tim, but speechd-el does that and is fully
 >   functional now, hence there is no need to write
 >   a speech-dispatcher client since there is already one.
 >   Moreover, since it is developed by the same people
 >   who develop the server, we can expect that it
 >   will always be maintained so that it connects well!
 >   I think there are enough things to be developed
 >   and maintained, and we do not need to do work that is
 >   already done by someone else! Don't you think so?

In principle I agree. However, as pointed out by Milan, there are some
inconsistencies in the interface models between emacspeak and
speech-dispatcher that make this approach problematic. Note that if we
were merely talking about replacing ALL speech server support with
speech-dispatcher, then I would agree more with your
approach. However, as outlined above, I see this as a different
question and a much bigger problem. Milan's post highlights a few of
the difficulties they already encountered in this area. 

I would possibly suggest a different route if we wanted emacspeak to
be based solely on speech-dispatcher - instead of taking emacspeak and
trying to integrate the speech dispatcher client, I would be more
inclined to start with speechd.el and add 'add-ons' to
increase its power and extend its interface. I actually started
looking at this a few months ago and
created modules which provided enhanced speech feedback
from speechd.el when running VM. 

To me, this has the advantage of creating something from a clean code
base and giving us the ability to learn from the experiences of Raman
and the development of emacspeak. The one thing we would need to be
careful about is not to affect the clean base of speechd.el - any
additions should be made as libraries which could be loaded and used
if and when the user wanted them.

 > > 
 > > Note that the reason I've not maintained the SSML and Cepstral stuff
 > > is because Cepstral has changed its C API and I've not had time to
 > > update it and as the SSML stuff has never been integrated into
 > > emacspeak, I've just not had the time to re-patch every emacspeak
 > > version each time it comes out. Raman was going to have a look at
 > > what I've done, but it doesn't look like he has had time. I've never
 > > had any feedback on what I've done, so it could all be completely
 > > wrong, but it did seem to work. Unfortunately, the SSML support within
 > > Cepstral at the time I was using it wasn't great as they only
 > > supported a subset of the SSML tags and many of the ones we really
 > > wanted to get decent voice locking were not supported.
 > > 
 > > One of the issues with SSML is that unlike the existing emacspeak TTS,
 > > SSML is XML based and therefore requires properly formed start and end
 > > tags, while the existing TTS interfaces just use single start tags. This
 > > means having to patch the dtk-speak.el file so that TTS commands are
 >   As I said above, simply do not use dtk-speak.

But then we lose existing support for the already available
synthesizers within emacspeak!
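To illustrate the point quoted above: the existing TTS interfaces emit single inline commands (a dectalk-style `[:np]`, for example, changes the voice from that point onward with no closing marker), whereas SSML is XML, so every change must be expressed as a properly matched start and end tag. A purely illustrative fragment:

```xml
<!-- Illustrative SSML only: unlike a single inline TTS command, the
     prosody change has an explicit, well-formed end tag, so the
     generator must know where the change stops. -->
<speak>
  plain text,
  <prosody pitch="high">a span spoken at a higher pitch,</prosody>
  then plain text again.
</speak>
```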

 > > given both a start and end tag. You will also need to create an
 > > ssml-voices.el file (see the outloud-voices.el as an example). 
 >   Sure, there is something to do on the voice-change side,
 >   since speechd-el does not implement many features
 >   for that. However, everything needed is available, since you
 >   can control pitch, for instance, through this interface.
 >   These are the little speechd-el enhancements of which I
 >   talked above.
 > > 
 > > You will also find within the tar ball a generic-voices.el. This is a
 > > 'do nothing' voices file which can be used to get a quick and dirty
 > > interface between emacspeak and any speech synthesizer up and running
 > > quickly. Essentially, it just doesn't add voice lock type commands to
 > > the text emacspeak sends to the speech servers. So, instead of
 > > solutions which attempt to create a basic interface by having a script
 > > which strips out dtk or outloud commands, you can create text streams
 > > which just have text and eliminate the need to do any stripping. I was
 > > going to use this to create new double talk and flite interfaces which
 > > provided just basic speech.
 > > 
 > > Once you have emacspeak generating TTS commands which are SSML based,
 > > all that probably remains to do is create a tcl script which connects
 >   No! No need to go through a tcl script! As I said
 >   in my previous mail, there are already enough layers; do not
 >   add one more! If we use, enhance, improve ...
 >   speechd-el, we already have a speech-dispatcher client.

Again, I see this as us coming from two different directions. Your
approach seems to involve a fundamental change to the architecture,
while mine is about adding additional options for supported speech
servers. These are two very different things, and I believe a change in
fundamental architecture requires significant planning, analysis and
consultation. At the very least, you would either need to get Raman's
support as the maintainer of emacspeak or break off and create your
own branch which is independent of the main emacspeak development. I
don't believe a split in the emacspeak community would benefit anyone
in the long run, but that's just my opinion. 

 > > to speech dispatcher and passes the SSML tagged text to speech
 > > dispatcher via a socket, plus add support for commands such as
 > > changing punctuation and some of the useful but not essential bonus
 >   Already done by speechd-el. The interface I actually
 >   have (sorry, but without emacspeak)
 >   allows rate control, punctuation control and language control
 >   with emacs commands. Only certain high-level functions
 >   of emacspeak are not provided.
 >   I would say once more: no need to work on low-level
 >   functions: the work is already done! We can
 >   work on high-level features.
 > > options, like split caps, all caps beep etc. In fact, it wouldn't even
 >   These features are implemented as well by speechd-el.
 > > need to be a tcl script - I only mention it as all the other helper
 > > interface scripts are tcl. It could really be any language you
 > > like. Alternatively and possibly better left as a later task, you
 > > could bypass the helper scripts completely and create a direct
 > > interface to speech dispatcher from within elisp - check out the
 > > speech-dispatcher.el file for clues on doing this. However, if you go
 > > the direct interface route, you will have to do a fair amount of
 > > additional work which has already been done in the tcl tts-lib.tcl
 > > file by Raman, which is why I'd probably go the tcl interface helper
 > > route initially. 
 >   Hum! On which side is the work the most
 >   advanced? I must confess that I know
 >   the speechd-el part better, but:
 >   1. I must confess again that I don't like tcl!

To me TCL is just a language. Some things, like list handling, I think
it does better than other scripting languages, and other things it
doesn't do as well. However, they are all functionally
equivalent, so I don't see that it matters too much. I should note
that I looked at replacing TCL with perl for the software dectalk, but
I ran into a lot of problems trying to create a perl module which would
interface with the low-level dectalk C API. It was
straightforward to do the same thing with TCL, and that is one of its
strengths IMO. 

 >   2. I don't see any interest in adding a layer.

In the long term, I would agree. That's why I said in my original post
that eventually we could do the whole speech-dispatcher interface in
elisp and 'borrow' from speechd.el. However, sticking with the TCL
solution means very little work would need to be done - really, all
you would need is a very simple tcl script which uses the tts-lib.tcl
library and the generic-voices.el or ssml-voices.el file and simply
sends speech to the speech-dispatcher socket. As the generic-voices.el
and ssml-voices.el files are already done, there would be no need to
modify any elisp, and a basic tcl command loop which passed data to the
speech dispatcher socket (with whatever speech-dispatcher specific
commands are necessary) would be fairly trivial to implement. 
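A sketch of that command loop, in python for illustration only: the emacspeak-side command names (`q`, `d`, `s`, `tts_say`) follow the pattern of the existing tcl servers but are assumptions here and should be checked against tts-lib.tcl, as should the SSIP strings produced.

```python
# Sketch of a minimal speech-dispatcher helper's command loop. The
# emacspeak command names (q, d, s, tts_say) and the SSIP output are
# assumptions; verify them against tts-lib.tcl and the SSIP docs.

def parse_command(line: str):
    """Split one emacspeak server command into (name, argument).

    Arguments arrive wrapped in braces, e.g. 'q {some text}'.
    """
    line = line.strip()
    if " " not in line:
        return line, ""
    name, _, arg = line.partition(" ")
    return name, arg.strip().strip("{}")

def to_ssip(name: str, arg: str, queue: list) -> str:
    """Translate a parsed command into the SSIP traffic to send."""
    if name == "q":            # queue text for later dispatch
        queue.append(arg)
        return ""
    if name == "d":            # dispatch everything queued so far
        text, queue[:] = " ".join(queue), []
        return f"SPEAK\r\n{text}\r\n.\r\n"
    if name == "s":            # stop speech immediately
        return "STOP self\r\n"
    if name == "tts_say":      # speak immediately, bypassing the queue
        return f"SPEAK\r\n{arg}\r\n.\r\n"
    return ""                  # ignore anything unrecognized
```

A real script would wrap this in a loop reading stdin and writing the returned strings to the daemon's socket.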

 >   3. Raman probably did not think about the
 >   multilingual aspect in his interface, since
 >   it is handled in speechd-el.

No, I don't believe it is that straightforward. Don't forget that the
basic architecture of emacspeak was developed over 15 years ago -
back then, there were very few speech synthesizers capable of
supporting multiple languages, and the whole issue of multilingual
support in software was still at a very early stage.

 > > 
 > > With respect to getting emacspeak to support multiple languages, I
 > > think this is a much more difficult task. Raman is the person to
 > > provide the best guidance here, but as I understand it, quite a lot of
 > > emacspeak would need to be changed. The source of the problem here I
 > > think is mainly due to the fact that historically, many hardware
 > > synths, like the dectalk, only supported single byte character sets
 > > and only handled 7 bit ascii reliably. Therefore, Raman did
 > > considerable work to incorporate character mapping and translation
 > > into emacspeak to ensure that only 7 bit characters are ever sent to
 > > the speech synthesizer.
 > > 
 > > This means that to get reliable support for multi-byte character sets
 > > and even 8 bit character sets, quite a bit of patching would be
 > > required. To make matters more complex, although most new software
 > > synthesizers (and even some hardware ones) will support at least the
 > > full 8 bits and some even multi-byte character sets, emacspeak would
 > > need some way of knowing this in order to provide consistent and
 > > reliable processing of characters and mapping of 'special' characters
 > > to meaningful spoken representations. However, currently, emacspeak
 > > doesn't have any facility which dynamically allows it to change its
 > > mapping of characters based on the capabilities of the speech
 > > synthesizer. While speech dispatcher may be able to handle this to
 > > some extent, we need to ensure support for existing synthesizers is
 > > not broken. 
 > > 
 > > Although I actually know very little about other character sets,
 > > especially multibyte ones, I'd also be a little concerned about how
 > > emacs itself is evolving in this respect. From many of the posts on
 > > the emacs newsgroups, I get the impression that this is still a rather
 > > murky and inconsistent aspect of emacs. It would certainly be
 > > important to check out emacs 22 and see what has changed there before
 > > making a lot of changes to emacspeak. Definitely make sure you get
 > > guidance from Raman, as in addition to knowing more about emacspeak
 > > than anyone else, he is also the person who has had probably the most
 > > experience in dealing with issues like this. 
 > > 
 > > While my personal time (like everyone else) is just too scarce at
 > > present, especially due to some large work projects I am taking on, I
 > > certainly would be prepared to try and provide some support in getting
 > > emacspeak support for speech dispatcher. I just don't know how much
 > > time I will have and what my response times will be like - probably
 >   Time is everyone's problem, Tim, but if we
 >   collaborate we might be more efficient!

I agree, and would suggest discussions like this are the necessary
first step.

 > > pretty slow! However, don't hesitate to drop me an e-mail if you need
 > > some help and I'll see what I can do, just no promises!
 > > 
 > > good luck
 >   Bests
 >   Pierre

The more I think about it, the more I feel that, given your objectives,
the better approach would be to enhance speechd.el, using
emacspeak as a guide on how to add features etc. To me, this has the
advantage of

	- Starting with a cleaner code base

	- Avoiding the potentially difficult-to-resolve inconsistencies
	  between the two models that would be encountered when trying
	  to integrate the two

	- Requiring less development time before we see something useful,
	  since we would be enhancing an existing system

The main and obvious disadvantage is that we would be splitting the
user community, which is a real concern. However, there is nothing
preventing a re-integration later if that proved to be warranted.


To unsubscribe from the emacspeak list or change your address on the
emacspeak list send mail to "emacspeak-request@cs.vassar.edu" with a
subject of "unsubscribe" or "help"
