
Re: Using emacspeak with speech-dispatcher

Hi Tim,

From: Tim Cross <tcross@rapttech.com.au>
Subject: Using emacspeak with speech-dispatcher
Date: Sat, 7 Jan 2006 15:52:58 +1100

> Hi Lukas,
> Just a couple of points which may help.
> I think support for speech dispatcher is a good idea. It would
> certainly increase options for speech servers within
> emacspeak. Since speech dispatcher supports SSML
> (for synthesizers which can understand it), I would suggest doing an
> SSML interface between emacspeak and speech dispatcher. This would
  As I said in my previous mail, speechd-el is already
  an SSML client for speech-dispatcher. It provides
  a Lisp library for connecting to speech-dispatcher
  from Emacs, so nothing major remains to be done
  beyond writing a certain amount of Lisp code.
  My idea is not to use the low-level modules of emacspeak,
  i.e. all modules below emacspeak-speak (dtk-speak,
  dtk-interp, etc.), but only the high-level modules, i.e.
  those specific to particular applications (emacspeak-w3m,
  for instance). To do that, since I think I am not
  a bad Lisp programmer, I will simply rewrite
  emacspeak-speak on top of the speechd-el library (and
  possibly enhancements to it). A startup interface
  that requires and loads the appropriate
  modules is easy to write as well.
> also have the advantage that if we had a good generic SSML interface,
> other synthesizers which understand SSML but which are not supported
> by speech dispatcher could also be easily integrated into emacspeak. I
  Sure, but the speech-dispatcher development team will probably
  extend the list of supported speech synthesizers.
> also believe that speech dispatcher will strip out SSML tags if the
> current backend synthesizer does not support them. 
> If you go to my website at http://www-personal.une.edu.au/~, you
> will find a tar ball of a patched emacspeak I did to support the
> Cepstral voices, which includes an SSML interface. Unfortunately, this
> is for emacspeak 20, but you should be able to update it for emacspeak
> 23. The cepstral interface is broken and I've not updated it, but the
> SSML stuff will give you a good starting point for getting emacspeak
> to generate TTS commands which are based on SSML. 
  Sure, Tim, but speechd-el already does that and is fully
  functional now, so there is no need to write
  a new speech-dispatcher client when one already exists.
  Moreover, since it is developed by the same people
  who develop the server, we can expect that it
  will stay maintained and keep connecting properly!
  I think there are enough things to be developed
  and maintained already; we do not need to redo work that
  someone else has done! Don't you think so?
> Note that the reason I've not maintained the SSML and Cepstral stuff
> is because Cepstral has changed its C API and I've not had time to
> update it and as the SSML stuff has never been integrated into
> emacspeak, I've just not had the time to re-patch every emacspeak
> version each time it comes out. Raman was going to have a look at
> what I've done, but it doesn't look like he has had time. I've never
> had any feedback on what I've done, so it could all be completely
> wrong, but it did seem to work. Unfortunately, the SSML support within
> Cepstral at the time I was using it wasn't great as they only
> supported a subset of the SSML tags and many of the ones we really
> wanted to get decent voice locking were not supported.
> One of the issues with SSML is that unlike the existing emacspeak TTS,
> SSML is XML based and therefore requires properly formed start and end
> tags, while the existing TTS interfaces just use single start tags. This
> means having to patch the dtk-speak.el file so that TTS commands are
  As I said above, simply do not use dtk-speak.
> given both a start and end tag. You will also need to create an
> ssml-voices.el file (see outloud-voices.el as an example).
  Sure, there is work to do on voice changes,
  since speechd-el does not implement many features
  for that. However, everything needed is available, since you
  can control pitch, for instance, through this interface.
  These are the small speechd-el enhancements I
  mentioned above.
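  To make the start/end tag point concrete, here is a minimal
  sketch in Python of wrapping text so that every voice change
  opens and closes properly. It is not speechd-el or emacspeak
  code; the names ssml_span and ssml_document are made up for
  this illustration, and only the SSML <speak> and <prosody>
  elements are assumed.

```python
from xml.sax.saxutils import escape, quoteattr


def ssml_span(text, pitch=None, rate=None):
    """Return text wrapped in a balanced <prosody> element.

    Hypothetical helper: the text is XML-escaped, and the element is
    only emitted when at least one attribute is requested.
    """
    attrs = ""
    if pitch is not None:
        attrs += " pitch=" + quoteattr(pitch)
    if rate is not None:
        attrs += " rate=" + quoteattr(rate)
    body = escape(text)
    if not attrs:
        # No voice change requested: plain escaped text, no tags.
        return body
    # Start tag and end tag are always produced together.
    return "<prosody" + attrs + ">" + body + "</prosody>"


def ssml_document(*spans):
    """Join already-built spans inside a single <speak> root element."""
    return "<speak>" + "".join(spans) + "</speak>"
```

  For example, ssml_document(ssml_span("hello", pitch="high"))
  gives <speak><prosody pitch="high">hello</prosody></speak>,
  whereas a single-start-tag interface would emit only the
  opening command.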
> You will also find within the tar ball a generic-voices.el. This is a
> 'do nothing' voices file which can be used to get quick and dirty
> interfaces between emacspeak and any speech synthesizer happening
> quickly. Essentially, it just doesn't add voice lock type commands to
> the text emacspeak sends to the speech servers. So, instead of
> solutions which attempt to create a basic interface by having a script
> which strips out dtk or outloud commands, you can create text streams
> which just have text and eliminate the need to do any stripping. I was
> going to use this to create new double talk and flite interfaces which
> provided just basic speech.
> Once you have emacspeak generating TTS commands which are SSML based,
> all that probably remains to do is create a tcl script which connects
  No! There is no need to go through a tcl script! As I said
  in my previous mail, there are already enough layers; let us not
  add one more! If we use, enhance, and improve
  speechd-el, we already have a speech-dispatcher client.
> to speech dispatcher and passes the SSML tagged text to speech
> dispatcher via a socket, plus add support for commands such as
> changing punctuation and some of the useful but not essential bonus
  Already done by speechd-el. The interface I currently
  have (sorry, but without emacspeak)
  allows rate control, punctuation control, and language control
  through Emacs commands. Only certain high-level functions
  of emacspeak are not provided.

  I will say it once more: there is no need to work on low-level
  functions, since that work is already done! We can
  work on high-level features instead.
> options, like split caps, all caps beep etc. In fact, it wouldn't even
  These features are implemented as well by speechd-el.
> need to be a tcl script - I only mention it as all the other helper
> interface scripts are tcl. It could really be any language you
> like. Alternatively and possibly better left as a later task, you
> could bypass the helper scripts completely and create a direct
> interface to speech dispatcher from within elisp - check out the
> speech-dispatcher.el file for clues on doing this. However, if you go
> the direct interface route, you will have to do a fair amount of
> additional work which has already been done in the tcl tts-lib.tcl
> file by Raman, which is why I'd probably go the tcl interface helper
> route initially. 
  Hmm! On which side is the work more
  advanced? I must confess that I know
  the speechd-el part better, but:
  1. I must confess again that I don't like tcl!
  2. I don't see any benefit in adding a layer.
  3. Raman probably did not consider the
  multilingual aspect in his interface, whereas
  it is handled in speechd-el.
> With respect to getting emacspeak to support multiple languages, I
> think this is a much more difficult task. Raman is the person to
> provide the best guidance here, but as I understand it, quite a lot of
> emacspeak would need to be changed. The source of the problem here I
> think is mainly due to the fact that historically, many hardware
> synths, like the dectalk, only supported single byte character sets
> and only handled 7 bit ascii reliably. Therefore, Raman did
> considerable work to incorporate character mapping and translation
> into emacspeak to ensure that only 7 bit characters are ever sent to
> the speech synthesizer.
> This means that to get reliable support for multi-byte character sets
> and even 8 bit character sets, quite a bit of patching would be
> required. To make matters more complex, although most new software
> synthesizers (and even some hardware ones) will support at least the
> full 8 bits and some even multi-byte character sets, emacspeak would
> need some way of knowing this in order to provide consistent and
> reliable processing of characters and mapping of 'special' characters
> to meaningful spoken representations. However, currently, emacspeak
> doesn't have any facility which dynamically allows it to change its
> mapping of characters based on the capabilities of the speech
> synthesizer. While speech dispatcher may be able to handle this to
> some extent, we need to ensure support for existing synthesizers is
> not broken. 
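  As a rough illustration of the capability-dependent character
  mapping Tim describes, here is a small Python sketch. It is not
  how emacspeak does it: the SPOKEN_NAMES table, the prepare_text
  function, and the idea of describing a synthesizer's capability
  by a codec name are all assumptions made for the example.

```python
import unicodedata

# Hypothetical table of spoken replacements for a few characters.
SPOKEN_NAMES = {"é": "e acute", "ß": "sharp s", "€": "euro"}


def prepare_text(text, synth_encoding="ascii"):
    """Replace characters the synthesizer cannot accept with spoken names.

    synth_encoding is an assumed stand-in for the synthesizer's
    capability: "ascii" for a 7-bit device, "latin-1" for an 8-bit one,
    "utf-8" for one that accepts multi-byte text.
    """
    out = []
    for ch in text:
        try:
            # Characters the target encoding can represent pass through.
            ch.encode(synth_encoding)
            out.append(ch)
        except UnicodeEncodeError:
            # Otherwise fall back to the table, then to the Unicode name.
            name = SPOKEN_NAMES.get(ch)
            if name is None:
                name = unicodedata.name(ch, "unknown character").lower()
            out.append(" " + name + " ")
    return "".join(out)
```

  With this sketch, prepare_text("café") yields "caf e acute " for
  a 7-bit synthesizer, while prepare_text("café", "latin-1") leaves
  the text unchanged, so support for existing synthesizers is kept
  by choosing the capability per device.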
> Although I actually know very little about other character sets,
> especially multibyte ones, I'd also be a little concerned about how
> emacs itself is evolving in this respect. From many of the posts on
> the emacs newsgroups, I get the impression that this is still a rather
> murky and inconsistent aspect of emacs. It would certainly be
> important to check out emacs 22 and see what has changed there before
> making a lot of changes to emacspeak. Definitely make sure you get
> guidance from Raman as in addition to him knowing more about emacspeak
> than anyone else, he is also the person who has had probably the most
> experience in dealing with issues like this. 
> While my personal time (like everyone else) is just too scarce at
> present, especially due to some large work projects I am taking on, I
> certainly would be prepared to try and provide some support in getting
> emacspeak support for speech dispatcher. I just don't know how much
> time I will have and what my response times will be like - probably
  Time is everyone's problem, Tim, but if we
  collaborate we might be more efficient!
> pretty slow! However, don't hesitate to drop me an e-mail if you need
> some help and I'll see what I can do, just no promises!
> good luck


