Digest Of Replies Was Re: TTS Servers



Looks like the following three messages never made it out of my home
machine last night.


---- Begin included message ----
>>>>> "Tim" == Tim Cross <tcross@rapttech.com.au> writes:

    Tim> Yes, very few changes are needed for most servers to get
    Tim> them integrated. In fact, off the top of my head, I think I
    Tim> only needed to modify dtk-voices.el and .servers to integrate

The only way I know to move this forward is to specify explicitly
where and what needed to be changed and refactored.

In the past you've gotten things working and pointed me at them;
unfortunately I have little or no time to spare, which means I never
get around to downloading what you've pointed me at, comparing the
files, and figuring out what you had to change.

    Tim> the Cepstral Theta server into emacspeak. This was mainly
    Tim> done to enable support for in-line TTS commands which require
    Tim> "balanced tags" such as XML and SSML in particular. The way I
    Tim> have implemented this, based on the basic framework you had
    Tim> already put into place, needs a little more "generalization"
    Tim> to enable other servers to take advantage of this facility
    Tim> without requiring modifications to dtk-speak.el.

The code in proc clean in the TTS layer is, in my opinion, best left
in TCL.
I haven't seen any convincing reason to take on the work of rewriting
it in elisp, and an abstract reason, such as that something else
somewhere may someday take advantage of it, is not enough motivation.
Remember the Perl edict, a good one in general: a good programmer is a
lazy programmer. You never write code for the sake of writing code.


    Tim> I was mainly referring to our previous discussions concerning
    Tim> moving some of the "cleaning" code, currently done in the TCL
    Tim> scripts, back into elisp to make processing simpler for
    Tim> servers which benefit from an SSML type tags command

Independent of the above, we should at some point add in SSML support
-- probably through an ssml-voices.el file.

Your idea below of having a voices.el file is good; it should be
called generic-voices.el.
It might even end up being the same as ssml-voices.el

    Tim> interface - for example, to escape XML type tags prior to
    Tim> adding SSML commands to avoid them being interpreted by the
    Tim> SSML capable server etc. I believe this would also make the
    Tim> TCL script simpler to write (although they are pretty simple
    Tim> already) and help ensure features like split caps work

The point is that the TCL scripts are already written, so the "would
make it simpler to write" claim, though perhaps true, can't be
verified until you do the extra work of writing it in lisp, python, or
your favorite other language.
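
For reference, the escaping step Tim describes (escape XML-type tags
in the text first, then add the SSML commands) can be sketched as
follows. This is only an illustration in Python, not emacspeak code:
the function name to_ssml and its rate parameter are hypothetical,
and the choice of the SSML prosody element is an assumption made for
the example.

```python
from xml.sax.saxutils import escape, quoteattr


def to_ssml(text: str, rate: str = "medium") -> str:
    """Escape markup characters in TEXT, then wrap it in SSML tags.

    Hypothetical helper: escaping happens before any SSML is added, so
    a literal "<" or "&" in the text is spoken rather than interpreted
    as markup by an SSML-capable server.
    """
    safe = escape(text)  # & -> &amp;, < -> &lt;, > -> &gt;
    # quoteattr returns the value already wrapped in quotes.
    return f"<speak><prosody rate={quoteattr(rate)}>{safe}</prosody></speak>"


if __name__ == "__main__":
    print(to_ssml("if a < b & c"))
```

Doing the escape before the wrap matters: escaping afterwards would
also escape the SSML tags that were just added.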

    Tim> consistently between servers.

Consistency here is hard to define; I suspect there will always be
differences based on functionality.
It's also the classic 80/20 rule; given no time to work on it, how
much can one live with? As an example, the outloud server until two
weeks ago did not emit tones.
It was one of those things that "were nice to have" but the niceness
factor did not outweigh the lack of time factor --- I had some time
over the holidays and ended up adding this in using the beep rpm -- it
ends up being quite useful on laptops. 
I'll be mostly focusing on ALSA over the next few months if and when I
get time, since there is a lot to learn there, and a lot to take
advantage of once you know it.


    Tim> There is also some minor cleanup work needed for some legacy
    Tim> dtk in-line commands which sneak through from the elisp code
    Tim> now and again. This however is minor and should be easily
    Tim> fixed.

Again, it gets fixed when its nuisance value gets greater than the
lack of time factor.


    Tim> Some other issues which may or may not require consideration
    Tim> and action are things like support for multi-byte
    Tim> characters. As I understand it, one of the restrictions

Well, I don't use multibyte chars in my daily work.
Emacs is still growing up to multibyte, but as far as my needs go I
don't need it, nor do I know many good multibyte engines that would
take advantage of this yet. So again it remains an abstract need that
will get filled when and if it becomes an honest necessity.

As a case in point, the Japanese bilingual project addressed some of
these issues because it was critical for that project.

    Tim> within emacspeak in relation to multi-byte characters is
    Tim> somewhat legacy based due to the hardware dectalk only
    Tim> supporting single byte characters. Newer TTS engines don't
    Tim> necessarily have this limitation. Therefore, should constraints
    Tim> such as character sets and associated mappings be handled
    Tim> within emacspeak or within the software specific to the TTS

Eventually it should be handled in the <engine>-voices.el file.

    Tim> engine? Is this even an issue requiring consideration? I'm
    Tim> just mentioning it as a possible consideration and it may or
    Tim> may not be relevant.

I'll reply to your other messages in sequence.

    Tim> Tim

>>>>> "tvr" == T V Raman <tvraman@comcast.net> writes:

    tvr> Actually there is very little that needs to be done to make
    tvr> option 1 complete, and as you say the other speech server
    tvr> frameworks out there are not sophisticated enough since
    tvr> they've mostly done a least common denominator approach.

    tvr> As things stand in the design the intent is that you
    tvr> shouldn't have to modify elisp or tcl files; in practice,
    tvr> this can be proven only by writing more servers, discovering
    tvr> where mods are needed, and refactoring code appropriately;
    tvr> discussion in the abstract usually leads to mud slinging and
    tvr> stone throwing, nothing else.

    tvr> If you examine how the dectalk and viavoice support works
    tvr> today, the dectalk specific code is now in dectalk-voices.el;
    tvr> the viavoice code in outloud-voices.el, and the TCL layer
    tvr> mirrors this, with the common TCL code in tts-lib.tcl.

    tvr> The name "dtk" is legacy and should be thought of as a
    tvr> synonym for tts --- I made sure of this the last time I
    tvr> refactored the code and named things that were dectalk
    tvr> specific with a dectalk- prefix.

>>>>> "Tim" == Tim Cross <tcross@rapttech.com.au> writes:

    Tim> I think Raman's idea is a good one and I would certainly be
    Tim> willing to participate in a team which worked on speech
    Tim> servers for emacspeak. The current job I'm in means I will
    Tim> not have a lot of time for this project until after August,
    Tim> but am certainly willing to try and contribute when possible.

    Tim> If the emacspeak community decides this would be a good model
    Tim> to follow for providing speech server support, I think we
    Tim> need to start by looking at how we may be able to slightly
    Tim> modify the architecture of emacspeak so that additional
    Tim> servers do not require modification to the core emacspeak
    Tim> code-base. Currently, if you want to create a new server
    Tim> which is integrated into emacspeak in the same way as
    Tim> existing servers, you need to modify some of the emacspeak
    Tim> source code. I feel that if we are going to introduce another
    Tim> group, as far as possible, we need to have an architecture
    Tim> where Raman (or whoever) can extend emacspeak functionality
    Tim> without reference to the work done by another group which is
    Tim> adding speech servers.

    Tim> I feel we have a couple of options along these lines -

    Tim> 1. We could modify the existing code base so that we have a
    Tim> very well defined speech server interface layer. This would
    Tim> be the easiest option in my view as Raman has already got
    Tim> much of the work done - it's really just a bit of cleanup work
    Tim> and moving some processing which currently happens at either
    Tim> the TCL server script level into the elisp layer or vice
    Tim> versa.

    Tim> 2. Possibly examine modifications to emacspeak so that it can
    Tim> work with other frameworks which have been developed for
    Tim> interfaces to generic speech servers. The speechd project is
    Tim> an example of this sort of approach. I also believe a group
    Tim> has been formed to create a uniform speech interface which
    Tim> KDE, GNOME et al. would use and perhaps we should examine how
    Tim> feasible this might be. The main drawback I can see is that
    Tim> some of these projects don't seem to support the advanced
    Tim> features of emacspeak (e.g. don't handle multiple voices
    Tim> well, auditory icons etc), plus this would require possibly
    Tim> substantial changes to the emacspeak architecture.

    Tim> Other points of view, comments, concerns etc welcomed and
    Tim> encouraged. We need to contribute if we want emacspeak to
    Tim> evolve. I actually feel we are getting close to a time where
    Tim> emacspeak requires more maintainers, not just for speech
    Tim> servers but also for the emacspeak code base itself. Raman
    Tim> has held it together for a long time now, but he has other
    Tim> interests and responsibilities and it's probably time we as
    Tim> users started taking on some of the responsibility for its
    Tim> maintenance and development.

    Tim> Tim

>>>>> "tvr" == T V Raman <tvraman@comcast.net> writes:

    tvr> An FAQ would be a good start. The next step would be to put
    tvr> together a small team that took responsibility for creating
    tvr> and maintaining speech servers. The reason I have not
    tvr> bothered updating the Software Dectalk support is that no
    tvr> more than a handful of users out there bothered with even
    tvr> 4.61, and it's just not worth the effort required to maintain
    tvr> multiple speech servers for such a small user base. Under
    tvr> those conditions the only thing that works is if the person
    tvr> who wants it the most puts in the effort. In this case, it's
    tvr> not me, since I already have my needs fully met.

    tvr> -- Best Regards, --raman

      
    tvr> Email: raman@cs.cornell.edu WWW:
    tvr> http://emacspeak.sf.net/raman/ AIM: TVRaman PGP:
    tvr> http://emacspeak.sf.net/raman/raman-almaden.asc IRC:
    tvr> irc://irc.gnu.org/emacspeak

    tvr> -----------------------------------------------------------------------------
    tvr> To unsubscribe from the emacspeak list or change your address
    tvr> on the emacspeak list send mail to
    tvr> "emacspeak-request@cs.vassar.edu" with a subject of
    tvr> "unsubscribe" or "help"


-- 
Best Regards,
--raman

      
Email:  raman@users.sf.net
WWW:    http://emacspeak.sf.net/raman/
AIM:    TVRaman
PGP:    http://emacspeak.sf.net/raman/raman-almaden.asc
---- End included message ----

---- Begin included message ----

This will definitely help.
Incidentally I added a section in the texinfo manual a few months ago
documenting the speech server API --- take a look at it if you've not
already done so.
>>>>> "Tim" == Tim Cross <tcross@rapttech.com.au> writes:

    Tim> As I'm hoping to implement replacement support for the
    Tim> Cepstral TTS using their new swift engine, which has a
    Tim> different API to the previous theta engine, I thought I'd
    Tim> document the process for creating support for a new TTS
    Tim> engine. This should make it easier for others to create new

Here is the process I'd suggest:

Create two files: <engine>-voices.el and a TCL script <engine>.

Pull in tts-lib.tcl in <engine>
and see how much you have to rewrite; in doing so compare the files
outloud and dtk-exp and see what, if anything, can move into
tts-lib.tcl.

While writing <engine>-voices.el, write the engine setup functions
that get called from tts-configure-synthesis-setup.
Eventually that function should be passed an engine name and do a
table lookup to retrieve a defstruct that provides it the
engine-specific settings. Since there are only two engines configured
there at present I've been too lazy to do this, and will remain so
until there is a reason to introduce the table; it's a small piece of
work.
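
A minimal sketch of the table lookup described above, written in
Python purely for illustration (the real thing would be elisp, with a
defstruct per engine). The names EngineSettings, ENGINE_TABLE,
register_engine and configure_synthesis are hypothetical and do not
exist in emacspeak; the pairing of file names with the two existing
engines is taken from this thread and is illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineSettings:
    """Hypothetical stand-in for the per-engine defstruct."""
    name: str
    voices_file: str    # elisp bridge, i.e. <engine>-voices.el
    server_script: str  # TCL server script, i.e. <engine>


# Hypothetical table keyed by engine name.
ENGINE_TABLE = {
    "dectalk": EngineSettings("dectalk", "dectalk-voices.el", "dtk-exp"),
    "outloud": EngineSettings("outloud", "outloud-voices.el", "outloud"),
}


def register_engine(settings: EngineSettings) -> None:
    """Add a new engine without touching the lookup code."""
    ENGINE_TABLE[settings.name] = settings


def configure_synthesis(engine_name: str) -> EngineSettings:
    """Look up the settings for ENGINE_NAME, failing loudly if unknown."""
    try:
        return ENGINE_TABLE[engine_name]
    except KeyError:
        raise ValueError(
            f"no speech engine registered as {engine_name!r}"
        ) from None
```

With a table like this, supporting a third engine would come down to
one register_engine call made from that engine's own file, rather
than an edit to the shared setup function.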

    Tim> interfaces for other synthesizers.

    Tim> I have also done a generic-voices.el file which I think would
    Tim> be a good add-on for emacspeak. This file can be used to

See the note about ssml-voices.el in the earlier message.

    Tim> provide very generic support for almost any synthesizer. I've
    Tim> used it for the "generic" non-SSML version of theta. I feel
    Tim> it could be useful for engines like eflite as it does away
    Tim> with the requirement to "clean" TTS specific codes from the
    Tim> input, which was the approach used last time I checked out
    Tim> eflite. While this sort of file would never be able to
    Tim> support multiple voices and many other "advanced" features,
    Tim> sometimes it's worth forgoing such features in return for a
    Tim> simple and reliable voice.

Speech servers should not in general pretend to be a dectalk and later
clean up dectalk codes; that is a solution that will not scale.

Bottom line: Emacspeak is implemented in Lisp; if you write a speech
server, you need to write a corresponding <engine>-voices.el file that
serves as the bridge to the server.


    Tim> Tim


-- 
Best Regards,
--raman

      
---- End included message ----

---- Begin included message ----

Well, investigate it, and tell me what you discover.

>>>>> "Tim" == Tim Cross <tcross@rapttech.com.au> writes:

    Tim> One other thing I forgot to mention is that I'm not sure if
    Tim> we should totally ignore other TTS interfaces. While it would
    Tim> be necessary to investigate what would be involved, something
    Tim> like the speech-dispatcher approach is probably worth
    Tim> investigating further. I know that it has become a lot more
    Tim> sophisticated, with support for auditory icons, multiple
    Tim> voices and multiple languages plus SSML. While it is possible
    Tim> that the approaches are so different that no true integration
    Tim> can be achieved, it should still be evaluated fully. The
    Tim> benefit of such approaches is that by creating just a single
    Tim> interface, we immediately gain support for a number of
    Tim> different TTS engines, including festival, flite, apollo,
    Tim> software dtk, epos, llia_phon etc - all of which can be
    Tim> maintained with a single interface.


    Tim> Tim-

-- 
Best Regards,
--raman

      
---- End included message ----