Section 0: Introduction

In this article we describe our experience of working through the eSpeakedit documentation and processes. We set out to follow the instructions for adding a language to eSpeak, in our case Arabic. We found some of the processes hard to follow or hard to carry out, so we have written up where we are and how we got there in case it is helpful to others.

We should be clear that this document does not contain all the relevant information. You should also read the other documents (all linked to or mentioned here) in case we have missed anything, or where our instructions are incomplete.

We started out by reading http://eSpeak.sourceforge.net/add_language.html

This told us the basics of what we would need to add a new language:

The first thing we understood is that we needed to create a phoneme file, e.g. ph_language, and add it to the phonemes master file. The new phoneme file inherits a basic set of consonants and control phonemes, or you can make it inherit from another language. It also defines additional phonemes such as vowels, diphthongs and additional consonants, as well as phonemes that differ from the inherited version.

We realised at this point that we needed to determine which sounds are currently available in other languages already provided by eSpeak and prepare the phoneme data for the missing phonemes before we could continue. Most of the work so far has been on this and is explained below in section 1.

As well as a phoneme file, a new language needs two dictionary files. The dictionary files define how text in the language should be translated into phonemes. We haven't started this part yet; see section 2.

A new language also needs voice files. We haven't made these yet either.

If you don't know what a phoneme, diphthong etc. is, Wikipedia helps :) http://en.wikipedia.org/wiki/Phoneme

Section 1: Phonemes

1.1. Decide what phonemes are missing

We know the letters of the alphabet, but what we need to know is the phonemes for those letters, and a single letter may require several phonemes. Your ph_language file will most probably inherit some phonemes from the base files or from other languages.

We evaluated all phonemes which had already been processed for eSpeak and listened to all of them in eSpeakedit to see if any of them matched the required phonemes for our new language.

We started creating the phoneme file by following the instructions on http://eSpeak.sourceforge.net/phontab.html, adding in the phonemes from other languages that we had found useful.

The other outcome of this step is that we identified which phonemes were missing.
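
The bookkeeping in this step can be sketched in Python. This is only an illustration and not part of eSpeak: it assumes that each phoneme definition in a phoneme file starts with a line of the form "phoneme <name>" and that "//" starts a comment. The sample file text, the phoneme names and the required set are made up.

```python
import re

# Assumption: a definition starts with the keyword "phoneme" followed by a name.
# "endphoneme" and "phonemetable" lines do not match this pattern.
PHONEME_LINE = re.compile(r"^\s*phoneme\s+(\S+)")

def defined_phonemes(text):
    """Return the set of phoneme names defined in the text of one phoneme file."""
    names = set()
    for line in text.splitlines():
        line = line.split("//", 1)[0]  # drop a trailing comment, if any
        match = PHONEME_LINE.match(line)
        if match:
            names.add(match.group(1))
    return names

def missing_phonemes(required, available):
    """Return the required phonemes that no existing file defines, sorted."""
    return sorted(set(required) - set(available))

# Hypothetical file content; the bodies of the definitions are omitted.
sample = """
phoneme a   // made-up entry
endphoneme

phoneme q
endphoneme
"""

available = defined_phonemes(sample)
required = {"a", "q", "H"}  # hypothetical list of phonemes the language needs
print(missing_phonemes(required, available))  # ['H']
```

Running defined_phonemes over every file you inherit from and taking the union gives the set of phonemes you can already use; whatever missing_phonemes returns is the list to work through in section 1.2.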

N.B. For Arabic we believe that due to the diacritics informing us of how the sound should be changed we do not need to listen to how consonants affect the adjacent vowels as may be required for other languages.

1.2. Creating the missing phonemes

You need to be sighted or will need sighted assistance to do most of the work in this section since the process is not accessible using a screenreader.

1.2.1 Installing eSpeakedit

eSpeakedit can be downloaded from http://eSpeak.sourceforge.net/download.html (bottom of page, search for eSpeakedit).

As explained in the eSpeakedit ReadMe file, you must ensure that the versions of eSpeak and eSpeakedit are compatible. The ReadMe also explains which packages you need to have installed, and that you may need to recompile eSpeakedit, which requires further packages.

1.2.2 Installing modified Praat

We downloaded the source from http://www.fon.hum.uva.nl/praat/download_sources.html and followed the instructions in the eSpeakedit praat-mod directory.

If the Praat build fails with errors saying that X11/extensions/Print.h is missing, install the x11proto-print-dev package. We found from http://ubuntuforums.org/showthread.php?t=665803 that this package provides X11/extensions/Print.h, and installing it solved our problem.

If you have trouble compiling, running modified Praat or getting non-empty file output from Praat in the steps in 1.2.4 then we suggest you check on the forums at http://sourceforge.net/forum/?group_id=159649 and/or contact jonsd at users dot sourceforge.net. We have experienced a variety of difficulties on machines with different platforms and with differing versions of Ubuntu, eSpeak, eSpeakedit and Praat so your mileage may vary and we don't feel we can provide anything useful here for your specific scenario.

1.2.3 Recording the sounds

We found that you need to record the sound in mono following the instructions for recording sounds on http://espeak.sourceforge.net/analyse.html
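
Before moving on it can be worth checking that a recording really is mono. The following is a minimal sketch using Python's standard wave module; it is our own helper, not something eSpeak or Praat provides, and the function names are made up. It only works for uncompressed WAV files.

```python
import wave

def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return channels, rate, duration

def is_mono(path):
    """True if the WAV file has exactly one channel."""
    return describe_wav(path)[0] == 1
```

For example, is_mono("recording.wav") (with a file name of your own) should return True for a recording made as described above; describe_wav also reports the sample rate and length, so you can compare them with the recording instructions.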

1.2.4 Analysing the sound using Praat

  • Open modified Praat and close the Praat picture window (we don't know what this is for, do you?)
  • Read the sound file; a new object appears
  • Edit the new object
  • Cut to the section that includes only the sound you are interested in
  • This should result in a small piece of sound no longer than a fraction of a second
  • When we tried to extract this immediately using to_eSpeakedit, modified Praat complained (something about window sizes; did you have this problem or know what it means?), so we saved the new sound and then reloaded it
  • Select the newly reloaded object and click on To Spectrum and then To eSpeak to output a file spectrum.dat, which you might find in your Praat folder
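
If you already know roughly where the sound sits in the recording, the cut can also be made outside Praat. This is not the Praat workflow described above but a substitute sketch using Python's standard wave module; trim_wav and its parameters are our own names, and we have not checked whether a file trimmed this way avoids the window size complaint.

```python
import wave

def trim_wav(src_path, dst_path, start_s, end_s):
    """Copy the audio between start_s and end_s (in seconds) into a new WAV file."""
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        first = int(start_s * rate)          # first frame to keep
        count = int(end_s * rate) - first    # number of frames to keep
        src.setpos(first)
        frames = src.readframes(count)
    with wave.open(dst_path, "wb") as dst:
        # Same channels, sample width and rate as the source; the frame count
        # in the header is corrected when the frames are written.
        dst.setparams(params)
        dst.writeframes(frames)
```

A call such as trim_wav("recording.wav", "sound.wav", 0.25, 0.5) (file names and times are placeholders) would keep a quarter of a second of audio, which can then be read into modified Praat as in the first step.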

1.2.5 Simulating the sound using eSpeakedit and output from Praat

We followed the instructions in http://espeak.sourceforge.net/editor.html and http://espeak.sourceforge.net/editor_if.html. They aren't specific to what we were trying to do, but we found them a useful guide to the interface and to altering pre-existing phonemes from eSpeak.

This is what we came up with:

  • Open eSpeakedit and open spectrum.dat.
  • You will see a sequence of frames which make up the sound on the right. On the left there are two tabs, Spect and Text. eSpeakedit opens with the Text tab open; you need to select the Spect tab.
  • Each of the frames should show a black curve, which is the actual sound, and blue marked peaks, which were added by the Praat analysis.
  • Click on the first frame.
  • By changing the formants in the Spect tab, try to match the blue peaks with formants f0-f5 (do we need to do the rest? we aren't sure).
  • To see your changes in the frame, left click on it and a green graph will appear.
  • We found that the formant changes we made did not seem to commit or save unless we triggered the frame to become a key frame (yellow border when not currently selected) by right clicking on the frame and selecting copy peaks down. We aren't sure if this is supposed to be the case.
  • Once you have finished changing the formants you can use F1 to play the selected frame and F2 to play the whole sound.
  • Do the same for each of the frames. Once they are all key frames (yellow border) you should evaluate the resultant phoneme. Don't forget to save it!

1.2.6 Evaluating resultant eSpeak phoneme

Evaluate the phoneme by pressing F2 in eSpeakedit and listen to whether the full phoneme sounds correct. If you feel that it is not quite correct you should restart the process at 1.2.3. If your initial feeling is that it might be correct you should:

  • Add it to the file ph_language by following the instructions in http://espeak.sourceforge.net/phontab.html
  • In eSpeakedit do compile->compile phoneme data
  • Now in the top box of the Text tab of eSpeakedit write a word in your language in terms of phonemes. It is important to select words that use only 1 new phoneme (the one to be evaluated), with all the rest being phonemes that are already accepted.
  • To write in phonemes mode start with ? and close with . You will need to look up the phonemes that are already available to you either in your ph_language or in files which you have inherited from (see step 1.1 above).
  • Press translate
  • Press speak and listen :)
  • If it sounds good, yay, try lots more words.
  • If your phoneme works well for some words and not others this suggests that your letter requires several phonemes for different cases, so keep it but add another phoneme to be found to your list from step 1.1.
  • If it sounds bad all the time return to step 1.2.3
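
Choosing test words that contain only one new phoneme is easy to get wrong by hand, so here is a small sketch of that selection in Python. It assumes you have already written each candidate word as a list of phoneme names; the words, the phoneme names and the function name are all made up for illustration.

```python
def words_isolating(new_phoneme, accepted, candidates):
    """Return the words that use new_phoneme and otherwise only accepted phonemes."""
    accepted = set(accepted)
    chosen = []
    for word, phonemes in candidates.items():
        others = [p for p in phonemes if p != new_phoneme]
        if new_phoneme in phonemes and all(p in accepted for p in others):
            chosen.append(word)
    return chosen

# Hypothetical data: phonemes already accepted, and candidate words
# written out as sequences of phoneme names.
accepted = {"k", "t", "b", "a", "i"}
candidates = {
    "word1": ["k", "a", "t", "a", "b"],  # does not use the new phoneme
    "word2": ["H", "a", "b"],            # H is the only new phoneme
    "word3": ["H", "a", "X"],            # H plus another unaccepted phoneme
}
print(words_isolating("H", accepted, candidates))  # ['word2']
```

Only word2 is a fair test of H here: word1 does not exercise it, and word3 would leave you unsure whether H or X was to blame if it sounded wrong.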

1.2.7 Iteration of process

Once you have found one phoneme you are completely happy with, remove it from your missing list and move on to the next one. We think you should be allowed to use this new phoneme in step 1.2.6 when evaluating others, because otherwise we will never get anywhere.

Once you have all the phonemes you require move on to the next section.

Section 2: Dictionaries

You should start by reading sections 4.1 and 4.2 on http://espeak.sourceforge.net/dictionary.html as they explain what we are about to do and the required notations. Keep in mind that the phoneme names are those defined in your ph_language and in the other files that you have inherited from; see section 1.1 above.

2.1 Rules

Depending on how complex the rules for the language are, this should be pretty straightforward but could potentially be time consuming. See section 4.3 on http://espeak.sourceforge.net/dictionary.html, which should be quite self-explanatory.

2.2 Exceptions

language_list will contain exceptions to the rules such as names, foreign words, letter names and so on. Please see sections 4.4 and 4.6 (and optionally 4.5, let us know how you get on with this :) ) on http://espeak.sourceforge.net/dictionary.html
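
To make the division of labour between the two dictionary files concrete, here is a toy model in Python. It is a deliberate simplification and not eSpeak's algorithm or file format: real rules can also depend on the letters before and after the match, which this sketch ignores. The rules, the exception entry and the phoneme names are invented.

```python
def translate(word, exceptions, rules):
    """Use the exceptions list if the word is in it, otherwise apply the rules."""
    if word in exceptions:
        return exceptions[word]
    phonemes = []
    longest = max(len(letters) for letters in rules)
    i = 0
    while i < len(word):
        # Try the longest letter sequence first so multi-letter rules win.
        for size in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                phonemes.append(rules[chunk])
                i += size
                break
        else:
            raise KeyError(f"no rule for {word[i]!r}")
    return " ".join(phonemes)

# Hypothetical rules (letters -> phoneme name) and one hypothetical exception.
rules = {"s": "s", "h": "h", "sh": "S", "a": "a", "m": "m"}
exceptions = {"sam": "s a: m"}

print(translate("sham", exceptions, rules))  # S a m
print(translate("sam", exceptions, rules))   # s a: m
```

The point of the sketch is the order of lookup: a word found in the exceptions keeps its hand-written pronunciation, and everything else is built up from the rules, which is why the rules need the phonemes from section 1 to be in place first.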

Section 3: Voices

The instructions are to be found on http://espeak.sourceforge.net/voices.html

Section 4: Current state with Arabic

Since we have very limited time available due to commitments with Orca, NVDA and PhD research, we haven't progressed as far as we would like. We have followed step 1.1 and have iterated through step 1.2, but we have not processed all the required phonemes. A basic start has been made on 2.1 and 2.2, but these can't really be taken further until the phonemes are in place.

If you are sighted, work on Linux, can follow these steps and wish to help out, then please feel free to contact me.