Introduction

eSpeak [1] is a wonderfully responsive, lightweight, small-footprint, cross-platform TTS (text-to-speech) engine.

Problem:

  • From version to version, phonemes or rules sometimes change, breaking the pronunciation of certain words.
  • Contributors who want to improve a language need to see which words have changed as a result of a given rule change.
  • Blind users, especially on Linux, are afraid to compile eSpeak to try their own alterations, because there is a high risk that they will end up with no sound for their screen reader.

Proposal

For each language/directory:

  • ph_language: the phoneme file.
  • language_{rules,list,listx}
  • tests.txt: a JSON file containing test data (further described below).
  • 00001.{wav,md5} to 99999.{wav,md5}: output wav files for the matching ids in the test file, and their md5 checksums.
  • corrected/0001.{wav,md5}: not used by the system, but kept in case we need to refer back to the last accepted version of a word or phrase.

tests.txt has the following format: a list of blocks. The first block contains the eSpeak voice settings, such as the rate, pitch and voice name, against which the test data should be run. Each subsequent block has:

  • id: e.g. 0001, 0002, etc.
  • input: a string of text which will be sent to eSpeak.
  • output: the expected phonemes produced by the given input.
  • acceptedHash: added by the user to state that the sound of this word or phrase is correct; a warning should be generated if the output differs from it.
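
To make the format concrete, here is a minimal sketch of what a tests.txt could look like and how it might be read. The field names id, input, output and acceptedHash come from the description above; the settings keys (voice, rate, pitch), the sample values and the load_tests helper are assumptions made for illustration only, not a settled format.

```python
import json

# Hypothetical tests.txt content. The first block holds the voice settings;
# the key names "voice", "rate" and "pitch" are assumed, not specified.
# The "output" values and the hash are placeholders, not real eSpeak output.
SAMPLE_TESTS = """
[
  {"voice": "en", "rate": 175, "pitch": 50},
  {"id": "0001", "input": "hello",
   "output": "<expected phonemes>",
   "acceptedHash": "d41d8cd98f00b204e9800998ecf8427e"},
  {"id": "0002", "input": "world",
   "output": "<expected phonemes>"}
]
"""

REQUIRED_FIELDS = {"id", "input", "output"}


def load_tests(text):
    """Split the list of blocks into (settings, test cases) and validate them."""
    blocks = json.loads(text)
    if not blocks:
        raise ValueError("tests.txt needs at least the settings block")
    settings, cases = blocks[0], blocks[1:]
    for case in cases:
        missing = REQUIRED_FIELDS - case.keys()
        if missing:
            raise ValueError(f"test block is missing fields: {sorted(missing)}")
    return settings, cases


settings, cases = load_tests(SAMPLE_TESTS)
for case in cases:
    status = "accepted" if "acceptedHash" in case else "not yet accepted"
    print(case["id"], case["input"], status)
```

In this sketch acceptedHash is optional, so a newly added word can sit in the file without triggering warnings until someone has listened to it and accepted the result.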

The top-level script should do the following:

  • Recurse into each language/variant directory.
  • Compile ph_language and language_{rules,list,listx}.
  • For each tests.txt, produce the wav and md5 files for each id.
  • If an md5sum has already been accepted for a given id and the current run produces something different, append the id to a warnings.txt in the same directory.
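
The steps above could be sketched roughly as follows. This is an illustration under assumptions, not a finished tool: the function names are hypothetical, the settings keys (voice, rate, pitch) are assumed, and the compile step is left as a caller-supplied hook because the exact compile commands are not specified here. Synthesis uses the espeak command line options -w (write wav), -v (voice), -s (speed) and -p (pitch).

```python
import hashlib
import json
import subprocess
from pathlib import Path


def espeak_to_wav(settings, text, wav_path):
    """Synthesize text into wav_path with the espeak command line tool."""
    cmd = ["espeak", "-w", str(wav_path)]
    # The settings key names below are assumptions about the first block.
    if "voice" in settings:
        cmd += ["-v", str(settings["voice"])]
    if "rate" in settings:
        cmd += ["-s", str(settings["rate"])]
    if "pitch" in settings:
        cmd += ["-p", str(settings["pitch"])]
    cmd.append(text)
    subprocess.run(cmd, check=True)


def run_language_dir(lang_dir, synth=espeak_to_wav):
    """Produce wav/md5 files for one directory; return ids whose hash changed."""
    lang_dir = Path(lang_dir)
    blocks = json.loads((lang_dir / "tests.txt").read_text(encoding="utf-8"))
    settings, cases = blocks[0], blocks[1:]
    changed = []
    for case in cases:
        wav_path = lang_dir / f"{case['id']}.wav"
        synth(settings, case["input"], wav_path)
        digest = hashlib.md5(wav_path.read_bytes()).hexdigest()
        (lang_dir / f"{case['id']}.md5").write_text(digest + "\n")
        accepted = case.get("acceptedHash")
        # Only ids with an accepted hash can produce a warning.
        if accepted and accepted != digest:
            changed.append(case["id"])
    if changed:
        with open(lang_dir / "warnings.txt", "a", encoding="utf-8") as fh:
            fh.writelines(f"{test_id}\n" for test_id in changed)
    return changed


def run_all(root, compile_step=None, synth=espeak_to_wav):
    """Recurse from root, processing every directory that has a tests.txt."""
    results = {}
    for tests_file in sorted(Path(root).rglob("tests.txt")):
        lang_dir = tests_file.parent
        if compile_step is not None:
            compile_step(lang_dir)  # hypothetical hook: compile phonemes and rules
        results[str(lang_dir)] = run_language_dir(lang_dir, synth)
    return results
```

Passing the synthesizer and the compile step in as parameters means the comparison logic can be tried out with a stand-in function, without touching the installed eSpeak that a screen reader depends on.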

Overlooked

If I have overlooked anything, please find me on IRC (mhameed on irc.oftc.net, or email