See the code on GitHub.

Fun fact: the original American voice of Apple’s Siri belonged to a woman named Susan Bennett, but have you ever wanted to be your own text-to-speech voice? Well, by following this tutorial, you can build a basic computer voice out of your own recordings in Python 3.

Before we get started, however, please note that this is absolutely not how commercial text-to-speech is done. That requires a deep understanding of linguistics, and do you see any linguistics lessons on this website?

Prerequisites:

  • Audio editing software. (See Audacity)
  • Python 3
  • PyAudio
  • Modified CMU Pronouncing Dictionary (on GitHub repo above)

Steps:

  1. First, let’s download the modified version of the Carnegie Mellon University Pronouncing Dictionary. It maps nearly every word in American English to its pronunciation. Pronunciations are written in ARPAbet, which is like the IPA but without the steroids. Also, note that some of the words have a number next to them. These mark alternate pronunciations of the same spelling (homographs, such as “read” in the present versus the past tense), which our program won’t be able to handle correctly, since the correct pronunciation depends on the context of the sentence.
  2. Next, check out the ARPAbet Wikipedia page and record yourself saying every single symbol in a neutral tone. Make sure everything is clear and consistent. This is going to be tedious, but it will be worth it!
  3. Do your best to isolate the individual sounds in your audio editing software, and save each one as “<insert symbol here>.wav” (without the angle brackets) in a folder named “sounds.”
  4. Time to write code! Save the following Python 3 code as “load.py.”
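The real load.py lives in the GitHub repo. Purely to illustrate what steps 1 to 3 produce, here is a small hypothetical helper of our own (not from the repo) that parses one dictionary line and lists which recordings are still missing from the “sounds” folder. It assumes entries follow the original CMUdict layout (`WORD  PH1 PH2 ...`, alternates marked `(1)`), that stress digits are dropped so `AH0` and `AH1` share a single `AH.wav`, and that only the 39 phonemes the original CMUdict uses need recordings. The modified dictionary may differ on any of these points.

```python
import os
import re

# The 39 phonemes used by the original CMU Pronouncing Dictionary (no stress digits).
CMU_PHONEMES = (
    "AA AE AH AO AW AY B CH D DH EH ER EY F G HH IH IY JH K "
    "L M N NG OW OY P R S SH T TH UH UW V W Y Z ZH"
).split()

# Assumption: entries look like "HELLO  HH AH0 L OW1", alternates like "READ(1)  R EH1 D".
VARIANT = re.compile(r"\(\d+\)$")


def parse_entry(line):
    """Return (word, phonemes) for one dictionary line, or None for comments/blanks."""
    line = line.strip()
    if not line or line.startswith(";;;"):
        return None
    word, *phonemes = line.split()
    word = VARIANT.sub("", word).lower()  # "READ(1)" -> "read"
    # Drop stress digits (AH0 -> AH) so each symbol matches a single recording.
    return word, [p.rstrip("012") for p in phonemes]


def missing_recordings(folder="sounds"):
    """List the phonemes that don't have a <symbol>.wav file in `folder` yet."""
    return [p for p in CMU_PHONEMES
            if not os.path.isfile(os.path.join(folder, p + ".wav"))]


if __name__ == "__main__":
    print(parse_entry("READ(1)  R EH1 D"))  # ('read', ['R', 'EH', 'D'])
    print("Still to record:", missing_recordings())
```

Running the check after each recording session is a cheap way to make sure step 3 is complete before the loader ever needs the files.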

When the TextToSpeech class is instantiated, it loads the dictionary file into memory as a dict that maps each word to a list of the strings representing its sounds.
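A minimal sketch of that loading step might look like the class below. It is not the repo's code: the default file name `cmudict.txt` is hypothetical, and it assumes the same CMUdict-style line format as the original dictionary (comments starting with `;;;`, alternates marked `(1)`). Since the program can't pick between homographs anyway, this version simply keeps the first pronunciation it sees for each word.

```python
import re

# Assumption: CMUdict-style lines such as "READ(1)  R EH1 D"; ";;;" starts a comment.
VARIANT = re.compile(r"\(\d+\)$")


class TextToSpeech:
    """Sketch of the loader: maps each lowercase word to a list of ARPAbet symbols."""

    def __init__(self, dict_path="cmudict.txt"):  # hypothetical file name
        self.pronunciations = {}
        # latin-1 can decode any byte, so a stray character won't abort loading.
        with open(dict_path, encoding="latin-1") as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith(";;;"):
                    continue
                word, *phonemes = line.split()
                word = VARIANT.sub("", word).lower()
                # Keep only the first pronunciation of a numbered homograph;
                # without sentence context there is no way to choose the right one.
                self.pronunciations.setdefault(
                    word, [p.rstrip("012") for p in phonemes]
                )
```

Loading everything once in `__init__` means each later lookup is a single dict access, at the cost of holding the whole dictionary in memory.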

Then, when we call the get_pronunciation method, it goes through each word in the str_input argument, and if that word is a key in the dict, it adds the associated list to the running list of sounds to say.
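Here is a hedged sketch of that lookup, written as a free function that takes the word-to-phonemes dict so it runs on its own (in the tutorial it is a method). Lowercasing the input and stripping punctuation with a regex are our additions, so that “Hello,” still matches “hello”; the demo dict is toy data, not real dictionary output.

```python
import re


def get_pronunciation(pronunciations, str_input):
    """Concatenate the phoneme lists of every known word in `str_input`."""
    sounds = []
    # Lowercase and pull out runs of letters/apostrophes so "Hello," matches "hello".
    for word in re.findall(r"[a-z']+", str_input.lower()):
        if word in pronunciations:  # unknown words are silently skipped
            sounds.extend(pronunciations[word])
    return sounds


if __name__ == "__main__":
    demo = {"hello": ["HH", "AH", "L", "OW"], "world": ["W", "ER", "L", "D"]}
    print(get_pronunciation(demo, "Hello, world!"))
    # ['HH', 'AH', 'L', 'OW', 'W', 'ER', 'L', 'D']
```

Skipping unknown words keeps the engine from crashing on names or typos, but it also means those words are simply not spoken.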

For smoothness and fluidity, we then play each sound on its own thread, delaying the start of each one by an arbitrary amount.
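One way to sketch this with PyAudio and the standard `wave` module is shown below. The names `play_wav`, `speak` and `DELAY`, the 0.12-second default, and the `sounds/<symbol>.wav` path are our assumptions, not the repo's. Each thread opens its own PyAudio instance as a cautious way to avoid sharing state between threads, and `pyaudio` is imported inside `play_wav` so `speak` can be tried with a stand-in player on a machine without audio.

```python
import threading
import time
import wave

DELAY = 0.12  # seconds between sound starts; arbitrary, tune by ear (assumption)


def play_wav(path):
    """Play one WAV file on the default output device using PyAudio."""
    import pyaudio  # lazy import so speak() can run with a stand-in player

    with wave.open(path, "rb") as wf:
        audio = pyaudio.PyAudio()
        stream = audio.open(
            format=audio.get_format_from_width(wf.getsampwidth()),
            channels=wf.getnchannels(),
            rate=wf.getframerate(),
            output=True,
        )
        try:
            stream.write(wf.readframes(wf.getnframes()))
        finally:
            stream.stop_stream()
            stream.close()
            audio.terminate()


def speak(sounds, delay=DELAY, play=play_wav, folder="sounds"):
    """Start each phoneme's recording on its own thread, staggered by `delay`.

    A clip longer than `delay` overlaps the start of the next one, which is
    presumably what smooths the joins between sounds.
    """
    threads = []
    for phoneme in sounds:
        t = threading.Thread(target=play, args=(f"{folder}/{phoneme}.wav",))
        t.start()
        threads.append(t)
        time.sleep(delay)
    for t in threads:  # wait for every clip to finish before returning
        t.join()
```

With real recordings in place, `speak(["HH", "AH", "L", "OW"])` would attempt to say “hello”. A shorter delay blends sounds more but risks smearing them together; a longer one sounds choppier.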

Test it, and it works!

 

This project is licensed under the Apache License 2.0.

One thought on “Python Basic Text-to-speech Engine | AR Tech Tuts”
