See the code on GitHub.

Fun fact: Apple’s original Siri voice was recorded by a woman named Susan Bennett, but have you ever wanted to be your own text-to-speech voice? By following this tutorial, you can build a basic computer voice out of your own recordings in Python 3.

Before we get started, however, please note that this is absolutely not how commercial text-to-speech engines do it. That requires a deep understanding of linguistics, and do you see any linguistics lessons on this website?

Prerequisites:

  • Audio editing software (for example, Audacity)
  • Python 3
  • PyAudio
  • Modified CMU Pronouncing Dictionary (in the GitHub repo linked above)

Steps:

  1. First, let’s download the modified version of the Carnegie Mellon University (CMU) Pronouncing Dictionary. It maps nearly every word in American English to its pronunciation, written in ARPAbet, which is like the IPA but without the steroids. Also, note that some of the words have a number next to them. These mark alternate pronunciations of the same spelling (for example, “read” in the present tense versus the past tense), which our program won’t be able to handle correctly, since the correct pronunciation depends on the context of the sentence.
  2. Next, check out the ARPAbet Wikipedia page and record yourself saying every single symbol in a neutral tone. Make sure everything is clear and consistent. This is going to be tedious, but it will be worth it!
  3. Do your best to isolate the individual sounds in your audio editing software and save each one as “<insert symbol here>.wav” (without the angle brackets, so “AH.wav”, “B.wav”, and so on) in a folder named “sounds.”
  4. Time to write code! Grab the Python 3 script from the GitHub repo and save it as “load.py.”
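Steps 2 and 3 are tedious enough that it is easy to miss a symbol, so it is worth checking the “sounds” folder before running anything. The following is a small helper sketch, not part of the original project, that lists every ARPAbet symbol without a matching “<symbol>.wav” file. The 39-symbol list assumes the standard CMU phoneme set with stress digits removed, and the name `missing_sounds` is hypothetical; adjust the list to match the symbols in your copy of the modified dictionary.

```python
import os

# Assumed symbol set: the 39 CMU dictionary phonemes without stress digits.
# Edit this list if your modified dictionary uses different symbols.
ARPABET = [
    "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH",
    "EH", "ER", "EY", "F", "G", "HH", "IH", "IY", "JH", "K",
    "L", "M", "N", "NG", "OW", "OY", "P", "R", "S", "SH",
    "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH",
]


def missing_sounds(folder="sounds", symbols=ARPABET):
    """Return the symbols that have no <symbol>.wav file in folder."""
    # A missing folder counts as "nothing recorded yet".
    present = set(os.listdir(folder)) if os.path.isdir(folder) else set()
    return [s for s in symbols if s + ".wav" not in present]


if __name__ == "__main__":
    missing = missing_sounds()
    if missing:
        print("Still need to record:", ", ".join(missing))
    else:
        print("All", len(ARPABET), "sounds are present.")
```

Run it from the directory that contains the “sounds” folder; an empty result means every symbol has a recording.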

When the TextToSpeech class is instantiated, it loads the data from the dictionary file into memory as a dict that maps each word to a list of its sound strings (its ARPAbet symbols).
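As a rough idea of that loading step, here is a minimal sketch under a few stated assumptions: the file uses the classic CMU layout (the word, then whitespace-separated ARPAbet symbols; comment lines starting with “;;;”; alternate pronunciations written as “WORD(1)”), and the sound files are named without stress digits (“AH.wav” rather than “AH0.wav”). The function name `load_dictionary` is hypothetical; in the tutorial’s design, equivalent code would run inside `TextToSpeech.__init__`.

```python
import re


def load_dictionary(path):
    """Map each word to a list of ARPAbet symbols (assumed CMU layout)."""
    pronunciations = {}
    # errors="replace" keeps a stray non-UTF-8 byte from aborting the load.
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith(";;;"):
                continue  # skip blank lines and comments
            word, *symbols = line.split()
            # Alternate pronunciations such as "READ(1)" need sentence
            # context to choose between, so keep only the first entry.
            if re.search(r"\(\d+\)$", word):
                continue
            # Strip stress digits (AH0 -> AH) so symbols match the file names.
            pronunciations[word.upper()] = [re.sub(r"\d", "", s) for s in symbols]
    return pronunciations
```

Dropping the numbered alternates is one simple way around the context problem from step 1: every word always gets its first listed pronunciation.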

Then, when we call the get_pronunciation method, it goes through each word in our str_input argument, and if the word is present in the dict as a key, it adds the associated list to our running list of sounds to say.
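A sketch of that lookup as a standalone function follows. The tutorial’s version is a method on TextToSpeech; this one takes the dict as a parameter so it can be tried on its own. Extracting words with a regular expression that keeps letters and apostrophes is an assumption made here so that punctuation such as “world!” still matches; the original may split the input differently.

```python
import re


def get_pronunciation(str_input, pronunciations):
    """Concatenate the sound lists of every known word in str_input.

    Words that are not in the dictionary are skipped.
    """
    sounds = []
    # Letters and apostrophes only, so "DON'T" can match its entry.
    for word in re.findall(r"[A-Za-z']+", str_input):
        key = word.upper()
        if key in pronunciations:
            sounds.extend(pronunciations[key])
    return sounds


demo = {"HELLO": ["HH", "AH", "L", "OW"], "WORLD": ["W", "ER", "L", "D"]}
print(get_pronunciation("Hello, world!", demo))
# ['HH', 'AH', 'L', 'OW', 'W', 'ER', 'L', 'D']
```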

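For smoothness and fluidity, we then play each sound on its own thread, waiting an arbitrary amount of time before starting the next one, so consecutive sounds can run into each other instead of being separated by silence.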
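The threaded playback could look something like the following minimal sketch. It reads each “<symbol>.wav” with Python’s built-in wave module and plays it through PyAudio, starting one thread per sound and sleeping for `delay` seconds between starts. The names `play_wav` and `say` and the 0.1-second default are placeholders, not values from the original code; tune the delay by ear. The player function is passed in as a parameter so it can be swapped out, for example on a machine where PyAudio is not installed.

```python
import os
import threading
import time
import wave


def play_wav(path):
    """Play one .wav file through PyAudio."""
    import pyaudio  # imported here so say() can be tried without PyAudio

    with wave.open(path, "rb") as wf:
        pa = pyaudio.PyAudio()
        stream = pa.open(format=pa.get_format_from_width(wf.getsampwidth()),
                         channels=wf.getnchannels(),
                         rate=wf.getframerate(),
                         output=True)
        data = wf.readframes(1024)
        while data:  # readframes returns b"" at the end of the file
            stream.write(data)
            data = wf.readframes(1024)
        stream.stop_stream()
        stream.close()
        pa.terminate()


def say(sounds, folder="sounds", delay=0.1, play=play_wav):
    """Start each sound on its own thread, delay seconds apart,
    so the tail of one sound can overlap the start of the next."""
    threads = []
    for symbol in sounds:
        t = threading.Thread(target=play, args=(os.path.join(folder, symbol + ".wav"),))
        t.start()
        threads.append(t)
        time.sleep(delay)
    for t in threads:
        t.join()  # wait until every sound has finished playing
```

Opening a separate PyAudio instance in each thread keeps the sketch simple; sharing a single instance across threads is a possible refinement.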

Test it, and it works!


This project is licensed under the Apache License 2.0.

8 thoughts on “Python Basic Text-to-speech Engine | AR Tech Tuts”


  2. Hello, I am using a Raspberry Pi for a project, but it doesn’t output any speech. How can I fix this or make it work on the Raspberry Pi?

  3. You can play a .wav file on my Raspberry Pi using: aplay /.wav

    Example:

    aplay /usr/share/scratch/Media/Sounds/Vocals/Singer1.wav

    What in the code do I need to modify? Please email me.

    1. Hi Zurechtweiser,

      Sorry for the extremely late reply. If you can figure out how to program natural intonation, it’s just a matter of finding some way to pitch the audio then play it. As for an exact implementation, I’m not sure. It is open source so you can play around with it.
