Emulating Human Voice-overs with TTS Voices
September 17th, 2009
By Rudy Helm, Audio and Quality Assurance Tech, Visual Purple, LLC.
This is the follow-up I promised in the last paragraph of my previous blog entry, ‘Can Robots be Funny?’ It will also segue nicely into a new topic, which I’ll hold back until the conclusion. Anyway, I had proposed this scenario: say we have a cinematic cut-scene (or trailer) with several actors in it, but our budget can only afford one human voice actor. So, we record our one human voice actor performing every role in the cast. Next, we use the techniques discussed in that previous entry to create an ensemble of TTS actors and sync their synthetic voices to the phrasings and expressions of our human model. Not only that, we will deploy only 1 TTS male voice and 1 TTS female voice to cover our entire cast of 7 characters (that’s 3 adult males, 2 adult females and 2 children)! We’ll accomplish these tasks with a suite of video and audio editing software (markup languages not being practical for trailers, etc.).
First, listen to this excerpt where you will hear the human model’s VO.
Now listen to the same animation again. This time, however, the audio you hear is our TTS VO.
Recall from my earlier blogs that the tasks included lengthening or shortening the TTS vowels and syllables to match those of the human model. See Figure 1.
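For readers who like to see the arithmetic behind that lengthening and shortening, here is a minimal sketch in Python. It is not the workflow I used (I did this by ear and by eye in audio editing software); it only illustrates the idea. The `Segment` class, the `stretch_factors` function and the timings are all hypothetical, and it assumes you have already marked where each matching syllable starts and ends in both the human take and the TTS take.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One syllable (or vowel) marked in a recording; times are in seconds."""
    label: str
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def stretch_factors(human, tts):
    """Return (label, factor) pairs, one per TTS segment.

    factor > 1 means the TTS segment must be lengthened to match the
    human model; factor < 1 means it must be shortened.
    """
    if len(human) != len(tts):
        raise ValueError("human and TTS takes must have the same number of segments")
    factors = []
    for h, t in zip(human, tts):
        if t.duration <= 0:
            raise ValueError(f"TTS segment {t.label!r} has no length")
        factors.append((t.label, h.duration / t.duration))
    return factors


if __name__ == "__main__":
    # Hypothetical timings for the word "hello", split into two syllables.
    human_take = [Segment("hel", 0.00, 0.30), Segment("lo", 0.30, 0.90)]
    tts_take = [Segment("hel", 0.00, 0.20), Segment("lo", 0.20, 0.50)]
    for label, factor in stretch_factors(human_take, tts_take):
        action = "lengthen" if factor > 1 else "shorten"
        print(f"{label}: {action} by a factor of {factor:.2f}")
```

Each factor is simply the human syllable's length divided by the TTS syllable's length, which is the number you would dial into a time-stretch tool for that slice of audio.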

Figure 2- Next, view the whole enchilada, as it were, this time with the entire cast of male and female TTS voice-actors doing their stuff, including a crowd sound effect for the background restaurant ambience:
Link to YouTube (Visual Purple)
Figure 3-Window dressing
Sound effects! Well, if you missed my previous blog (remember the talking fish?), you should visit it to read a discussion on sound effects. The background sound effects help place the scene in a more immersive environment, don’t they? And then there’s…
MUSIC! That’s right! Here’s where I depart for now from my series of blogs on TTS voice-actors (‘bout time isn’t it?). Background music can be an additional sweetening, adding greatly to your scene. “But wait!” you may be saying, “I thought the focus of this exercise was our tight budget!?”
You are so correct. How about copyright-free, royalty-free music? I’ll explore that a bit more later, but for now let’s have a look at, and a listen to, this scene again, this time rendered with (royalty-free) background music. This is a restaurant scene, and many restaurants provide music for their clientele, right?
Figure 4– Complete scene rendered with TTS voices and background music
Link to YouTube (Visual Purple)
Now see how effective the music was in providing a more pleasing restaurant ambience? In fact, without the music, the scene was rather sterile, wasn’t it? Even with the ambient crowd chatter in the background it was stale, but the music made the scene, well, …right!
Conclusion
Whenever you can afford real actors, do it! But when budget screams for relief, maybe try some of the things I have offered in these blogs about synthetic VO.
And algorithmic music? Yes. Algorithmic. Why not opt for copyright-free, royalty-free music whenever we have the opportunity and it’s appropriate to our project, not to mention our budget? In fact, that will be the topic of my next blog, “When Your Musician is a Robot (Can Automated Composers Write Good Music?)”. We’ll demonstrate various musical styles and include movie clips for you to view so that we can hear the background music in context. Stay tuned!















