Virtual Speak

Advanced Simulation Technologies & Embedded Training Systems

By Rudy Helm, Audio and Quality Assurance Tech, Visual Purple, LLC.

At the end of my previous discussion on NPC voice-over production, I promised to follow up with a blog about what it might take to get a synthetic voice to be funny. Remember, we’re talking about NPCs (Non-Player Characters); playable characters are typically voiced by professional voice talent. I will provide samples as we roll along, tutorial fashion, with the disclaimer that this is just one approach and that other useful techniques could be considered.

Ok, I sense you are protesting: how can a robot out-‘funny’ a performance by a professional voice talent? I am not at all suggesting that a synthetic voice actor can win such a contest. But if you are faced with options, and this is the option you choose, you really want to come up with workable solutions.

What are the resources?

There are a number of synthetic voice vendors available, and one obvious task is to choose one. A simple Internet search can help you solve that problem. For purposes of this discussion we will use third-party software to control voice properties. In this tutorial we’ll use a stand-alone audio editor along with a non-linear editor (NLE), but a digital audio workstation (DAW) of your choice can almost certainly do the same job. The audio editor might be replaced with XML controls if that is your favorite way to affect voice pitch, tempo, and so on. However, I think it would be extremely tedious to try to deploy markup languages as a substitute for a DAW, and by the end of this writing I bet you will agree with me. Please refer to my earlier post, When Your Voice-actor is a Robot, for some detail on resources. And then there is that last, very important asset to have: someone who is funny!

Here at Visual Purple, we are fortunate to have a gentleman who is a very funny guy, and for this experiment that makes for a very lucky day! So, you may be thinking, why are we talking about working with a funny human? Isn’t this topic about having a funny robot? Well, yes, it is, but our funny human (let’s call him John) will serve as a model for our robot.

Say what?

The short answer is, we will import audio clips of the funny human into our DAW, and then we will import audio clips of the synthetic voice and make it emulate the human’s speech patterns.

Say what?!!

Ok, in this project our goal is to make some humorous fish voices. You see, we have a scene in one of our products where someone at a bar can stand and stare at a fish tank. As the fish swim by, and if the avatar is standing close enough to the tank, the fish might begin to make wisecracks at the, uh, fish admirer. This is an ‘Easter egg’ that pokes fun at the avatar, possibly insinuating that he has had a bit too much to drink. To achieve our goal, we need to mold the synthetic voice clip so that it emulates the comedic timing of the human model.

Let’s do it!

So let’s start the process by importing into our DAW an audio clip that John, the funny human, recorded for us (Figure 1).

Figure 1

Next, listen to John’s original model for reference. The script: “Last week it was a lawyer’s convention. I never seen so many sharks!” We follow that by importing the corresponding audio clip from the synthetic voice (Figure 2).

Figure 2

Without doing anything further at this stage, we can easily see that the graphical sound ‘blobs’ don’t match. So, before we move on, have a listen to the robot’s recording. Notice that this clip has already been treated with pitch transposition. (For a discussion on ways to do that, please refer to my earlier post, When Your Voice-actor is a Robot.) Our intent was to get cartoon-y voices, so we started with a female TTS voice and then modified her pitch characteristics.

Now, to make the robot emulate John’s comedic speech patterns, we need to edit the TTS clip’s timeline so that the graphical sound ‘blobs’ do match. Figure 3 illustrates an example:

Figure 3

In Figure 3’s example we see only the first two words of the script (“Last week…”). Listen to how the TTS’s utterance of the word ‘week’ occurs earlier in the timeline than John’s blob of the same word. Close, but the timing is just not right, is it? Note that we need to create a split point (the vertical line represents this) just before the TTS’s blob. Doing this enables us to separate the words and move them as we wish on the timeline (see Figure 4).
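
For readers who like to see the arithmetic, here is a minimal sketch of what that comparison amounts to: take the start time of each word in John’s clip and in the TTS clip, and work out how far each TTS word has to move. The function name and the onset times are made up for illustration; in practice you would read the times off the waveforms in your own editor.

```python
def onset_shifts(model_onsets, tts_onsets):
    """Return, per word, how many seconds the TTS word must move.

    A positive value means the TTS word has to start later to line up
    with the human model; a negative value means earlier.
    """
    if len(model_onsets) != len(tts_onsets):
        raise ValueError("both clips must contain the same number of words")
    return [round(m - t, 3) for m, t in zip(model_onsets, tts_onsets)]


# Hypothetical word start times in seconds for "Last" and "week".
john = [0.00, 0.42]
robot = [0.00, 0.30]

print(onset_shifts(john, robot))  # [0.0, 0.12]
```

With these invented numbers, ‘week’ would need to slide 0.12 seconds later, which is the kind of nudge the split point makes possible.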

Figure 4
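
The split-and-move step can also be pictured in code. This is only a sketch under stated assumptions, not what the editor does internally: the clip is treated as a plain list of mono samples, the sample rate of 44100 is assumed, and both function names are hypothetical. Splitting at a point and pushing the rest later is modeled as inserting silence at the split.

```python
SAMPLE_RATE = 44100  # assumed sample rate, in samples per second


def seconds_to_samples(seconds, sample_rate=SAMPLE_RATE):
    """Convert a time offset in seconds to a whole number of samples."""
    return int(round(seconds * sample_rate))


def delay_after_split(samples, split_index, delay_samples):
    """Split a mono clip at split_index and move everything after it later
    by inserting delay_samples of silence (zeros) at the split point."""
    if not 0 <= split_index <= len(samples):
        raise ValueError("split point lies outside the clip")
    if delay_samples < 0:
        raise ValueError("this sketch only moves audio later, not earlier")
    head, tail = samples[:split_index], samples[split_index:]
    return head + [0] * delay_samples + tail


# Toy clip: the first four samples stand in for "Last", the last two for "week".
clip = [3, 5, 4, 0, 7, 6]
print(delay_after_split(clip, 4, 2))  # [3, 5, 4, 0, 0, 0, 7, 6]
```

On a real clip the delay would come from something like `seconds_to_samples(0.12)` rather than a hand-picked 2. Moving a word earlier would mean trimming silence instead, which this sketch leaves out.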

Now, listen to both voices speak those two words in sync. (…to be continued)

One Response to “Comedic Treatment in TTS Voices (Can Robots be Funny?, Part 1)”

  1. Craig Campbell Says:

    Another great educational post. Please let your readers know that there are “Funny” TTS voices available at VoiceForge. For instance, we have an Italian-American voice called Wiseguy that would be perfect for this joke – http://www.VoiceForge.com/demo
