When Your Voice-actor is a Robot (Confronting the NPC Speak Challenge for Virtual Worlds, Part 1)
By Rudy Helm, Audio and Quality Assurance Tech, Visual Purple, LLC.
I’d like to discuss NPC voice-over production, and I’ll provide downloadable samples as we roll along. In our virtual worlds, Visual Purple sometimes deploys intelligent NPCs (Non-Player Characters) with synthetic voices, whereas our playable characters are typically voiced by professional voice talent. Much of the challenge is making synthetic voice recordings not sound too synthetic, like the automated voice applications you get trapped in on the phone. To confront this, we may adjust tempo and pitch, either globally across an NPC’s dialog or just on specific words or phrases. Sometimes a clever combination of both treatments comes into play.
One reason to adjust tempo and pitch is to get extra mileage from one synthetic voice actor. A quick for-instance: say you want three male voice actors for your project. One is a teenager, another plays the teenager’s father, and the third plays the grandfather. You can achieve this by adjusting the pitch of a single synthetic actor. Re-pitch the teenager’s voice a bit higher than Dad’s (you might just leave Dad’s timbre as is), and re-pitch Grandpa’s voice a bit lower.
In reality, each of us likely speaks at a different pace (tempo) than the person in the next cubicle, and I submit that we can emulate the same phenomenon in our synthetic actors. We could make the teenager speak slightly quicker than Dad does (again, we could just leave Dad’s pace as is) and slow down Grandpa’s tempo somewhat. I’m sure you’re getting the idea. For female timbres, simply consider similar treatments.
What and where are the resources?
There are a number of synthetic voice vendors available. It seems, though, that many of these vendors are reselling the same voice actors, so try to get your license from the source. This is a global market, so do not assume that voices in your language are available only from vendors in your own country. I believe Visual Purple’s products are among the best where NPC voice quality is concerned.
There is a goodly supply of audio tool vendors available as well. Most of the synthetic voice vendors have on-board processing tools. These tools are there to help you arrive at solutions such as the teenager/dad/grandpa scenario depicted above. One common way to utilize their on-board tools (software based) is by developing some markup language skills. XML anyone?
On-board audio-treatment (markup languages)
Control mechanisms for affecting voice properties include SSML, SALT, SAPI4, SAPI5, and TTS vendors’ proprietary inventions. Here are a few links if you’d like to study these XML-based technologies: http://www.phon.ucl.ac.uk/home/mark/salt/ssml.html, http://www.w3.org/TR/speech-synthesis/, and http://en.wikipedia.org/wiki/Speech_Application_Programming_Interface.
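To make the markup idea concrete, here is a minimal sketch in Python that wraps a line of dialog in an SSML prosody element for each of the three characters from the scenario above. The function name, the character table, and the pitch and rate values are my own illustrative assumptions, not the settings behind the samples in this post, and each TTS vendor may honor these attributes differently, so check your vendor's documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-character settings (illustrative values only).
# "pitch" uses SSML relative changes; "rate" uses SSML's named speeds.
CHARACTERS = {
    "grandpa": {"pitch": "-15%", "rate": "slow"},
    "dad": {"pitch": "default", "rate": "default"},
    "grandson": {"pitch": "+15%", "rate": "fast"},
}


def build_ssml(character, text, lang="en-US"):
    """Return an SSML document that speaks `text` in the character's style."""
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": "http://www.w3.org/2001/10/synthesis",
        "xml:lang": lang,
    })
    # The prosody element carries the pitch and rate adjustments.
    prosody = ET.SubElement(speak, "prosody", CHARACTERS[character])
    prosody.text = text  # ElementTree escapes &, <, and > for us
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    for name in CHARACTERS:
        print(build_ssml(name, "Dinner is ready."))
```

The same prosody element can also wrap a single word or phrase rather than the whole line, which corresponds to the per-phrase treatment mentioned earlier.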
Out-board audio treatment (3rd-party software)
Out-board control of voice properties relies on 3rd-party software, such as digital audio workstations (DAWs) or non-linear editors (NLEs): Pro Tools, Sonar, Nuendo (http://www.steinberg.net/en/products/audiopostproduction_product.html), Vegas Pro, Melodyne, and yes, even Audacity, among others.
Revisiting the grandpa, dad, and grandson scenario I mentioned earlier, here are some screen shots and audio examples of the results I got when using a 3rd-party tool.

To listen to Grandpa’s pitch texture (pitched low), play grandpa_pitched_low-xml.
Now, to listen to Dad’s pitch texture (pitched normally) click here.
And to listen to Grandson’s pitch texture (raised somewhat) click here.

To further differentiate these actors’ speaking styles, we can also affect their tempo (speed), and we should do this without changing the pitch again. We could give Grandpa a slightly slower tempo, give the Grandson a quicker tempo, and just leave Dad’s speech pace as is. To listen to Grandpa’s tempo (slowed somewhat) click here.

To listen to Dad’s tempo (kept as is) click here. And to listen to the Grandson’s tempo (sped up a little) click here.
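Why insist on changing tempo without touching pitch? A rough bit of arithmetic shows what happens if you simply play a clip back faster or slower, the way a tape machine would: pitch and duration move together. The sketch below is only that arithmetic, with function names of my own choosing. It does not describe how any of the tools named above work internally; they provide time-stretch and pitch-shift features that let you adjust one property while holding the other.

```python
def semitones_to_ratio(semitones):
    """Frequency ratio for a pitch shift in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def naive_resample_duration(duration_s, semitones):
    """Clip length after shifting pitch by plain playback-speed change.

    Raising the pitch this way shortens the clip; lowering it lengthens it.
    """
    return duration_s / semitones_to_ratio(semitones)


if __name__ == "__main__":
    # A 10-second line lowered 2 semitones by slowing playback
    # comes out longer than 10 seconds (roughly 11.2 s).
    print(round(naive_resample_duration(10.0, -2), 2))
    # Raised a full octave (12 semitones), it lasts half as long.
    print(naive_resample_duration(10.0, 12))
```

So if Grandpa were pitched down by slowing playback alone, his tempo would already have dropped as a side effect, and any further tempo change would stack on top of it. Treating the two properties separately keeps each adjustment predictable.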

Rudy,
Great post and very helpful to those new to Text-to-Speech (TTS). My company is tackling the lack of voice variety by creating dozens of fun & entertaining character voices. Over 50 voices are available from Cepstral’s hosted service at: http://www.VoiceForge.com. There’s a free demo.
Thanks,
-Craig