When Your Voice Actor Is a Robot (Confronting the NPC Speak Challenge for Virtual Worlds, Part 1)

By Rudy Helm, Audio and Quality Assurance Tech, Visual Purple, LLC.

I’d like to discuss NPC voice-over production, and I’ll provide downloadable samples as we go along. In our virtual worlds, Visual Purple sometimes deploys intelligent NPCs (non-player characters) voiced with synthetic speech, while our playable characters are typically voiced by professional talent. Much of the challenge is keeping synthetic voice recordings from sounding too synthetic, like the automated voice systems you get trapped in on the phone. To confront this, we may adjust tempo and pitch, either across an NPC’s entire dialog or only on specific words or phrases, and sometimes we use a clever combination of both treatments.

One reason to adjust tempo and pitch is to get extra mileage out of a single synthetic voice actor. A quick example: say you want three male voice actors for your project. One is a teenager, another plays the teenager’s father, and the third plays the grandfather. You can get all three from one synthetic actor by adjusting pitch: raise the teenager’s voice a bit above Dad’s (you might leave Dad’s timbre as is), and lower Grandpa’s voice a bit. In reality, each of us likely speaks at a different pace (tempo) than the person in the next cubicle, and I submit that we can emulate the same phenomenon in our synthetic actors. We could have the teenager speak slightly faster than Dad (again, leaving Dad’s pace as is) and slow Grandpa down somewhat. I’m sure you’re getting the idea. For female timbres, simply consider similar treatments.
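To put numbers on the teenager/Dad/Grandpa idea, here is a minimal sketch that turns a pitch shift in semitones into a frequency ratio (the equal-temperament relation ratio = 2^(semitones/12)) and pairs it with a tempo multiplier. The profile values, the 120 Hz stock-voice fundamental, and the names `PROFILES` and `describe` are all hypothetical, chosen only to illustrate the relative treatment; they are not Visual Purple’s actual settings.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a shift of `semitones` in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


# Hypothetical profiles for three characters built from ONE synthetic voice.
# "pitch" is a shift in semitones relative to the stock voice;
# "tempo" is a speaking-rate multiplier (1.0 = unchanged, > 1.0 = faster).
PROFILES = {
    "teenager": {"pitch": 2.0, "tempo": 1.08},
    "dad": {"pitch": 0.0, "tempo": 1.00},
    "grandpa": {"pitch": -3.0, "tempo": 0.90},
}


def describe(name: str, base_f0_hz: float) -> str:
    """Summarize a character's approximate fundamental and speaking rate."""
    profile = PROFILES[name]
    f0 = base_f0_hz * semitones_to_ratio(profile["pitch"])
    return f"{name}: ~{f0:.0f} Hz fundamental, {profile['tempo']:.2f}x speaking rate"


for character in PROFILES:
    # 120 Hz is an assumed fundamental for the stock synthetic voice.
    print(describe(character, base_f0_hz=120.0))
```

With a 120 Hz stock voice this prints roughly 135 Hz for the teenager, 120 Hz for Dad, and 101 Hz for Grandpa. Keeping Dad at zero shift and unity tempo makes him the reference the other two are measured against, which mirrors the "leave Dad as is" approach described above.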

What and where are the resources?

There are a number of synthetic voice vendors available. It seems, though, that many of these vendors resell the same voice actors, so try to get your license from the source. This is a global market, so do not assume that voices in your language are available only from vendors based in a country where that language is spoken. I believe Visual Purple’s products rank among the best where NPC voice quality is concerned.

There is a goodly supply of audio tool vendors as well. Most synthetic voice vendors include on-board (software-based) processing tools, which are there to help you arrive at solutions such as the teenager/Dad/Grandpa scenario described above. One common way to use these on-board tools is to develop some markup language skills. XML, anyone?

On-board audio-treatment (markup languages)

Mechanisms for controlling voice properties include SSML, SALT, SAPI4, SAPI5, and TTS vendors’ proprietary inventions. Here are a few links if you’d like to study these (mostly XML-based) technologies: http://www.phon.ucl.ac.uk/home/mark/salt/ssml.html, http://www.w3.org/TR/speech-synthesis/, and http://en.wikipedia.org/wiki/Speech_Application_Programming_Interface.
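As a taste of the markup approach, the sketch below wraps a line of dialog in an SSML `<prosody>` element, which the W3C SSML specification defines with `pitch` and `rate` attributes. It assumes SSML 1.1 conventions: a relative pitch change in semitones ("-3st") and a rate given as a percentage of the voice's default ("90%"). Engines differ in which prosody values they actually honor, so check your vendor's documentation; the helper name `ssml_prosody` and the Grandpa settings are hypothetical.

```python
from xml.sax.saxutils import escape

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def ssml_prosody(text: str, pitch_st: float = 0.0, rate_pct: int = 100,
                 lang: str = "en-US") -> str:
    """Wrap `text` in an SSML 1.1 <speak>/<prosody> document.

    pitch_st: relative pitch change in semitones (-3 -> pitch="-3st").
    rate_pct: speaking rate as a percentage of the voice's default (90 -> rate="90%").
    """
    pitch_attr = f"{pitch_st:+g}st"  # always signed, e.g. "+2st", "-3st", "+0st"
    rate_attr = f"{rate_pct}%"
    return (
        f'<speak version="1.1" xmlns="{SSML_NS}" xml:lang="{lang}">'
        # escape() turns &, <, > in the dialog into XML entities
        f'<prosody pitch="{pitch_attr}" rate="{rate_attr}">{escape(text)}</prosody>'
        "</speak>"
    )


# Hypothetical settings for the Grandpa character: a bit lower and a bit slower.
print(ssml_prosody("Back in my day, we walked to school.", pitch_st=-3, rate_pct=90))
```

Generating the markup from a small function rather than hand-editing XML keeps each character's treatment in one place, so the same pitch and rate settings can be applied to every line that character speaks.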

Out-board audio treatment (3rd-party software)

Third-party software solutions for controlling voice properties include digital audio workstations (DAWs) and non-linear editors (NLEs) such as Pro Tools, Sonar, Nuendo (http://www.steinberg.net/en/products/audiopostproduction_product.html), Vegas Pro, and Melodyne, and yes, even Audacity, among others.

Revisiting the Grandpa, Dad, and Grandson scenario I mentioned earlier, I now want to show you some screenshots and audio examples from the results I got using a 3rd-party tool.
[Screenshot: Grandpa’s voice, pitched lower in the 3rd-party tool]

Now listen to grandpa_pitched_low-xml.

Now, to listen to Dad’s pitch texture (left at its normal pitch), click here.
And to listen to the Grandson’s pitch texture (raised somewhat), click here.
[Screenshot: the Grandson’s voice, pitched higher]

To further differentiate these actors’ speaking styles, we can also affect their tempo (speed), ideally without changing the pitch again. We could give Grandpa a slightly slower tempo, give the Son a quicker tempo, and leave Dad’s speech pace as is. To listen to Grandpa’s tempo (slowed somewhat), click here.

[Screenshot: Grandpa’s tempo, slowed]

To listen to Dad’s tempo (kept as is), click here. And to listen to the Son’s tempo (sped up a little), click here.
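Why does changing tempo "without changing the pitch" need a special tool? Simply playing audio faster raises its pitch too. The sketch below shows the simplest way around that, a naive overlap-add (OLA) time stretch: short windowed chunks of the original waveform are read at one spacing and written at another, so each chunk keeps its pitch while the overall duration changes. This is a teaching sketch, not what any particular DAW or Melodyne does internally; commercial tools use more refined methods (such as WSOLA or phase vocoders) that avoid the phasey artifacts this version can produce. The function names and the `speed` parameter are hypothetical.

```python
import math


def hann(n: int) -> list[float]:
    """Periodic Hann window; copies shifted by n/2 sum to a constant."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def ola_time_stretch(samples: list[float], speed: float,
                     frame: int = 1024) -> list[float]:
    """Naive overlap-add (OLA) tempo change that leaves pitch roughly intact.

    speed > 1.0 gives faster delivery (shorter output); speed < 1.0 is slower.
    Each output frame is an unmodified chunk of the input waveform, so its
    pitch is preserved; frames are just read at a different spacing than they
    are written. No waveform alignment is done, so expect some artifacts.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    hop_out = frame // 2          # synthesis hop: 50% overlap
    hop_in = hop_out * speed      # analysis hop: read position moves faster/slower
    win = hann(frame)
    n_frames = max(1, int((len(samples) - frame) / hop_in) + 1)
    out_len = (n_frames - 1) * hop_out + frame
    out = [0.0] * out_len
    norm = [0.0] * out_len
    for m in range(n_frames):
        read = round(m * hop_in)
        write = m * hop_out
        for i in range(frame):
            src = read + i
            x = samples[src] if src < len(samples) else 0.0
            out[write + i] += x * win[i]
            norm[write + i] += win[i]
    # Divide by the summed window weights so overlaps don't change the level.
    return [o / w if w > 1e-9 else 0.0 for o, w in zip(out, norm)]


# Usage (example value, tune by ear): slow Grandpa down by about 10%.
# grandpa_slow = ola_time_stretch(grandpa_samples, speed=0.9)
```

With `speed=1.0` the frames line up exactly and the input comes back unchanged, which is a handy sanity check before trying values like 0.9 for Grandpa or 1.1 for the Son.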

[Screenshot: the Son’s tempo, sped up]

Case Studies in Voice-Over Recording (Part 2)

By Rudy Helm, Quality Assurance Tech, Visual Purple, LLC

When planning the design of a recording, first weigh which of these scenarios applies:
-Is it narrative VO, or
-Is it dramatic dialog?

If the design is narrative, consider trying a condenser microphone, use a pop filter, and keep the lip distance consistent. Make sure the VO talent’s lips are about 2 inches from the pop-filter screen, and attach the screen about 6 inches from the microphone. Measure these distances with a tape measure, write the measurements down, and enforce them throughout the session. Try to do the session in one sitting (the talent shouldn’t eat food while recording). If you can’t record one voice actor in a single session, replicate the environment exactly, and measure those distances! Realize that if you record the same voice talent over a span of days, changes in barometric pressure and any other environmental variances from one session to the next can work against you.
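Why obsess over an inch or two? A rough answer comes from the free-field inverse-square law: the direct-sound level changes by 20·log10(d_ref / d_new) dB when the source moves from d_ref to d_new. The sketch below computes that; it is an approximation (close-miked voices also change tone through the proximity effect), and the 2-inch-to-3-inch example is illustrative rather than a measured figure.

```python
import math


def level_change_db(ref_distance: float, new_distance: float) -> float:
    """Approximate change in direct-sound level (dB) when a source moves from
    ref_distance to new_distance (same units), using the free-field
    inverse-square law. Negative values mean the take gets quieter."""
    if ref_distance <= 0 or new_distance <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(ref_distance / new_distance)


# Lips drift from 2 inches to 3 inches, then from 2 inches to 1 inch.
print(f"{level_change_db(2.0, 3.0):+.1f} dB")  # about -3.5 dB
print(f"{level_change_db(2.0, 1.0):+.1f} dB")  # about +6.0 dB
```

At these short distances a one-inch drift already shifts the level by several dB, which is why writing down and enforcing the measured distances pays off when takes from different days have to match.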

If the audio design is a multi-talent drama, comedy, etc., try using a high-quality lapel-type mic in addition to a suspended room mic (place the lapel microphones at the same distance on each actor, from clothing point to lips). With each voice actor, test-record the loudest passages of the dialog first to determine a proper peak level. If a character in your drama needs to shout or raise her voice, start by testing those readings.

Let me relay an experience a colleague shared with me. He recorded one character (let’s call her Sue) with his team’s mixer set “where it was left.” Then came their second character (call him Bob; he was brought in as a replacement for their previous ‘Bob’). The new Bob was clipping because he was so much louder than the old Bob. “So I had to find some knob to bring it down,” my colleague explained. “Then I had to have him repeat his exclamations at a lower volume to avoid clipping.”

So the lesson is: always test-record the loudest sequences first. If you can get a good recording that doesn’t clip (distort) on the most intense passages, you can be reasonably confident that the other passages from the same voice talent won’t clip either. And don’t carry the same mixing-board levels over to a different voice talent; test-record the loud passages from each talent separately. Dynamic compression can sometimes be deployed. It is better used during the session itself (to help stave off clipping and to mitigate the ‘proximity effect’, etc.), but it is best avoided altogether if there are no dramatic differences between the normal and loud passages in the dialog.
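The "loudest passage first" check can be expressed as a few lines of code. This minimal sketch assumes the test take is available as floating-point samples where full scale is ±1.0; it counts samples at or beyond full scale and reports the peak in dBFS. The 6 dB headroom target and the function names are hypothetical house rules for illustration, not an industry standard.

```python
import math


def peak_dbfs(samples: list[float]) -> float:
    """Peak level in dBFS for float samples where full scale is +/-1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    return -math.inf if peak == 0.0 else 20.0 * math.log10(peak)


def check_take(samples: list[float], headroom_db: float = 6.0) -> str:
    """Classify a test recording of the LOUDEST passage before rolling for real.

    headroom_db is a hypothetical house rule, not an industry standard.
    """
    clipped = sum(1 for s in samples if abs(s) >= 1.0)
    peak = peak_dbfs(samples)
    if clipped:
        return f"CLIPPED ({clipped} samples at full scale): lower the input gain"
    if peak > -headroom_db:
        return f"peak {peak:.1f} dBFS: under {headroom_db:g} dB of headroom, consider lowering gain"
    return f"peak {peak:.1f} dBFS: OK"


print(check_take([0.0, 0.25, -0.5]))  # peak -6.0 dBFS: OK
```

Running this on each new talent's shouted lines, rather than reusing the previous talent's levels, is exactly the habit that would have saved the re-takes in the Sue and Bob story.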

Do not mix the two microphones (lapel plus room) down to mono until all talent has been recorded. Digital recording media is cheap. Make archives. Never overwrite originals! In this scenario, your mono mix-down is not the original (the multi-track lapel-plus-room recordings are the original takes). When mixing the finals to mono, keep the overhead (room) microphone barely noticeable. A good rule of thumb is to set its level so low that you’d almost swear the track isn’t even ‘on’, yet when you mute it and hear only the lapel track, you can tell the difference.
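Here is a minimal sketch of that mix-down: the lapel track at unity gain plus the room (overhead) track tucked underneath at a fixed dB offset, written to a new list so the original takes are never touched. It assumes both tracks are already time-aligned float samples of equal length. The -20 dB default and the function names are hypothetical starting points; the real level is found by ear with the mute test described above.

```python
def db_to_gain(db: float) -> float:
    """Convert a level in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def mono_mixdown(lapel: list[float], room: list[float],
                 room_db: float = -20.0) -> list[float]:
    """Return a NEW mono mix: lapel at unity plus the room mic at room_db.

    The input lists are never modified, in the spirit of never overwriting
    the original multi-track takes. -20 dB is a hypothetical starting point.
    """
    if len(lapel) != len(room):
        raise ValueError("tracks must be the same length (time-aligned takes)")
    g = db_to_gain(room_db)
    return [a + g * b for a, b in zip(lapel, room)]


mix = mono_mixdown([0.5, 0.0], [1.0, 1.0])
print(mix)  # roughly [0.6, 0.1]: the room track sits at one tenth of its level
```

Working in dB for the room fader makes the "barely noticeable" setting easy to note down and reproduce for every finished line.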

Surface reflections…
Even a desk can bounce sound around (for instance, when a table microphone stand is used). It’s a good idea to use a soft table pad and place something like a cloth napkin around the microphone pedestal. If the microphone is suspended rather than on a table, reflections can be a problem if it’s close to a wall (a wall corner is even worse). Another culprit can simply be a small-ish room. Sound also reflects off plaster, and off the glass of mirrors and hanging photos or artwork. Oddly, even a well-insulated isolation booth can be problematic simply because it’s a small enclosure: sound reflects back and forth before it gets a chance to diffuse into the booth’s absorptive or dispersive materials.
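One concrete reason reflections hurt: a reflection arriving slightly later than the direct sound interferes with it and carves a series of dips ("comb filtering") into the voice's spectrum. For a reflected path that is d meters longer than the direct path, the dips fall at f = (2k + 1)·c / (2d), with c ≈ 343 m/s. This sketch lists those frequencies; the 0.5 m path difference is a made-up desk-bounce example, and a real reflection is weaker than the direct sound, so the dips are partial rather than total cancellations.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate value in air at about 20 degrees C


def comb_notches(path_difference_m: float, max_hz: float = 4000.0) -> list[float]:
    """Frequencies (Hz) where a reflection arriving path_difference_m later
    than the direct sound is half a cycle out of phase with it:
    f = (2k + 1) * c / (2 * d), for k = 0, 1, 2, ..."""
    if path_difference_m <= 0:
        raise ValueError("path difference must be positive")
    first = SPEED_OF_SOUND_M_S / (2.0 * path_difference_m)
    notches = []
    k = 0
    while (f := (2 * k + 1) * first) <= max_hz:
        notches.append(f)
        k += 1
    return notches


# Hypothetical desk bounce whose path is 0.5 m longer than the direct path.
print(comb_notches(0.5))  # dips at 343, 1029, 1715, ... Hz
```

Because the dips land squarely in the vocal range, damping the desk with a pad or napkin (which weakens the reflection) is usually more effective than trying to EQ the damage out later.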

A colleague reported that he moved the microphone from in front of his VO talent’s mouth to above it. This technique is referred to as ‘off-axis’ placement. It is a way to avoid capturing the direct sound head-on, but depending on the microphone and its configuration, it can also mix more of the environment in with the intended source.

This may not have worked well for him, because his team was recording with a big monitor outside the isolation booth: a 2-foot-tall speaker, reportedly played at high volume, sitting about five feet outside the booth on the mixer desk next to the engineer.

My colleague pointed out that two qualified engineers didn’t have a problem with it, but he could still detect issues in the resulting recordings. For my part, I can only respond that if the sound booth provides near 100% isolation, I wouldn’t have a problem with it either. But with only 5 feet of separation, it would have to be one hell of an iso booth… though I guess it’s possible. My colleague added that a raised voice outside the booth can be heard inside it (and you don’t have to yell). I replied that the monitor speaker could easily approach the dB level of a ‘raised voice’.
Hmmm… My colleague agreed, saying he was pretty sure that if the voice actor weren’t wearing headphones, the talent would be able to hear playback from the speaker.

I think it stands to reason that if the voice talent can hear playback from the monitors, the microphone can hear it too. That means the playback gets mixed back into the recording (especially if the microphone is off-axis, as my colleague reports). The off-axis effect usually isn’t as worrisome on a condenser microphone as it is on a dynamic microphone, which is typically unidirectional. On the other hand, condensers are usually more sensitive, and therefore more prone to picking up external sources.

Let’s conclude with a comment on the end-user’s sound system…
Given the popularity of subwoofers and better-than-multimedia-quality speaker systems attached to modern PCs, it is probably reasonable to roll off frequencies below 100 Hz on certain types of audio, especially VO and ambience loops, unless you are prepared for an unnatural rendering of those sounds. Obviously, you might not take this approach with high-impact sounds such as car crashes, weapons, and explosions, where subwoofers are at their best.
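A roll-off like this is usually a high-pass filter. The sketch below implements a second-order (biquad) high-pass using the widely published "Audio EQ Cookbook" (RBJ) coefficient formulas, with the 100 Hz corner from the text and Q ≈ 0.707 for a Butterworth-style response. It assumes float samples in a Python list and processes one channel; the function name is hypothetical, and in practice you would apply the equivalent filter in your DAW or batch tool.

```python
import math


def highpass_biquad(samples: list[float], sample_rate: float,
                    cutoff_hz: float = 100.0, q: float = 0.7071) -> list[float]:
    """Second-order high-pass (RBJ cookbook formulas), Direct Form I."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    # Coefficients normalized by a0 so the recursion below needs no division.
    b0 = (1.0 + cos_w0) / 2.0 / a0
    b1 = -(1.0 + cos_w0) / a0
    b2 = b0
    a1 = -2.0 * cos_w0 / a0
    a2 = (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs
    out = []
    for x0 in samples:
        y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


# A 30 Hz rumble comes out at roughly 9% of its amplitude; 1 kHz passes almost untouched.
```

A second-order slope (12 dB per octave) is a gentle choice for voice: it clears rumble and plosive energy without thinning the voice the way a steep cut near 100 Hz might.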

By the way, I’m happy to report that most of my friend’s sound issues have been resolved by his team’s proactive efforts to avert future problems at the recording stage, and they have gone a long way toward giving their projects that professional sonic ‘sheen’.

Case Studies in Voice-Over Recording (Part 1)

By Rudy Helm, Quality Assurance Tech, Visual Purple, LLC

I recall a colleague once telling me that on his PC he was hearing pops in the recordings of one of his project’s voice-over talents (voicing an animated character): she persistently popped her p’s, and my colleague began to wonder if he was crazy. He noticed that when he turned down his subwoofer, he didn’t hear it. He added that he didn’t hear that kind of problem in any other audio, and asked me whether it was something that could easily be fixed in their processing.

My answer: “EZ fix?”
Well… if she was the only popper, a couple of things may have happened:
-She wasn’t recorded under the same circumstances as the other VO talent; or
-She sometimes got too close to the microphone. A pop filter should have been used in all cases (this is a physical device, costing about 25 dollars, that hangs on the microphone stand). In software, Sonic Foundry (now Sony) has a nice preset that minimizes plosives (pops) or sibilance (“ess” sounds). Sometimes using the physical device and the software filter in tandem can help.

It is also possible that the software equalization (EQ) settings in my colleague’s PC sound system were configured incorrectly for the situation. Many of today’s audio drivers include a wide palette of listening-environment presets, and their parameters can generally be tweaked further by the end user. If a preset is configured to give an extra bass boost, one could notice artifacts in the low-frequency bands where plosives reside. But I didn’t believe this was the case here, because my colleague didn’t notice artifacts in anything but the recordings of this one VO talent.

Anyway, when you’re stuck in a situation where VO can’t be re-recorded, EQ’ing after the fact may be your last resort. For the popping p’s, running the digitized VO clips through a software editor with batch EQ capabilities might work out satisfactorily, and may even save time.
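Before batch-EQ’ing a whole library, it can help to find which clips actually contain pops. The sketch below is a crude, hypothetical detector, not the Sony/Sonic Foundry preset mentioned above. It splits a clip into short frames and flags frames where most of the energy sits below a low cutoff, using a simple one-pole low-pass to isolate the low end. The 150 Hz cutoff, the 10 ms frames, and both thresholds are assumed starting points that would need tuning on real material.

```python
import math


def one_pole_lowpass(samples: list[float], sample_rate: float,
                     cutoff_hz: float) -> list[float]:
    """First-order low-pass: y moves a fraction `a` toward each new input."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    y = 0.0
    out = []
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out


def find_pops(samples: list[float], sample_rate: float, cutoff_hz: float = 150.0,
              frame_ms: float = 10.0, ratio: float = 0.5,
              min_rms: float = 0.05) -> list[float]:
    """Return start times (s) of frames that are loud enough to matter and
    whose low-frequency energy is at least `ratio` of the frame's total energy.
    All thresholds are hypothetical starting points."""
    low = one_pole_lowpass(samples, sample_rate, cutoff_hz)
    n = max(1, int(sample_rate * frame_ms / 1000.0))
    hits = []
    for start in range(0, len(samples) - n + 1, n):
        total = sum(s * s for s in samples[start:start + n])
        lf = sum(s * s for s in low[start:start + n])
        rms = math.sqrt(total / n)
        if rms >= min_rms and lf >= ratio * total:
            hits.append(start / sample_rate)
    return hits
```

Clips that come back with hits are candidates for the batch high-pass or plosive treatment (or a careful manual fix around the flagged times); clips with no hits can be left alone, which avoids thinning out voices that never had the problem.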

When ‘fuzziness’ can creep into VO:
-when lips are too close to the microphone. This is when the proximity effect is most noticeable. (The proximity effect is the heavy, low-frequency character you get when a microphone records a sound source very close to its diaphragm. Record a voice too closely and you may have to use EQ later to thin out the heaviness.) Or,
-when a windscreen was used where a pop filter should have been.
Windscreens also tend to filter away some of the high end. And there is yet another factor to consider: the choice of microphone.

The proactive solution for any particular recording room is to rent several microphones of various brands and/or diaphragm sizes (condensers, dynamics, and lapels) and test-record VO talent, male and female (the same test script on each voice, with each mic). Then choose (buy or rent) the microphone(s) that seem to work best with the available talent.

Even in cases like this, there are shades of gray. Nothing, it seems, is simply black or white. My colleague told me that his team’s recording department reported that the mic-to-talent distance was consistent. Well, the caveat might be: was the distance the same for all voice talent? If only one specific voice talent was involved, her consistency (regarding distance) would be the only thing that mattered… but managing a larger pool of voice talent is another matter. To be continued in a Part 2 blog post…

My Argument: What Foley *Really* Is (As a Subset of Sound Effect Lexicon)

By Rudy Helm, Audio and Quality Assurance Tech, Visual Purple, LLC.

I thought I’d open a discussion about the term ‘Foley.’ I’ve noticed the term bandied about and, methinks, inadvertently misused by some. Truth is, a foley is a sound effect, yet not just any sound effect is a foley.

A Foley artist (named after Jack Foley, a film-sound pioneer) on a multimedia, television, or film crew is responsible for creating a goodly share of the sound effects in a project. The typical notion is that foleys are recorded in real time, in a session with a recording engineer. A Foley artist has a specialized role, crucial to producing a successful soundtrack. (Note that Foley artist, sound designer, editor, etc., are not necessarily one and the same function.)

Sound effects and Foley are added in post-production to the voice-over, dialog, and real effects that were recorded by microphones on set. Sometimes, as with animated features, there’s no sound to begin with, and all sound needs to be fabricated by the Foley artist and sound designer. The Foley artist might also bolster existing sounds to make them more immersive, or bigger than life. Commonly found materials are often used, and many Foley artists are proud of their own sound effects inventions and tactics. A list of example apparatuses follows.

Some sound effects apparatuses, to give an idea of how Foley effects can be made:
Apparatus = Intended effect
Clacking empty coconut shells together = Horses galloping or trotting
Crumpling up audio tape = Leaves rustling
Flapping a pair of gloves = Bird wings flapping
Breaking bamboo or a celery stalk = Bones crunching
Thumping a watermelon = Body punch
Kissing the back of a hand = Kissing
Walking in high heels on wood = Footsteps, high heels
Squeezing a box of corn starch = Footsteps, snow
Sliding paper out of an envelope = Sliding doors, à la Star Trek

Note that this is an ‘art’ context, not a collection in a library. It’s about illusion. Conversely, recording an actual jet engine is not making a Foley: simply recording a real jet does not mean that you have engaged in the art of foley-ism. But if you fake a jet engine sound by recording some other event that simulates the jet, you have made a Foley.

‘Sound effect’ is generally described as a superset of many different sound disciplines. A hard sound effect is a common sound such as a door slam, weapon fire, or a vehicle drive-by. Whenever the real thing is recorded and used, it is not a foley element.

A Background (BG) sound effect is an ambient or atmosphere sound such as forest sounds, or people ‘walla’. Again, record and use the real thing and we’re still not talkin’ foley.

A Foley sound effect is a sound that is synchronized on screen. Footsteps, the rustling of cloth, and the handling of hand props are typical Foley practices. In the film world, sound effect categories are specialized, and sound editors bear titles based on their specialties, such as “Car cutter”, “Guns cutter”, etc. The process can be broken down into two parts: recording the effects and processing them. Commercial sound effects libraries are available to content producers. On large-scale projects, sound effects may be custom-recorded for originality.

None of the above sound industry terms are my own invention. And there are no published ‘guidelines and recommended practices’ (that I am personally aware of), so anyone can quite freely call me a ‘quack.’ Thankfully, there are other places on the Internet with similar discussions on this topic, among them:

The Art of Foley
Sound Effect
Film Crew
Foley Artist
Film Production

I have been a subscriber to Mix Magazine, EQ Magazine, and Electronic Musician for many years, am a former co-chairman of the Interactive Audio Special Interest Group (under the umbrella of the MIDI Manufacturers Association), and have been a professional musician for way too many years. Now back to my coconut shells…

