Is it all about FLASH?
March 1st, 2010
Well, unless you have been living under a rock you have probably come across the news of Apple’s latest invention, the iPad. We at Visual Purple think it’s way cool, BTW!!! Back in January when Steve Jobs unveiled the 1.5 pound innovation, many Apple junkies were star struck. Will the 75 million people that have bought into the iPhone and iPod Touch believe in the iPad as well? What could this potentially mean for web developers? Many Apple followers are already saying that they will not buy the iPad simply because it will not support Flash. But for a starting price of $499, what more could you expect? Well you could start by expecting to pay more than the publicized low $499 price tag.
Yes, I will admit that back in the day, when tablets first hit the market, I was a tablet fanatic. While my awe of them has dissipated a bit, the talk of the new iPad coming onto the market brought back some ancient memories. I will acknowledge that I am not an Apple junkie; however, I am still intrigued by what the iPad could potentially offer (and not offer). I am disappointed, though, by the news that they are passing on Flash capability. Adobe claims that Flash is installed on 99 percent of Internet-enabled computers and plays over 75 percent of the videos that are viewed online. Could this be a transition to a future Internet in which Flash is no longer supported? What this means to me is that Flash-based 3-D virtual worlds, and the future of browser-based virtual worlds, cannot function on the iPad (unless you’re using the Unity 3D plug-in). Just when so many of us virtual world evangelists thought we were close to mainstream adoption, another hurdle pops up. Could this potentially be the writing on the wall for Flash? Flash-based MMO communities are wildly popular, and all 3 major operating systems currently support Flash, so I just don’t see how Flash could fall by the wayside.
The fact of the matter is Flash is cool and all, but is it really all that practical? I will even be one of the first to admit that we were awed by Flash’s capabilities and recreated our main company website around Flash. But that newfangled technology has lost a lot of its glitz and glamour… hey, look, the page flies! So my thought is that Flash will not disappear completely, but rather may not be seen on the all-alluring Apple iPad (even with potential conversion capabilities in place). Could this be the next game changer? Are we really ready to lead a Flash-free existence? What about playing a YouTube video? Can something weighing only 1.5 pounds really cause such a stir? Could this be the tablet that we have waited on for so long, or just another step on the ladder to getting a worthy tablet device in the near future? Will the PC market be able to hold up to this – do they have anything under the radar being developed to counter Apple taking center stage?
Chat Bots 101- Artificial Intelligence Optional
July 16th, 2009
I recently enjoyed testing the intelligence of a variety of “Chat Bots” online. While some tend to hold meaningless conversations, others actually make some sense!
Chatterbots, otherwise known as chat bots, are defined by Wikipedia as follows: A chatterbot (or chatbot) is a type of conversational agent, a computer program designed to simulate an intelligent conversation with one or more human users via auditory or textual methods.
Websites are now employing chat bots to welcome visitors and answer questions, so that the chat bot serves as a virtual assistant. We have also seen some migration of chat bots into virtual world spaces, such as Second Life. So what’s with this rather new form of intelligent technology? Actually, it’s not necessarily new; in fact, the oldest chat bot dates back to the 1960s! Today, they seem to be evolving and intent on becoming a practical solution in a variety of business and pleasure applications. Perhaps ALICE (Artificial Linguistic Internet Computer Entity) is the most famous chat bot of all.
What’s the role of a chat bot in an immersive virtual world? Applying chat bots to virtual world applications has been done, with some success. I believe that in the future we will see chat bots evolve further in virtual world spaces. When you enter Second Life, a chat bot may immediately befriend you and carry on a meaningful (and intelligent) conversation, rather than leaving you to enter an empty area of Second Life with no other form of avatar contact. Chat bot technology is especially useful for companies that have set up shop in Second Life: they don’t have to worry about staffing the Second Life location 24/7, as a virtual chat bot can do the job and answer the basic questions about a company. Because communication and interaction in virtual worlds are primarily text based, this type of environment makes a great test-bed for chat bot avatars. Utilized for entertainment and information services, chat bots are always available (24/7) and intelligent enough to answer questions.
Many are scripted; however, a few are non-scripted and pull from a large database of text. Chat bots are mostly text bound and utilize Natural Language Processing (NLP). Chat bots can range from greeters on web pages (which have been proven to increase sales/conversion ratios) to customer service representatives, tour guides and non-player characters (NPCs). Do they seem lifelike? To some extent yes, but they by no means take on the persona of a real person.
By Rudy Helm, Audio and Quality Assurance Tech, Visual Purple, LLC.
Another element to this task is to lengthen or shorten the TTS words to match the blobs of the human model. Figure 5 depicts the effort to make the TTS utterance of ‘…was a…’ (pronounced as though a contraction, ‘whuzza’) line up on the timeline with John’s clip. Use your DAW’s stretch tool to accomplish this.

Figure 5a- First, make your split points

Figure 5b- Next, use a stretch tool
Let’s continue splitting the TTS clip’s timelines so that we can move each corresponding sound blob to match, and stretch the words right down to the syllable (Figure 6 shows what it looks like when all words have been synced). Listen to the whole joke, both voices lined up properly.

Figure 6
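To make the stretch step a little more concrete, here is a minimal sketch of the arithmetic a stretch tool performs for each split-off region. The function name and the split-point times are hypothetical (they are not measured from John's clip or the TTS clip); the only idea taken from the discussion above is that each TTS region is stretched or squeezed until it is as long as the matching region of the human model.

```python
def stretch_factors(model_splits, tts_splits):
    """Return per-segment stretch ratios (model length / TTS length).

    Both arguments are lists of split-point times in seconds, including
    the clip start and end, with one segment per word or syllable.
    """
    if len(model_splits) != len(tts_splits):
        raise ValueError("both clips need the same number of split points")
    factors = []
    for i in range(len(model_splits) - 1):
        model_len = model_splits[i + 1] - model_splits[i]
        tts_len = tts_splits[i + 1] - tts_splits[i]
        if model_len <= 0 or tts_len <= 0:
            raise ValueError("split points must be strictly increasing")
        factors.append(model_len / tts_len)
    return factors


# Hypothetical split points (seconds) for "Last / week / it / was a".
john = [0.00, 0.40, 0.90, 1.05, 1.45]
robot = [0.00, 0.30, 0.70, 0.90, 1.10]
words = ["Last", "week", "it", "was a"]
for word, factor in zip(words, stretch_factors(john, robot)):
    print(f"{word!r}: stretch to {factor:.2f}x its original length")
```

A factor above 1 means the TTS region needs lengthening; a factor below 1 means it needs shortening. In a DAW you would do this by eye against the blobs, but the ratio is what the tool applies under the hood.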
Here’s where some of you are thinking: Well, the blobs are lined up very nicely, but what about nuances regarding stress and pitch? Isn’t the word ‘lawyer’ as expressed by our human friend, John, not being expressed similarly? John’s lawyer blob is larger (i.e., louder) than the TTS blob. Also, isn’t the word ‘seen’ as expressed by John (in this case the stress is caused not by volume but by its pitch being higher, relatively, from the rest of the phrase) not being emulated by the synthetic actor?
Yes, indeed, so let’s try to fix these two issues. We’ll tackle the loudness point first. Figure 7 shows a Volume Envelope (the horizontal blue-ish line running through the center of the TTS clip in the timeline). With most DAWs with this feature, you can bend the volume envelope to cause increases or decreases in the audio.

Figure 7 – Creating break points within the line bends the envelope
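For readers who want to see what bending a volume envelope amounts to, the sketch below treats the envelope as a list of (time, gain) break points and interpolates in a straight line between them. This is an assumption about how a typical DAW envelope behaves, not a description of any particular product, and the function names are made up for illustration.

```python
def envelope_gain(break_points, t):
    """Linearly interpolate the gain at time t from (time, gain) break points."""
    points = sorted(break_points)
    if t <= points[0][0]:
        return points[0][1]
    for (t0, g0), (t1, g1) in zip(points, points[1:]):
        if t <= t1:
            return g0 + (g1 - g0) * (t - t0) / (t1 - t0)
    return points[-1][1]


def apply_envelope(samples, sample_rate, break_points):
    """Scale each sample by the envelope gain at that sample's time."""
    return [s * envelope_gain(break_points, i / sample_rate)
            for i, s in enumerate(samples)]


# Hypothetical envelope: unity gain, a bump up for one word, back to unity.
lawyer_bump = [(0.0, 1.0), (1.2, 1.0), (1.4, 1.6), (1.8, 1.6), (2.0, 1.0)]
print(envelope_gain(lawyer_bump, 1.3))  # halfway up the ramp
```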
Now let’s tackle the pitch issue with that word, ‘seen’. Figure 8 shows the clip properties dialog box specific to the split-off region of our seen-blob. The highlighted value indicates that the word pitch has been raised four half steps.

Figure 8
Listen to the resulting TTS clip with the treatments per ‘lawyer’ and ‘seen’.
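If you are curious what "four half steps" means in terms of frequency, the standard equal-temperament relationship is that each half step multiplies the frequency by the twelfth root of two. The helper below is only a sketch of that arithmetic; the function name is hypothetical.

```python
def semitones_to_ratio(semitones):
    """Frequency ratio for a pitch shift given in half steps (semitones)."""
    return 2.0 ** (semitones / 12.0)


# Raising a word by four half steps multiplies its frequencies by about 1.26.
print(f"{semitones_to_ratio(4):.3f}")
```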
Window dressing
Earlier I mentioned that this is a voice for a talking fish. This fish is contained within a fish tank in a hotel bar. Listen to our talking fish enveloped in a bubbling sound effect. Figure 9 shows the TTS clip, sans John’s clip, and with the fish tank noise clip added.

Figure 9 – Note that a volume envelope has been applied to the bubbles as well.
So, is that it, then? Maybe – maybe not. If we really did want to add some reality to a talking fish environment, we might consider what we know about how a fish tank affects sound. Occlusion happens. There is a glass barrier between the sound emitter (the talking fish) and the sound receiver (the avatar). So, we could elect to shave off some of the high frequencies from our talking fish. We can accomplish this by choosing the appropriate reverb effect. If you have presets at your disposal, start with a bathroom preset or similar. Try placing the reverb effect before any equalization effects (EQ). We use EQ here to bring out the hi-mid frequencies of the voice to ensure that it is intelligible (you may need to reduce high frequencies as well if you choose a reverb preset that sounds too bright). In this case we are also deploying EQ to remove extreme low frequency rumble (artifacts that commonly get accidentally introduced when using filters in the digital domain). Figure 10 shows this idea. Have a listen to the result.

Figure 10a – Software ‘bathroom’ reverb

Figure 10b – Software EQ module
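Shaving off high frequencies to suggest a glass barrier can also be sketched directly as a filter. The example below is a simple one-pole low-pass, which is my own stand-in for illustration; it is not what the reverb or EQ modules pictured above do internally, and the 2000 Hz cutoff is an arbitrary assumption.

```python
import math


def low_pass(samples, sample_rate, cutoff_hz):
    """One-pole low-pass filter: attenuates content above cutoff_hz."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out = []
    prev = 0.0
    for x in samples:
        # Move a fraction of the way toward the new input sample.
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out


# A very bright test signal (alternating +1/-1) comes out much quieter.
bright = [1.0 if i % 2 == 0 else -1.0 for i in range(2000)]
muffled = low_pass(bright, 44100, 2000.0)
print(max(abs(v) for v in muffled[-100:]))
```

Lowering the cutoff makes the fish sound more muffled; raising it lets more of the voice's brightness through.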
Conclusion
Can synthetic voice-actors make funny? Humor is a very subjective aspect of human emotion. What’s funny to Samuel isn’t so funny to Mary, and so forth. So maybe the jury is still out on that one. To improve our NPC’s delivery, we’ve had to rely on 3rd-party software to ensure that techniques were carefully deployed. Markup language deployment probably won’t be sufficient for specific tasks like this, where real-time interaction is not a requirement. That’s my best guess, anyway.
You may wonder what to do if you have a project that requires an ensemble of funny voices. Well, as long as you have at least one funny human available to you, that person can be your model for all voices. Then your cast of synthetic actors can be molded to conform to your model’s comedic timing.
How about this scenario: you have a cinematic cut-scene where there are several actors in the movie (or trailer). But your budget can only afford one human voice-actor. Consider recording your one voice actor doing the roles of the entire cast. Then, using the techniques discussed above, create an ensemble of TTS voices and, in your video editor (NLE), synchronize the synthetic voices to the phrasings and expressions of your one human actor.
In fact, maybe we’ll try to tackle an example of that in my next blog entry. Stay tuned!
By Rudy Helm, Audio and Quality Assurance Tech, Visual Purple, LLC.
At the end of my previous discussion on NPC Voice-over production, I promised that I would follow up with a blog about what it might take to try to get a synthetic voice to be funny. Remember, we’re talking about NPCs (Non Player Characters), where otherwise playable characters are typically represented by professional voice-talent. I will provide you with samples as we roll along, of course, in tutorial fashion, but with the disclaimer that this is just one approach to this end, as there are likely other useful techniques that could be considered.
Ok, I sense you are protesting, how can a robot out ‘funny’ a performance by a professional voice-talent? I am not at all suggesting that a synthetic voice-actor can win such a contest. But if you are faced with options, and if this is the option you choose, you really want to come up with workable solutions.
What are the resources?
There are a number of synthetic voice vendors available. One obvious task is to choose one. A simple Internet search can help you solve that problem. For purposes of this discussion we will utilize 3rd-party software control mechanisms to affect voice properties. In this tutorial we’ll use a stand-alone audio editor along with a non-linear editor (NLE), but the same task can almost certainly be done in a digital audio workstation (DAW) of your choice. The audio editor might be replaced with XML controls if this is your favorite way to affect voice pitch and tempo, etc. However, I think it would be extremely tedious to try to deploy markup languages as a substitute for a DAW. By the end of this writing I bet you will probably agree with me. Please refer to my earlier post, When Your Voice-actor is a Robot, for some detail on resources. And then there is that last very important asset to have: someone who is funny!
Here at Visual Purple, we are fortunate to have a gentleman who is a very funny guy. And for this experiment it makes for a very lucky day! So, you may be thinking, why are we talking about working with a funny human? Isn’t this topic about having a funny robot? Well, yes is the answer to that — but our funny human (Let’s call him John) will serve as a model for our robot.
Say what?
The short answer is, we will import audio clips of the funny human into our DAW, and then we will import audio clips of the synthetic voice and make it emulate the human’s speech patterns.
Say what?!!
Ok, in this project our goal is to make some humorous fish voices. You see, we have a scene in one of our products where someone at a bar can stand and stare at a fish tank. As the fish swim by, and if the avatar is situated close enough to the fish tank, the fish might begin to make wisecracks to the, uh, fish admirer. This is an ‘Easter egg’ where fun is poked at the avatar, possibly insinuating that he has had a bit too much to drink. And to achieve our goal, we need to mold the synthetic voice clip to try to emulate the comedic timing as expressed in the human model.
Let’s do it!
So let’s start the process by importing into our DAW an audio clip that John, the funny human recorded for us (Figure 1).

Figure 1
Next, listen to John’s original model for reference. The script: “Last week it was a lawyer’s convention. I never seen so many sharks!” We follow that by importing a corresponding audio clip from the synthetic voice (Figure 2).

Figure 2
Without doing anything further at this stage, we can easily see that the graphical sound ‘blobs’ don’t match. So, before we move on, have a listen to the robot’s recording. Notice that this clip has already been treated with pitch transposition. (For a discussion on ways to do that, please refer to my earlier post, When Your Voice-actor is a Robot.) Our intent was to get cartoon-y voices, so we started with a female TTS voice and then modified her pitch characteristics.
Now, to make the robot emulate John’s comedic speech patterns, we need to edit the clip’s timelines so that the graphical sound ‘blobs’ do match. Figure 3 illustrates an example:

Figure 3
In Figure 3’s example we see only the first two words of the script (“Last week…”). Listen to how the TTS’s utterance of the word ‘week’ occurs earlier in the timeline than does John’s blob of the same word. Close — but the timing is just not right is it? Note that we need to create a split point (the vertical line represents this) just before the TTS’ blob. Doing this enables us to separate the words and move them as we wish on the timeline (see Figure 4).

Figure 4
Now, listen to both voices speak those two words in sync. (…to be continued)
When Your Voice-actor is a Robot (Confronting the NPC Speak Challenge for Virtual Worlds, Part 2)
June 11th, 2009
By Rudy Helm, Audio and Quality Assurance Tech, Visual Purple, LLC.
The SSML language
If you’re new to markup languages, take a look at this example, as it may be useful as a reference to understand SSML syntax.
< ?xml version="1.0"? >
< speak version="1.0" xml:lang="en-US" >
< voice name="Dave" >
Hello, world; my name is Dave.
< /voice >
< /speak >
This example shows that the voice named “Dave” should pronounce: “Hello, world; my name is Dave.” (Keep in mind that the spaces adjacent to the angle brackets should not be used in a real-world application.) Like other XML-based markup languages, SSML is composed of elements. The root element is speak.
Figure 5 below is a table that shows how the SSML elements are associated to the five points of Text Analysis.

Figure 5
You will use the prosody tag a lot if you intend to create separate voice characters from only one TTS resource. With prosodic control you can manage the tempo and pitch of the voices.
Listen to this XML example of the ‘Grandson’ talk scenario. And see below the markup tag to make it play at a higher pitch.
< prosody pitch="+4.2st" > I believe Visual Purple’s products have among the best where NPC voice quality is concerned. < /prosody >
As far as TTS engines go, this is a pretty effective example. Here, rather than emphasizing one or several individual harmonics, as occurs with the wood or metal in musical instruments, the vocal tract emphasizes entire bands of harmonics, called formants. Each vowel sound has characteristic bands of higher intensity harmonics. In short, the character of the original voice clip is largely retained, even when the voice’s pitch has been raised. Beware that not all TTS engines do so well when processed with markup languages.
Listen to an XML example of the ‘Grandson’ talk scenario and see below the markup tag that makes the above paragraph’s sample play at a faster tempo.
< prosody rate="+5%" > I believe Visual Purple’s products have among the best where NPC voice quality is concerned. < /prosody >
Note the glaring sonic artifacts in this example. It plays way too quickly to sound ‘natural’! In my own research I have noticed that many of the TTS engines available do not give the user fine control when entering tempo parameters into markup tags. The results are usually too fast or too slow. And in some of those engines that do respond to fine control, sonic artifacts such as static or scratchiness are introduced.
Listen to this XML example of the ‘Grandpa’ talk scenario where we use a markup tag to make it play at a lower pitch.
< prosody pitch="-3.8st" > There are a number of synthetic voice vendors available. It seems though, that many of these vendors are reselling the same voice actors, so try to get your license from the source. < /prosody >
It’s interesting to note that this is the same TTS engine that performed so well with the raised pitch formats, but shows some sonic artifacts in this example where the frequencies are pitched lower. Listen carefully and observe the slight scratchiness. It’s as though you can hear tiny, rhythmic interruptions all through the sound data.
Listen to an XML example of the ‘Grandpa’ talk scenario and see below the markup tag that makes the above Grandpa sample play at a slower tempo.
< prosody rate="-10%" > There are a number of synthetic voice vendors available. It seems though, that many of these vendors are reselling the same voice actors, so try to get your license from the source. < /prosody >
This one mirrors the tempo defect revealed in the above Grandson tempo exercise, only in the other direction. It plays way too slowly to be useful (unless you are going for that classic Hal-the-computer voice near the end of the movie 2001 where the robot meets his slow demise).
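If you end up writing a lot of these tags, it may be less error-prone to generate them than to type them by hand. The sketch below uses Python's standard xml.etree.ElementTree module to wrap text in the speak, voice and prosody elements shown above. The function name and the preset table are hypothetical; the pitch and rate values are the ones from the Grandson and Grandpa examples, and combining pitch and rate in a single prosody tag is my assumption rather than something tested in this post.

```python
import xml.etree.ElementTree as ET


def build_ssml(text, voice="Dave", pitch=None, rate=None, lang="en-US"):
    """Wrap text in speak/voice (and optionally prosody) elements."""
    speak = ET.Element("speak", {"version": "1.0", "xml:lang": lang})
    voice_el = ET.SubElement(speak, "voice", {"name": voice})
    attrs = {}
    if pitch is not None:
        attrs["pitch"] = pitch
    if rate is not None:
        attrs["rate"] = rate
    # Only add a prosody element when there is something to adjust.
    target = ET.SubElement(voice_el, "prosody", attrs) if attrs else voice_el
    target.text = text
    return ET.tostring(speak, encoding="unicode")


# Hypothetical character presets built from one TTS voice.
presets = {
    "Grandson": {"pitch": "+4.2st", "rate": "+5%"},
    "Dad": {},
    "Grandpa": {"pitch": "-3.8st", "rate": "-10%"},
}
for name, settings in presets.items():
    print(name, build_ssml("Hello, world.", **settings))
```

Whether a given engine honors these values gracefully is a separate question, as the artifacts described above show.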
Conclusion
Synthetic speech can make effective voice-actors when techniques are carefully deployed (especially with regard to adjusting tempo and pitch to improve your NPC’s realism). At this juncture there appears to be no one go-to solution. For good results, we’ve had to utilize a combination of 3rd-party software with XML tags, though I have to admit we seem to resort to 3rd-party software more and more.
If markup language deployment were as robust as we wish it were, we would be able to include an XML parser in our commercially available development tools. We have clients that have expressed an interest in having the capability of building their own virtual world simulations where all they need do is type in their avatars’ text, and the voice syncing-to-animation just happens automatically for them. The bottleneck, though, appears to be a too dramatic hit on frame rate, where the TTS speech and/or animation quality suffers. There is great demand put on a CPU when it has to display high quality images and process real-time audio manipulations simultaneously. This is why, in the meantime, we pre-render scenarios so that our content looks and sounds glorious.
Well, our technologists will figure out a solution to the real-time problem, though. Visual Purple is all about quality – and providing the tools that our customers want!
In a future blog post – Comedic treatment in TTS voices. Can robots be funny? Stay tuned!
When Your Voice-actor is a Robot (Confronting the NPC Speak Challenge for Virtual Worlds, Part 1)
June 10th, 2009
By Rudy Helm, Audio and Quality Assurance Tech, Visual Purple, LLC.
I’d like to discuss NPC Voice-over production. I will even provide you with downloadable samples as we roll along. In our virtual worlds, Visual Purple sometimes deploys intelligent NPCs (Non Player Characters), where otherwise our playable characters are typically represented by professional voice-talent. Much of the challenge involved is making synthetic voice recordings not sound too synthetic, such as you may think about when you’re on the phone and trapped within one of those automated voice applications. To confront this, some of the tactics we deploy may involve adjusting the tempo and pitch, either to an NPC’s global dialog trait, or just to specific words or phrases. Even some clever combination of both treatments comes into play. One reason to affect tempo and pitch is so that you can get extra mileage from one synthetic voice-actor. A quick for instance: say you desire three male voice-actors for your project…one is a teenager, another one plays the teenager’s father, and that third actor plays the grandfather. By adjusting the pitch of a single synthetic actor you can achieve this. Re-pitch the teenager’s voice a bit higher than ‘dad’s’ (you might just leave dad’s timbre as is), and re-pitch grandpa’s voice a bit lower. Now, in reality, we as individuals likely speak with a different pace (tempo) than an individual in the next cubicle. I submit that we can emulate the same phenomenon in our synthetic actors. We could elect to make the teenager speak with a slightly quicker tempo than dad does (again, we could just leave dad’s pace as is), and slow down grandpa’s tempo somewhat. I’m sure you’re getting the idea. For female timbres simply consider similar treatments.
What and where are the resources?
There are a number of synthetic voice vendors available. It seems though, that many of these vendors are reselling the same voice actors, so try to get your license from the source. This is a global market so do not assume that your own language is available only from vendors in the same country as your native tongue. I believe Visual Purple’s products have among the best where NPC voice quality is concerned.
There is a goodly supply of audio tool vendors available as well. Most of the synthetic voice vendors have on-board processing tools. These tools are there to help you arrive at solutions such as the teenager/dad/grandpa scenario depicted above. One common way to utilize their on-board tools (software based) is by developing some markup language skills. XML anyone?
On-board audio-treatment (markup languages)
Control mechanisms to affect voice properties include SSML, SALT, SAPI4, SAPI5, and TTS vendors’ proprietary inventions. Here are a few links if you’d like to study these XML based technologies: http://www.phon.ucl.ac.uk/home/mark/salt/ssml.html, http://www.w3.org/TR/speech-synthesis/, and http://en.wikipedia.org/wiki/Speech_Application_Programming_Interface.
Out-board audio treatment, such as 3rd party software.
Control mechanisms to affect voice properties utilizing 3rd-party software solutions include digital audio workstations (DAWs) and non-linear editors (NLEs) such as Pro Tools, Sonar, Nuendo (http://www.steinberg.net/en/products/audiopostproduction_product.html), Vegas Pro, and Melodyne, and yes, even Audacity, among others.
Revisiting the grandpa, dad, and grandson scenario I mentioned earlier, I now want to show you some screen shots and audio examples from results I got when using a 3rd-party tool.

Now listen to grandpa_pitched_low-xml.
Now, to listen to Dad’s pitch texture (pitched normally) click here.
And to listen to Grandson’s pitch texture (raised somewhat) click here.

To further differentiate these actors’ speaking styles, we can also affect their tempo (speed). And we should do this without changing the pitch again. We could give Grandpa a slightly slower tempo, give the Son a quicker tempo, and let’s just leave Dad’s speech pace as is. To listen to Grandpa’s tempo (slowed somewhat) click here.

To listen to Dad’s tempo (kept as is) click here. And to listen to the Son’s tempo (sped up a little) click here.

Get your Technology…
April 30th, 2009
Ever wonder how to use someone else’s technology (legally) and apply it to your own project? Would you like to author an advanced simulation for the cost of traditional CBT or WBT? Now you can with Visual Purple technology! We are excited to introduce our suite of proven, proprietary and patented technologies.
For ten years we have put our blood, sweat and tears into developing business changing technologies that can transform your development processes and produce advanced simulations across three fronts: decision-based, embedded, or virtual world. The Visual Purple simulation suite provides unlimited possibilities. Just contact us for details.
Case Studies in Voice-Over Recording (Part 2)
April 24th, 2009
By Rudy Helm, Quality Assurance Tech, Visual Purple, LLC
With respect to design, the following scenarios could be weighed.
-Is it narrative VO? Or
-Is it dramatic dialog?
If narrative in design, consider trying a condenser microphone, be consistent with lip distance, and use a pop-filter. Ensure that the VO talent’s lips are about 2 inches from the pop filter screen, and attach the screen about 6 inches from the microphone. Measure your distances with a tape measure, write the measurements down, and enforce those measurements throughout the session. Try to do the session in one sitting (talent shouldn’t eat food while recording). If it is not possible to record one voice actor in one session, replicate the environment exactly and measure those distances! Realize that if doing the session with the same voice talent over a span of days, barometric pressure and any other environmental variances from one session to the next can work against you.
If the audio design is multi-voice talent drama/comedy, etc., try using a high-quality lapel type mic in addition to a suspended room mic (the lapel microphones need to be placed at an equal distance on each actor, from clothing point to lips). And with each voice actor, to ascertain a proper peak level, test-record the loudest passages of the dialog first. If the character in your drama needs to shout or raise her voice, start with testing those readings.
Let me relay to you an experience that a colleague shared with me. He recorded a character (let’s call her Sue) initially with his team’s mixer set to “where it was left at”. Then with their second character (call him Bob…he was brought in as a replacement for their previous ‘Bob’), the new Bob was clipping because the old Bob was so much louder. “So I had to find some knob to bring it down”, my colleague explained. “Then I had to have him repeat his exclamations at a lower volume to avoid clipping.”
So, the lesson is, always test-record the loudest sequences first. If you can achieve a good recording that doesn’t clip (distort) on the more intense passages, you are assured that all other passages from the same voice talent will not clip. And do not use the same mixing board levels for just any other voice talent. Test-record the loud passages from each separate talent. Sometimes dynamic compression can be deployed. Compression is better used during the session itself (to help stave off clipping issues and to mitigate the ‘proximity effect’, etc) but ultimately better avoided if there are not dramatic differences between normal passages and loud passages within the dialog.
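Checking a test recording for clipping can also be done numerically. The sketch below assumes the take has been loaded as floating-point samples in the range -1.0 to 1.0 (how you load it is up to your tools); the function names and the sample values are hypothetical. It reports the peak level in dB relative to full scale and flags any sample that reaches the ceiling.

```python
import math


def peak_dbfs(samples):
    """Peak level in dB relative to full scale (0 dBFS = amplitude 1.0)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")
    return 20.0 * math.log10(peak)


def is_clipping(samples, ceiling=1.0):
    """True if any sample reaches or exceeds the ceiling."""
    return any(abs(s) >= ceiling for s in samples)


# Hypothetical test take of the loudest passage.
shout = [0.10, -0.45, 0.71, -0.66, 0.30]
print(f"peak: {peak_dbfs(shout):.1f} dBFS, clipping: {is_clipping(shout)}")
```

If the loudest passage leaves a few dB of headroom, the quieter passages from the same talent at the same settings should be safe as well.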
Do not mix the two microphones (lapel plus room) to mono until all talent has been recorded. Digital recording media is cheap. Make archives. Never overwrite originals! In this scenario, your mono mix-down is not the original (the multi-track lapel plus room formats are considered original takes). When mixing the finals to mono, mix such that the overhead microphone is barely noticeable. A good rule of thumb is to set such a level that you almost swear that track is not even ‘on’, yet if you were to mute the overhead track, you realize that you can tell the difference when only the lapel track is playing.
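As a rough illustration of that rule of thumb, the mono mix-down is just the lapel track plus a heavily attenuated copy of the room track. In the sketch below the -18 dB default is purely my placeholder, not a recommended setting; in practice you would set the level by ear as described above. The function name is hypothetical.

```python
def mix_to_mono(lapel, room, room_gain_db=-18.0):
    """Sum the lapel track with an attenuated room track, sample by sample."""
    room_gain = 10.0 ** (room_gain_db / 20.0)
    n = min(len(lapel), len(room))
    return [lapel[i] + room_gain * room[i] for i in range(n)]


# The inputs are left untouched, so the multi-track originals survive.
lapel_take = [0.20, 0.50, -0.30]
room_take = [0.10, 0.40, -0.20]
print(mix_to_mono(lapel_take, room_take))
```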
Surface reflections…
Even a desk (including for instance, if a table microphone stand is being used) can bounce the sound around. It’s a good idea to use a soft table pad and place something like a cloth napkin around the microphone pedestal. If the microphone location is suspended, rather than on a table, reflections can be problematic if it’s close enough to a wall (a wall corner, even worse). Another culprit can be simply a small-ish room. Reflections bounce off plaster, glass from mirrors and hanging photos/artwork as well. Oddly, a well insulated isolation booth can still be problematic, simply because it’s a small enclosure, and sound reflects back and forth even before it gets a chance to diffuse in the absorptive or dispersive materials of a small isolation booth.
A colleague reported that he moved the microphone from in front of his VO talent’s mouth to above it. This technique is referred to as ‘off-axis’ and is a method to avoid direct sound, which can result in a mix of environment with the intended source, depending on the microphone and its configuration.
This may not have turned out to be a method that worked well for him because his team was recording with a big monitor outside the isolation booth. This was a 2-foot tall speaker, reportedly with high volume, sitting about five feet outside of the booth on the mixer desk next to the engineer.
My colleague pointed out that two qualified engineers didn’t have a problem with it, but he was still able to detect issues with the resulting recordings. Myself, I can only respond that if the sound booth provides near 100% isolation, I wouldn’t have a problem with it either. But with only 5 feet of separation, it would have to be one hell of an iso booth…though I guess it’s possible. To which my colleague adds that one can hear a raised voice outside the booth (and you don’t have to yell). Well, I replied to him that it sounds like the monitor speaker would easily approach the dB level of a ‘raised voice’.
Hmmm… To that my colleague agreed, positing that he was pretty sure that if the voice actor wasn’t wearing headphones, said talent would be able to hear a playback on the speaker.
I think it stands to reason that if the voice talent is in fact able to hear a playback from the monitors, then that means the microphone can also hear the speaker playback. That means it gets mixed back into the microphone (especially if the microphone is off-axis as my colleague reports). The off-axis effect (usually) isn’t as worrisome on a condenser microphone as it is on a dynamic microphone, which typically is configured for unidirectional. On the other hand, condensers are (usually) more sensitive, and therefore keen to pick up external sources.
Let’s conclude with a comment on the end-user’s sound system…
Given the popularity of subwoofers and better-than-multimedia quality speaker systems attached to modern PCs, it is probably reasonable to roll off frequencies below 100 Hz on certain types of audio, especially VO and ambience loops, unless you are prepared for an unnatural representation of those kinds of sounds. Obviously this approach might not be considered for high impact noises, such as car crashes, weapons, explosions, etc, where the subwoofers are best at their job.
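For anyone who wants to see what a roll-off below 100 Hz looks like outside of an EQ plug-in, here is a minimal first-order high-pass filter. It is a gentle filter chosen for brevity, not necessarily what a commercial EQ uses, and the function name is hypothetical. The 100 Hz default comes from the suggestion above.

```python
import math


def high_pass(samples, sample_rate, cutoff_hz=100.0):
    """First-order high-pass filter: attenuates content below cutoff_hz."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = rc / (rc + dt)
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        # Pass changes in the input, let any steady offset decay away.
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


# A constant offset (the extreme case of low-frequency content) dies out.
rumble = [1.0] * 44100
print(abs(high_pass(rumble, 44100)[-1]) < 1e-3)
```

You would apply something like this to VO and ambience loops only, and leave the crashes and explosions alone so the subwoofer still has a job to do.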
By the way, I am happy to report that most of my friend’s sound issues have been resolved with his team’s proactive efforts to avert future problems at the recording level. And they have gone far in making their projects exude that professional sonic ‘sheen’.
Wouldn’t it be Cool?! Synthetic Intelligence, that is.
April 20th, 2009
By Ed Heinbockel, President and CEO, Visual Purple, LLC
We have an entertaining habit here at Visual Purple of diving down technology rabbit holes with a particular goal in mind and quickly finding ourselves looking squarely in the eyes of something far bigger, better and badder than anything we initially imagined. If we’re not careful, the proverbial technology tail will soon be wagging us. Looks like we may be living that dream today.
The ‘wouldn’t it be cool’ moment occurred some months back when we came to the realization that virtual worlds (VWs) demand avatars (NPCs) that behave in ways that seem realistic and useful. So we started down the path to achieve not Artificial Intelligence, but Behavioral Intelligence – or what we now choose to describe as Synthetic Intelligence. In future blog posts we’ll break this down and explore the ‘why’ behind this naming convention, as it implies much.
What about that tech rabbit hole, you ask? Oh yes, the rabbit hole. Think BIG databases. Think ‘smart’ databases. Think what that might do, distributed… got your attention now?
We’ll write more about this after our next patent filing. Stay tuned.
Case Studies in Voice-Over Recording (Part 1)
April 15th, 2009
By Rudy Helm, Quality Assurance Tech, Visual Purple, LLC
I recall that a colleague once described to me how, on his PC, he was hearing a project’s Voice-Over talent (supporting an animated character) persistently pop her p’s in the recordings, and he began to wonder if he was crazy. He noticed that if he turned down his subwoofer, he didn’t hear it. So he asked me whether that was something that could easily be fixed in their processing. He added that he didn’t hear that kind of problem in any other audio.
My answer to that: “EZ fix?”
Well… If she was the only popper, a couple of things may have happened:
-She wasn’t recorded under the same circumstances as the other VO talent; or
-She got too close to the microphone sometimes.
A pop filter should have been used in all cases (this is a physical device which costs about 25 dollars and hangs on the microphone stand). In software, Sonic Foundry (now Sony) has a nice preset which minimizes plosives (pops) or sibilance (“ess” sounds). Sometimes using both the physical device and the software filter in tandem can be helpful.
It is also possible that the software equalization (EQ) settings in my colleague’s PC sound system were incorrectly configured for the situation. Many of today’s software drivers include a wide palette of listening-environment presets, whose parameters can generally be tweaked further by the end user. Consider that if a preset is configured to give an extra bass boost, one could notice artifacts in the low-end frequency bands where plosives reside. But I didn’t believe this was the case at hand, because my colleague didn’t notice artifacts in anything but the recordings of this one specific VO talent.
Anyway, when you’re stuck in a situation where VO can’t be re-recorded, EQ’ing after the fact may be your last resort. In the case of the popping plosive p’s, running the digitized VO clips through a software editor with batch EQ capabilities might work out satisfactorily and may even save time.
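For readers without such an editor, the batch idea can be sketched in a few lines of Python using only the standard library. This is a rough illustration under stated assumptions, not the workflow my colleague’s team used: it handles only 16-bit mono WAV clips, it uses a simple first-order high-pass filter in place of a real EQ preset, and the names process_folder and high_pass_int16 and the 100 Hz default are hypothetical.

```python
import array
import math
import sys
import wave
from pathlib import Path


def high_pass_int16(samples, sample_rate, cutoff_hz):
    """First-order (RC-style) high-pass filter over 16-bit samples."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = array.array("h")
    prev_in = 0.0
    prev_out = 0.0
    for s in samples:
        prev_out = alpha * (prev_out + s - prev_in)
        prev_in = s
        # Clamp to the 16-bit range before storing.
        out.append(max(-32768, min(32767, int(round(prev_out)))))
    return out


def process_folder(src_dir, dst_dir, cutoff_hz=100.0):
    """Filter every 16-bit mono .wav in src_dir and write it to dst_dir.

    Returns the names of the files that were processed. Clips in any
    other format are skipped (a simplifying assumption of this sketch).
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    done = []
    for path in sorted(Path(src_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as reader:
            params = reader.getparams()
            if params.sampwidth != 2 or params.nchannels != 1:
                continue
            frames = reader.readframes(params.nframes)
        samples = array.array("h")
        samples.frombytes(frames)
        if sys.byteorder == "big":
            samples.byteswap()  # WAV data is little-endian
        filtered = high_pass_int16(samples, params.framerate, cutoff_hz)
        if sys.byteorder == "big":
            filtered.byteswap()
        with wave.open(str(dst / path.name), "wb") as writer:
            writer.setparams(params)
            writer.writeframes(filtered.tobytes())
        done.append(path.name)
    return done
```

Writing the results to a separate folder keeps the original clips untouched, so you can compare before and after by ear and start over if the cut sounds too thin.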
A ‘fuzzy’ quality in VO can happen:
-when lips are too close to the microphone. This is when the proximity effect is most noticeable. (The proximity effect is that heavy, low-frequency characteristic that you get when a microphone records a sound source very close to the microphone’s diaphragm. Record a voice too closely and you may have to use EQ later to thin out the heaviness.) Or,
-when a windscreen was used where a pop filter should have been used instead.
Windscreens also tend to filter away some of the high end. And there is yet another factor that may need to be considered… the choice of microphone.
The proactive solution for any particular recording room is to rent several microphones of various brands, types (condensers, dynamics, and lapels), and/or diaphragm sizes, and test-record VO talent, male and female (same test script on each voice, each mic). Then choose (buy or rent) the microphone(s) that seem to work best with the available talent.
Even in cases like this, there are shades of gray. Nothing, it seems, is simply black or white. My colleague told me that his team’s recording department reported that the mic-to-talent distance was consistent. Well, the caveat might be: was the distance the same for all voice talent? If only one specific voice talent was involved, her consistency (regarding distance) would be the only aspect that matters… but managing a larger pool of voice talent is another thing. To be continued in a Part 2 blog post…