When Your Voice-actor is a Robot (Confronting the NPC Speak Challenge for Virtual Worlds, Part 2)

By Rudy Helm, Audio and Quality Assurance Tech, Visual Purple, LLC.

The SSML language

If you’re new to markup languages, take a look at this example, as it may be useful as a reference to understand SSML syntax.

< ?xml version="1.0"? >
< speak version="1.0" xml:lang="en-US" >
< voice name="Dave" >
Hello, world; my name is Dave.
< /voice >
< /speak >

This example says that the voice named “Dave” should pronounce: “Hello, world; my name is Dave.” (Keep in mind that the spaces adjacent to the angle brackets should be left out in a real-world application.) As in other XML-based markup languages, SSML is composed of elements. The root element is <speak>, and it contains the text to be spoken. The <speak> element has two required attributes: xml:lang (the language to be spoken) and version (the version of the specification). There are a few optional attributes as well.
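
As a rough illustration of the structure described above, here is a minimal Python sketch that builds the same document with the standard library and then confirms that it parses as well-formed XML. The function name build_ssml is hypothetical, and whether a voice called “Dave” exists depends entirely on your TTS engine.

```python
import xml.etree.ElementTree as ET

# Namespace URI that the reserved "xml:" prefix (as in xml:lang) stands for.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(text, voice_name, lang="en-US"):
    """Return an SSML string: a <speak> root wrapping one <voice> element."""
    # version and xml:lang are the two required attributes of <speak>.
    speak = ET.Element("speak", {"version": "1.0", f"{{{XML_NS}}}lang": lang})
    voice = ET.SubElement(speak, "voice", {"name": voice_name})
    voice.text = text  # special characters are escaped on serialization
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Hello, world; my name is Dave.", "Dave")
print(ssml)

# Parsing the result back raises an error if the markup is not well-formed.
root = ET.fromstring(ssml)
print(root.tag, root.find("voice").get("name"))
```

Generating the markup this way, rather than typing it by hand, lets the library take care of escaping characters such as & and < in the spoken text.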

Figure 5 below is a table that shows how the SSML elements are associated to the five points of Text Analysis.

[Figure 5: table associating the SSML elements with the five points of Text Analysis]

You will use the prosody tag a lot if you intend to create separate voice characters from only one TTS resource. With prosodic control you can manage the tempo and pitch of the voices.

Listen to this XML example of the ‘Grandson’ talk scenario, and see below the markup tag that makes it play at a higher pitch.

< prosody pitch="+4.2st" > I believe Visual Purple’s products have among the best where NPC voice quality is concerned. < /prosody >

As far as TTS engines go, this is a pretty effective example. Rather than emphasizing one or several individual harmonics, as the wood or metal of a musical instrument does, the vocal tract emphasizes entire bands of harmonics, called formants. Each vowel sound has characteristic bands of higher-intensity harmonics. In short, the character of the original voice clip is largely retained, even when the voice’s pitch has been raised. Beware that not all TTS engines do so well when processed with markup languages.
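
To get a feel for how large a shift a pitch value in semitones requests, it helps to convert it into a frequency ratio. The sketch below assumes equal temperament, where each semitone multiplies frequency by the twelfth root of two. The helper semitones_to_ratio is hypothetical, the -3.8st value is included only for comparison with a downward shift, and how an engine actually realizes the change is up to the vendor.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency multiplier for a pitch shift given in semitones."""
    # Twelve semitones make an octave, which doubles the frequency.
    return 2.0 ** (semitones / 12.0)


for label, shift in [("raised", 4.2), ("lowered", -3.8)]:
    print(f"{label}: {shift:+.1f}st -> x{semitones_to_ratio(shift):.3f}")
```

Under that assumption, +4.2st asks for frequencies roughly 27% higher, and -3.8st for frequencies roughly 20% lower.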

Listen to an XML example of the ‘Grandson’ talk scenario and see below the markup tag that makes the above paragraph’s sample play at a faster tempo.

< prosody rate="+5%" > I believe Visual Purple’s products have among the best where NPC voice quality is concerned. < /prosody >

Note the glaring sonic artifacts in this example. It plays way too quickly to sound ‘natural’! In my own research I have noticed that many of the available TTS engines do not give the user fine control when entering tempo parameters into markup tags. The results are usually too fast or too slow. And in some of the engines that do respond to fine control, sonic artifacts such as static or scratchiness are introduced.

Listen to this XML example of the ‘Grandpa’ talk scenario where we use a markup tag to make it play at a lower pitch.

< prosody pitch="-3.8st" > There are a number of synthetic voice vendors available. It seems though, that many of these vendors are reselling the same voice actors, so try to get your license from the source. < /prosody >

It’s interesting to note that this is the same TTS engine that performed so well when the pitch was raised, yet it shows some sonic artifacts in this example, where the frequencies are pitched lower. Listen carefully and observe the slight scratchiness. It’s as though you can hear tiny, rhythmic interruptions all through the sound data.

Listen to an XML example of the ‘Grandpa’ talk scenario and see below the markup tag that makes the above Grandpa sample play at a slower tempo.

< prosody rate="-10%" > There are a number of synthetic voice vendors available. It seems though, that many of these vendors are reselling the same voice actors, so try to get your license from the source. < /prosody >

This one mirrors the tempo defect revealed in the Grandson tempo exercise above. It plays way too slowly to be useful (unless you are going for that classic HAL-the-computer voice near the end of the movie 2001, where the robot meets his slow demise).
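
One way to sanity-check a tempo result is to work out how long the clip should last if the engine applied the percentage literally. This sketch assumes that a relative rate of +N% means speaking (1 + N/100) times as fast. The helper expected_duration and the 10-second clip length are hypothetical.

```python
def expected_duration(seconds: float, rate_percent: float) -> float:
    """Clip length after a relative rate change, if applied literally."""
    speed = 1.0 + rate_percent / 100.0  # e.g. +5% -> 1.05x, -10% -> 0.90x
    if speed <= 0:
        raise ValueError("rate_percent must be greater than -100")
    return seconds / speed


for rate in (5, -10):
    print(f"rate {rate:+d}%: {expected_duration(10.0, rate):.2f} s")
```

Under that assumption, a 10-second line should come out near 9.5 seconds at +5% and near 11.1 seconds at -10%. If the rendered clip differs from that by much, the engine is probably not honoring the value with fine control.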

Conclusion
Synthetic speech can make effective voice-actors when techniques are carefully deployed (especially with regard to adjusting tempo and pitch to improve your NPC’s realism). At this juncture there appears to be no one go-to solution. For good results, we’ve had to utilize a combination of 3rd-party software with XML tags, though I have to admit we seem to resort to 3rd-party software more and more.

If markup language deployments were as robust as we wish they were, we would be able to include an XML parser in our commercially available development tools. We have clients who have expressed an interest in building their own virtual world simulations, where all they need do is type in their avatars’ text and the voice syncing-to-animation just happens automatically. The bottleneck, though, appears to be too dramatic a hit on frame rate, where the TTS speech and/or animation quality suffers. There is great demand put on a CPU when it has to display high-quality images and process real-time audio manipulations simultaneously. This is why, in the meantime, we pre-render scenarios so that our content looks and sounds glorious.

Our technologists will figure out a solution to the real-time problem, though. Visual Purple is all about quality – and providing the tools that our customers want!

In a future blog post – Comedic treatment in TTS voices. Can robots be funny? Stay tuned!

When Your Voice-actor is a Robot (Confronting the NPC Speak Challenge for Virtual Worlds, Part 1)

By Rudy Helm, Audio and Quality Assurance Tech, Visual Purple, LLC.

I’d like to discuss NPC voice-over production. I will even provide you with downloadable samples as we roll along. In our virtual worlds, Visual Purple sometimes deploys intelligent NPCs (Non-Player Characters), whereas our playable characters are typically represented by professional voice talent. Much of the challenge is making synthetic voice recordings not sound too synthetic, like the automated voice applications you get trapped in on the phone.

To confront this, some of the tactics we deploy involve adjusting tempo and pitch, either globally across an NPC’s dialog or just on specific words or phrases. Sometimes a clever combination of both treatments comes into play. One reason to adjust tempo and pitch is to get extra mileage from one synthetic voice-actor. A quick for-instance: say you want three male voice-actors for your project. One is a teenager, another plays the teenager’s father, and the third plays the grandfather. By adjusting the pitch of a single synthetic actor you can achieve this. Re-pitch the teenager’s voice a bit higher than Dad’s (you might just leave Dad’s timbre as is), and re-pitch Grandpa’s voice a bit lower.

Now, in reality, each of us likely speaks at a different pace (tempo) than the individual in the next cubicle. I submit that we can emulate the same phenomenon in our synthetic actors. We could elect to make the teenager speak at a slightly quicker tempo than Dad does (again, we could just leave Dad’s pace as is), and slow down Grandpa’s tempo somewhat. I’m sure you’re getting the idea. For female timbres, simply consider similar treatments.
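
The teenager/dad/grandpa idea can be sketched in a few lines of Python that wrap one voice’s text in different SSML prosody settings per character. This is only a sketch under stated assumptions: the PRESETS table and the speak_as helper are hypothetical, and the pitch and rate values are illustrative starting points to be tuned by ear for your own TTS engine.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical per-character settings; Dad keeps the voice as recorded.
PRESETS = {
    "grandson": {"pitch": "+4.2st", "rate": "+5%"},
    "dad": {},
    "grandpa": {"pitch": "-3.8st", "rate": "-10%"},
}


def speak_as(character, text):
    """Wrap text in a <prosody> tag carrying the character's settings."""
    settings = PRESETS[character]
    body = escape(text)  # protect &, < and > in the spoken text
    if not settings:
        return body  # nothing to change, so no tag is needed
    attrs = " ".join(
        f"{name}={quoteattr(value)}" for name, value in settings.items()
    )
    return f"<prosody {attrs}>{body}</prosody>"


for who in ("grandson", "dad", "grandpa"):
    print(who, "->", speak_as(who, "Nice to meet you."))
```

Keeping the settings in one table makes it easy to audition a character, nudge a value, and re-render every line that character speaks.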

What and where are the resources?

There are a number of synthetic voice vendors available. It seems though, that many of these vendors are reselling the same voice actors, so try to get your license from the source. This is a global market, so do not assume that voices in your language are available only from vendors in your own country. I believe Visual Purple’s products have among the best where NPC voice quality is concerned.

There is a goodly supply of audio tool vendors available as well. Most of the synthetic voice vendors have on-board processing tools. These tools are there to help you arrive at solutions such as the teenager/dad/grandpa scenario depicted above. One common way to utilize their on-board tools (software based) is by developing some markup language skills. XML anyone?

On-board audio-treatment (markup languages)

Control mechanisms to affect voice properties include SSML, SALT, SAPI4, SAPI5, and TTS vendors’ proprietary inventions. Here are a few links if you’d like to study these XML-based technologies: http://www.phon.ucl.ac.uk/home/mark/salt/ssml.html, http://www.w3.org/TR/speech-synthesis/, and http://en.wikipedia.org/wiki/Speech_Application_Programming_Interface.

Out-board audio treatment, such as 3rd party software.

Control mechanisms to affect voice properties with 3rd-party software include digital audio workstations (DAWs) and non-linear editors (NLEs) such as Pro Tools, Sonar, Nuendo (http://www.steinberg.net/en/products/audiopostproduction_product.html), Vegas Pro, Melodyne, and yes, even Audacity, among others.

Revisiting the grandpa, dad, and grandson scenario I mentioned earlier, I now want to show you some screen shots and audio examples from results I got when using a 3rd-party tool.
[Screenshot: grandpa 1]

Now listen to grandpa_pitched_low-xml.

Now, to listen to Dad’s pitch texture (pitched normally) click here.
And to listen to Grandson’s pitch texture (raised somewhat) click here.
[Screenshot: son 1]

To further differentiate these actors’ speaking styles, we can also affect their tempo (speed). And we should do this without changing the pitch again. We could give Grandpa a slightly slower tempo, give the Son a quicker tempo, and just leave Dad’s speech pace as is. To listen to Grandpa’s tempo (slowed somewhat) click here.

[Screenshot: grandpa 21]

To listen to Dad’s tempo (kept as is) click here. And to listen to the Son’s tempo (sped up a little) click here.

[Screenshot: son 2]

How Persistent is a Persistent Virtual World?

“Reality is merely an illusion, albeit a very persistent one.” – Albert Einstein

Confusion prevails when it comes to virtual world terminology. So, let’s take a moment to address what a 3D Persistent World really means. Although one cannot easily find a simple go-to definition of ‘Persistent World’, I have provided a few citations below that closely track how a persistent world should be defined, based on our collective experience here at Visual Purple. For a short and sweet example, I would state that a persistent world is a world whose existence continues regardless of whether you are logged in.

I’m always circumspect when it comes to Wikipedia but this citation is in the right ‘area code.’ Definition of a Persistent World from Wikipedia: A persistent world (PW) is a virtual world that continues to exist even after a user exits the world and that user-made changes to its state are, to some extent, permanent. The term is frequently used in the definition of the massively multiplayer online video games and can be considered synonymous with that class of games.

The persistence comes from maintaining and developing the state of the world in the game around the clock. Quite unlike other types of games, the plot and events in a persistent world game continue to develop even while some of the players are not playing their characters. That aspect is similar to the real world where events do occur regardless if they are directly or indirectly related to a person, as they continue to happen while a person is asleep, etc. Conversely, a player’s character can also influence and change a persistent world. The degree to which a character affects a world varies from game to game. Since the game does not pause or create player-accessible back-up files, a character’s actions will have consequences that the player must deal with.

Elements of persistent worlds can be found in computer games from as early as the 1980s, including Trade Wars (1984) and Orb Wars (1989). The term gained popularity in the late 1990s with the growth in popularity of MMORPGs. The term is also frequently used by players of Neverwinter Nights (2002) to refer to MMORPG-like online environments, such as A Land Far Away, Arkaz, Arelith and Bruehawks, created using the game’s toolkit.

Published in July of 2008, The Journal of Virtual Worlds Research: Toward a Definition defined Persistent as: A virtual world cannot be paused. It continues to exist and function after the participant has left. Persistence separates virtual worlds from video games such as Pac-Man or Galaga. This persistence changes the way people interact with other participants and the environment. No longer is one participant the center of the world but a member of a dynamic community and evolving economy. A participant has a sense the systems in the space (environment, ecology, economy) exist with or without a participant’s presence.

Not all worlds are created equal…persistent or not!

Speaking of…VIRTUAL WORLDS 1

A weekly wrap up on what’s going on within the Virtual World sphere. Click on any of the below titles to read the full story.

Google to Revive Virtual World Efforts?

Linden Labs Preps for Nebraska

Second Life 1.23 (RC3) Now Available

Learning in 3D Book to be released in January 2010

Small Worlds and Hi5 Combine

Texas State Technical College’s virtual college (vTSTC) in Second Life announces the first known student to graduate from a certificate program taken entirely in a virtual environment

Modeling the True Value Of Social Networks: 2009 Edition

Coinstar Ups Commitment to Games, Virtual Worlds

In-world Sound Gets Upgrade from Cornell Researchers

ThinkBalm Data Garden is live in Second Life!

Social Networking in Virtual Worlds- Part 1

With the emergence and ever-growing popularity of social media I thought it only proper to address social media influences and uses within a virtual world application.
In case you’ve been living under a rock, let me provide you with the Wikipedia definition of Social Media: “Social media is information content created by people using highly accessible and scalable publishing technologies. At its most basic sense, social media is a shift in how people discover, read and share news, information and content. It’s a fusion of sociology and technology, transforming monologue (one to many) into dialog (many to many) and is the democratization of information, transforming people from content readers into publishers. Social media has become extremely popular because it allows people to connect in the online world to form relationships for personal and business. Businesses also refer to social media as user-generated content (UGC) or consumer-generated media (CGM)”.

While the majority of technology users have heard of Facebook, Twitter and MySpace (and most of us are probably guilty of using at least one of these applications, if not all), how will they integrate with or influence interactions within a virtual world? 3D chat is currently an option in a handful of VWs. However, will it become mainstream in the future?

Some Quick Stats:
-Twitter continues to grow despite constant outages and other issues. In fact, Nielsen reported in October 2008 that Twitter had grown by as much as 343% in 12 months.
-Facebook is now growing by about 600,000 users each day.
-Blog readership has grown by over 66% in the last year.
-According to ELearning Magazine “Approximately 47% of enterprises are reportedly planning to use social networks for employee, customer and channel communications. What once was considered a productivity drain, has now moved to the mainstream of HR, Training, Sales and Service department operations. Organizations are realizing real business benefits from these new collaboration tools”.

The 3D application Just Leap In allows you to connect to friends on Facebook. Second Life is looking to embrace social media within its platform. Are there drawbacks? Yes. For one thing, if one is training or learning in a virtual world, social media would not help one focus on the content at hand. Most would much rather be updating their status on a social media application: “Here I sit writing a blog entry…” While collaboration should be encouraged, is the integration of social networking within virtual worlds for training a must? And if so, how and to what extent?

Visual Purple Recognized for Prestigious Interactive Media Award

Visual Purple Wins Silver & Bronze at the 8th Annual Horizon Interactive Awards Competition

The Horizon Interactive Awards, a leading international interactive media awards competition, announced the 2009 award winners to highlight this year’s “best of the best” in interactive media production.

Visual Purple was recognized for its excellence with a Silver award in the Training category for ICS: 3-Part Training Series.

Visual Purple was also recognized for its excellence with a Bronze award in the Training category for Winning in Wireless: Virtual World Simulation.

The eighth annual international competition saw just over 2000 entries from 32 countries around the world, including: Australia, Belgium, Brazil, Canada, Czech Republic, China, Croatia, France, Germany, Great Britain, Hong Kong, Hungary, India, Indonesia, Ireland, Israel, Italy, Japan, Malaysia, Martinique, Mexico, Oman, New Zealand, Netherlands, Russia, Spain, Singapore, South Africa, Thailand, Taiwan, Turkey, and nearly all 50 of the United States of America. An international panel of judges, consisting of industry professionals with diverse backgrounds, as well as an end user panel evaluated 19 different categories ranging from online advertising to video games. The 2009 winning entries showcase the industry’s best interactive media solutions including web sites, CDs and DVDs, online ads, video and more.

“The 2009 competition was an all new level for the competition,” said Mike Sauce, founder of the Horizon Interactive Awards. “The overall quality of entries was far and away the best we have ever had and judging was very competitive. Year after year, we are amazed at the level of creativity and overall technical excellence of the entries that are recognized by the competition. They truly are the best of the best!”

About the Horizon Interactive Awards
In its 8th year, the Horizon Interactive Awards was created to recognize excellence in interactive media production worldwide. Since 2001, the competition has received many thousands of entries from countries around the world and nearly all 50 US states. Each year, those entries are narrowed down to the best of the best to be recognized and promoted on an international stage for their excellence. The judging process involves a Horizon Interactive Awards advisory panel, an end user panel and a worldwide panel of judges consisting of industry professionals. Winning entries are dubbed the “best of the best” in the interactive media industry.
