Posts tagged: TTS

Emulating Human Voice-overs with TTS Voices, Part Three

By Rudy Helm, Audio and Quality Assurance Tech, Visual Purple, LLC.

Continuing our vlog ‘how-to’ series, ‘Emulating Human Voice-overs with TTS Voices’, we now offer this newer presentation, suffixed ‘Part Three’. We recommend that you review Part One and Part Two first, but that is not a requirement. For this exercise we snipped out a small piece from one of our past projects. Unlike Part Two, where you learned to sync a TTS voice to a human voice-actor, this video tutorial focuses on tactics for humanizing synthetic voice clips with added detail. Today’s presentation not only reinforces the techniques discussed in Part One and Part Two, but also shows how to set the talk pace to improve the phrasing and expression of synthetic voices. The concept of formant manipulation is introduced as well. Disclaimer: these are helpful tips, but generalized. Not all editing tools or TTS engines respond in the same way to the specific techniques you might try. Mainly, just try to grasp the concepts, then adapt your technique to the idiosyncrasies of your chosen tools.

The How-To-Humanize your TTS Clips exhibit (Exhibit Part 3). A follow-up on VO elements originally presented in Part 2’s vlog. This time, we introduce Formant handling.

As always, check back in for more on this topic and other fun and useful information!

Emulating Human Voice-overs with TTS Voices, Part Two

By Rudy Helm, Audio and Quality Assurance Tech, Visual Purple, LLC.

Since there was great interest in a blog entry last fall called ‘Emulating Human Voice-overs with TTS Voices’, I have elected to present those lessons as a vlog, so it makes sense to give this newer presentation the same title, suffixed with ‘Part Two’. We recommend that you review Part One’s scenario before you proceed (to do so, click here), but doing so is not a requirement. For this exercise we snipped out a small piece from Part One’s cut-scene. There were several actors in that cast, but this clip deals with only one. Recall the premise: your project’s budget can afford only one human voice-actor, so you’ve recorded that one actor performing every role in the cast. This video tutorial shows the techniques discussed in Part One in action. Learn how to sync synthetic voices to the phrasings and expressions of your human model. Disclaimer: these are helpful tips, but generalized. Not all editing tools or TTS engines respond in the same way to the specific techniques you might try. Mainly, just try to grasp the concepts, then adapt your technique to the idiosyncrasies of your chosen tools.

Sound effects were mentioned in Part One, but that discussion will need to wait for a future vlog. Music was mentioned also, but we cover music in other vlogs, so be sure to look for those as well.

The How-To-Synchronize TTS to Human Model exhibit (Exhibit Part 2). A vlog on how we developed the VO elements originally presented in Part 1.

As always, check back in for more on this topic and other fun and useful information!

When Your Musician is a Robot, Part 4 (Can musical assets be free in a Virtual World?)

By Rudy Helm, Audio and Quality Assurance Tech, Visual Purple, LLC.

This is to fulfill my promise to describe how we developed the functional, copyright-free, royalty-free, original music showcased in the video we presented in Part 3. Click here for a review of that video, as it is essentially prerequisite viewing for getting the most from this article’s tutorial. Yes, this video blog contains the bona fide instructional ‘how-to’: a fun, informational show-and-tell about the background-music-making tool and the processes behind the production in the Part 3 exhibit.

Recall that Exhibit Part 3 embodied a variety of musical styles. In this latest video, Exhibit Part 4, animated avatars will act as both your tour guides and mentors. Also remember that we had promised to discuss the usage of the music tool, with emphasis on reinforcing ‘the 1-4-5 principle’ (we initially introduced that here). Well, with today’s video you can see that principle in action.

We think you will enjoy this. Please don’t hesitate to send us feedback!

http://www.youtube.com/watch?v=-UTjeCRDSb8

The How-To-Create BGM exhibit (Exhibit Part 4). Learn how we developed the musical elements originally presented in Exhibit Part 3.

When Your Musician is a Robot (Can Automated Minstrels Play Nice in a Virtual World?), Part 2

By Rudy Helm, Audio and Quality Assurance Tech, Visual Purple, LLC.

How does one tame a virtual musician? To discuss the UI, let’s go step by step through the process I followed to generate a music bed. We start with the style palette. (Competing software tools may not look like these screenshots but will offer similar functions.) In one scenario, I chose one of the many available country styles, one that includes pedal steel guitar, as in Figure 1.

Figure 1 – The Country music selection from the musical styles palette window. Note that there are many sub-styles to choose from.

You can set the key (Figure 2). If you don’t care which key, leave this alone and your music will default to the key of ‘C’ (I left it alone, and mine defaulted to ‘C’ for both styles). You can also set the tempo (Figure 3). This is a trial-and-error kind of thing. Experiment until it feels right for your purposes. If you don’t set the tempo, it will default to 120 BPM (beats per minute; think Sousa march).

Figure 2 – Key selection menu.

Figure 3 – Tempo selection dialog box.

Figure 4. The interface is like a spreadsheet. Each cell entry specifies which of the 1, 4, or 5 chords falls at that point in the timeline. The first cell defaults to ‘1’ (in this case ‘C’), so you don’t need to enter any values yet.
Figure 4 – Cell one defaults to ‘C’ chord

Figure 5. But move over to the next cell (‘bar 2’ in musical lingo) and enter the number 4. The tool automatically knows the proper chord for that number within the key (in this case, the ‘F’ chord).
Figure 5 – Enter ‘4’; the ‘F’ chord appears

Figure 6. Move to the next cell and let’s enter the number we haven’t used yet, ‘5’. Let’s leave cell four empty.
Figure 6 – Enter the digit ‘5’ into cell three

Figure 7. Now, at cell three, you will see that the tool has automatically assumed the ‘G’ chord for you.
Figure 7 – The ‘G’ chord appears

What you have then is one bar of C, one bar of F and two bars of G. To finish preparing the body of your new music bed, highlight and copy the upper row of cells you filled in, as in Figure 8.
Figure 8 – Copy four bars of music (cells one through four)

The next step is to paste those four copied bars into three more rows of cells. You end up with a 16-bar loop, as in Figure 9.
Figure 9 – Four bars pasted three times results in sixteen bars
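
For readers who like to see the logic spelled out, here is a minimal sketch of what the spreadsheet steps above amount to. It is not the music tool's own code; the names `chord_for` and `build_loop` are made up for illustration, and it assumes an empty cell simply holds the previous chord (which matches the 'two bars of G' result above). Chord names use sharps only, a simplification that is fine for the key of C.

```python
# Hypothetical illustration of the 1-4-5 cell entry; not the tool's real API.
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone distance of the 1, 4 and 5 chords above the key's root note.
DEGREE_OFFSETS = {1: 0, 4: 5, 5: 7}


def chord_for(key, degree):
    """Return the chord root for a 1, 4 or 5 entry in the given key."""
    root = NOTES.index(key)
    return NOTES[(root + DEGREE_OFFSETS[degree]) % 12]


def build_loop(key, cells, rows=4):
    """Turn one row of cell entries into a list of bars, repeated `rows` times.

    `None` stands for an empty cell. Assumption: an empty cell keeps the
    previous chord, and the very first cell defaults to the 1 chord.
    """
    bars = []
    current = 1
    for cell in cells:
        if cell is not None:
            current = cell
        bars.append(chord_for(key, current))
    return bars * rows


# Cell one left at its default, then 4, then 5, then an empty cell.
loop = build_loop("C", [None, 4, 5, None])
print(loop[:4])
print(len(loop), "bars")
```

Pasting the row three more times is just the `rows=4` repeat at the end; change the key argument and the same three numbers land on different chords.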

Figure 10. Enter 16 bars (16 cells) to define the start and end of your loop.
Figure 10 – Enter 16 to indicate which cell is the end

Figure 11. Choose how many repeats for your loop. How many times your music bed should loop-play depends on how long you need it to play. If the music engine in your project will repeat the loop as many times as you need, set the loop count to ‘1’. If not, set it to the number of loops that will fill the time required.
Figure 11 – Click the loop button and select repeats from a pull-down menu.
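
If you would rather not guess at the repeat count, the arithmetic is easy to sketch. The snippet below assumes four beats per bar (the tool's time signature is not stated here, so treat that as an assumption), and the function names and the 90-second scene length are invented for the example.

```python
import math


def loop_seconds(bars, bpm, beats_per_bar=4):
    """Length of one pass through the loop, in seconds.

    Assumes `beats_per_bar` beats in every bar (4 by default).
    """
    return bars * beats_per_bar * 60.0 / bpm


def repeats_needed(scene_seconds, bars=16, bpm=120, beats_per_bar=4):
    """Smallest whole number of loop passes that covers the scene."""
    return math.ceil(scene_seconds / loop_seconds(bars, bpm, beats_per_bar))


# The 16-bar loop at the default 120 BPM, for a hypothetical 90-second scene.
one_pass = loop_seconds(16, 120)
print(one_pass, "seconds per pass")
print(repeats_needed(90), "repeats to cover 90 seconds")
```

Under those assumptions one pass of the 16-bar loop at the default tempo runs 32 seconds, so a slower tempo or a longer loop means fewer repeats for the same scene.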

Conclusion
I commonly say that whenever you can afford real musicians for crucial sonic moments such as main themes, hire them. But when the budget cries for mercy, maybe try some of the things I have offered in these blogs about synthetic music production, especially for BGM.

Let’s review the positive points: copyright-free, royalty-free, original music that can be created by anyone on your team (with the help of your synthetic musician, of course). In our next blog we will cover a few more fascinating creations from our virtual composer, so stay tuned! And by the way, if you would like some consultation or some help developing your project, please don’t hesitate to contact us.

Emulating Human Voice-overs with TTS Voices

By Rudy Helm, Audio and Quality Assurance Tech, Visual Purple, LLC.

This is the follow-up that I promised in the last paragraph of my last blog entry, ‘Can Robots be Funny?’ This entry will also segue nicely into my new topic, which I will hold back until the conclusion. I had proposed this scenario: say we have a cinematic cut-scene (or trailer) with several actors in the scene, but our budget can afford only one human voice-actor. So, you’ve recorded your one human voice-actor doing each role of the entire cast. Next, we use the techniques discussed in my aforementioned article to create an ensemble of TTS actors and sync their synthetic voices to the phrasings and expressions of your human model. Not only that, we will deploy only one male TTS voice and one female TTS voice to cover our entire cast of seven characters (that’s three adult males, two adult females and two children)! We’ll accomplish these tasks with a suite of video and audio editing software (markup languages not being practical for trailers, etc.).

First, listen to this excerpt where you will hear the human model’s VO.

Now listen to the same animation again. This time however, the audio you hear is our TTS VO.

Recall from my earlier blogs that the tasks included lengthening or shortening the TTS vowels and syllables to match those of the human model. See Figure 1.
Figure 1
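
To put a number on 'lengthening or shortening', here is a rough sketch of the stretch factor for each syllable. The durations are made-up values of the kind you might read off the waveform display in your editor, and `stretch_ratios` is a hypothetical helper, not a feature of any particular tool.

```python
def stretch_ratios(human_durations, tts_durations):
    """Factor by which each TTS syllable must be time-stretched.

    A ratio above 1 means lengthen the TTS syllable; below 1 means shorten it.
    """
    return [h / t for h, t in zip(human_durations, tts_durations)]


# Hypothetical syllable lengths in seconds, measured by eye in the editor.
human = [0.30, 0.45, 0.20]
tts = [0.25, 0.30, 0.25]

ratios = stretch_ratios(human, tts)
for ratio in ratios:
    if ratio > 1:
        action = "lengthen"
    elif ratio < 1:
        action = "shorten"
    else:
        action = "keep"
    print(f"{action}: stretch to {ratio:.2f}x")
```

Most audio editors accept a stretch amount as a ratio or a percentage, so numbers like these can be typed straight into the time-stretch dialog, then adjusted by ear.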

Figure 2 – Next, view the whole enchilada, as it were, this time with the entire cast of male and female TTS voice-actors doing their stuff, including a crowd sound effect for the background restaurant ambience:
Link to YouTube: Visual Purple

Figure 3 – Window dressing

Sound effects! Well, if you missed my previous blog (remember the talking fish?), you should visit it to read a discussion on sound effects. The background sound effects help place the scene in a more immersive environment, don’t they? And then there’s…

MUSIC! That’s right! Here’s where I depart for now from my series of blogs on TTS voice-actors (‘bout time isn’t it?). Background music can be an additional sweetening, adding greatly to your scene. “But wait!” you may be saying, “I thought the focus of this exercise was our tight budget!?”

You are so correct. How about copyright free, royalty free music? I’ll explore that a bit more later, but now let’s have a look and listen to this scene, but rendered with (royalty free) background music. This is a restaurant scene, and many restaurants provide music for their clientele, right?

Figure 4 – Complete scene rendered with TTS voices and background music
Link to YouTube: Visual Purple

Now see how effective the music was in providing a more pleasing restaurant ambience? In fact, without the music, the scene was rather sterile, wasn’t it? Even with the ambient crowd chatter in the background it was stale, but the music made the scene, well, …right!

Conclusion

Whenever you can afford real actors, do it! But when budget screams for relief, maybe try some of the things I have offered in these blogs about synthetic VO.
And algorithmic music? Yes. Algorithmic. Why not opt for copyright-free, royalty-free music whenever we have the opportunity and it suits our project, not to mention our budget? In fact, that will be the topic of my next blog, “When Your Musician is a Robot (Can Automated Composers Write Good Music?)”. We’ll demonstrate various musical styles and include movie clips for you to view so that you can hear the background music in context. Stay tuned!

Comedic Treatment in TTS Voices (Can Robots be Funny?, Part 1)

By Rudy Helm, Audio and Quality Assurance Tech, Visual Purple, LLC.

At the end of my previous discussion on NPC voice-over production, I promised that I would follow up with a blog about what it might take to get a synthetic voice to be funny. Remember, we’re talking about NPCs (non-player characters); playable characters are typically voiced by professional voice talent. I will provide you with samples as we roll along, tutorial fashion, but with the disclaimer that this is just one approach, as there are likely other useful techniques that could be considered.

OK, I sense you are protesting: how can a robot ‘out-funny’ a performance by a professional voice talent? I am not at all suggesting that a synthetic voice-actor can win such a contest. But if you are faced with options, and this is the option you choose, you really want to come up with workable solutions.

What are the resources?

There are a number of synthetic voice vendors available, and one obvious task is to choose one. A simple Internet search can help you with that. For purposes of this discussion we will use third-party software controls to affect voice properties. In this tutorial we’ll use a stand-alone audio editor along with a non-linear editor (NLE), but you can almost certainly substitute a digital audio workstation (DAW) of your choice. The audio editor might be replaced with XML controls if that is your favorite way to affect voice pitch, tempo, etc. However, I think it would be extremely tedious to try to deploy markup languages as a substitute for a DAW. By the end of this writing I bet you will agree with me. Please refer to my earlier post, When Your Voice-actor is a Robot, for some detail on resources. And then there is one last, very important asset to have: someone who is funny!

Here at Visual Purple, we are fortunate to have a gentleman who is a very funny guy, which is lucky for this experiment. So, you may be thinking, why are we talking about working with a funny human? Isn’t this topic about having a funny robot? Well, yes, but our funny human (let’s call him John) will serve as a model for our robot.

Say what?

The short answer is, we will import audio clips of the funny human into our DAW, and then we will import audio clips of the synthetic voice and make it emulate the human’s speech patterns.

Say what?!!
OK, in this project our goal is to make some humorous fish voices. You see, we have a scene in one of our products where someone at a bar can stand and stare at a fish tank. As the fish swim by, and if the avatar is close enough to the tank, the fish might begin to make wisecracks to the, uh, fish admirer. This is an ‘Easter egg’ where fun is poked at the avatar, possibly insinuating that he has had a bit too much to drink. To achieve our goal, we need to mold the synthetic voice clip to emulate the comedic timing expressed in the human model.

Let’s do it!
So let’s start the process by importing into our DAW an audio clip that John, the funny human, recorded for us (Figure 1).

Figure 1

Next, listen to John’s original model for reference. The script: “Last week it was a lawyer’s convention. I never seen so many sharks!” We follow that by importing the corresponding audio clip from the synthetic voice (Figure 2).

Figure 2

Without doing anything further at this stage, we can easily see that the graphical sound ‘blobs’ don’t match. So, before we move on, have a listen to the robot’s recording. Notice that this clip has already been treated with pitch transposition. (For a discussion on ways to do that, please refer to my earlier post, When Your Voice-actor is a Robot.) Our intent was to get cartoon-y voices, so we started with a female TTS voice and then modified her pitch characteristics.

Now, to make the robot emulate John’s comedic speech patterns, we need to edit the clip’s timelines so that the graphical sound ‘blobs’ do match. Figure 3 illustrates an example:

Figure 3

In Figure 3’s example we see only the first two words of the script (“Last week…”). Listen to how the TTS’s utterance of the word ‘week’ occurs earlier in the timeline than John’s blob of the same word. Close, but the timing is just not right, is it? Note that we need to create a split point (represented by the vertical line) just before the TTS’s blob for that word. Doing this enables us to separate the words and move them as we wish on the timeline (see Figure 4).

Figure 4
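
The split-and-slide step can also be thought of as simple subtraction. The sketch below is only an illustration: the word start times are invented numbers (pick your own off the timeline ruler), and `segment_shifts` is a hypothetical helper rather than anything built into an editor.

```python
def segment_shifts(human_onsets, tts_onsets):
    """How far to slide each TTS word so it starts where the human's word does.

    Positive values mean 'move later on the timeline', negative 'move earlier'.
    Results are rounded to the nearest millisecond.
    """
    return [round(h - t, 3) for h, t in zip(human_onsets, tts_onsets)]


words = ["Last", "week"]
# Hypothetical start times in seconds for each word's blob.
human_onsets = [0.00, 0.42]
tts_onsets = [0.00, 0.31]  # the TTS says 'week' too early

shifts = segment_shifts(human_onsets, tts_onsets)
for word, shift in zip(words, shifts):
    print(f"{word}: move by {shift:+.2f} s")
```

Each split point gives you one more segment to slide, so a longer script is just a longer pair of lists.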

Now, listen to both voices speak those two words in sync. (…to be continued)
