AI can simulate anyone’s voice with 3 seconds of audio

theboxinargentina
Posts: 301
Joined: Fri Oct 08, 2021 4:12 pm
Has thanked: 184 times
Been thanked: 97 times

AI can simulate anyone’s voice with 3 seconds of audio

Post by theboxinargentina »

This will be interesting...

Microsoft’s new AI can simulate anyone’s voice with 3 seconds of audio
Text-to-speech model can preserve speaker's emotional tone and acoustic environment.
https://arstechnica.com/information-tec ... -of-audio/

https://valle-demo.github.io/
Lord Reith
Posts: 4602
Joined: Thu Feb 18, 2021 8:22 am
Location: BBC House
Has thanked: 139 times
Been thanked: 3963 times

Re: AI can simulate anyone’s voice with 3 seconds of audio

Post by Lord Reith »

It's a terrible idea. I can't imagine what use this could possibly serve other than opening the doors to yet more fraud and fake news. Someone's voice is as unique as their face. Allowing it to be used by others is a blatant form of identity theft.
Women there don't treat you mean, in Abilene
Ziggy C
Posts: 551
Joined: Thu Oct 14, 2021 12:10 am
Location: Woodland Hills, CA
Has thanked: 96 times
Been thanked: 125 times

Re: AI can simulate anyone’s voice with 3 seconds of audio

Post by Ziggy C »

This is the natural/unnatural evolution of sampling. It's been done with musical sounds for decades now. Guitar and synth patches crafted to emulate a particular musician's sound. Drum patches to emulate a particular drummer. It'll be interesting to see what can be done with voices. Although it does seem a bit sketchy that it can potentially open the door to all kinds of fraud.

Also there's this:
bringing your deceased friends and loved ones to audible life to say whatever you want to make them say.
Lord Reith
Posts: 4602
Joined: Thu Feb 18, 2021 8:22 am
Location: BBC House
Has thanked: 139 times
Been thanked: 3963 times

Re: AI can simulate anyone’s voice with 3 seconds of audio

Post by Lord Reith »

The potential for misuse of this thing is mind-boggling. At best, it could be married with AI chatbots to let people converse with dead relatives, which is a disturbingly creepy and dubious goal in itself. It will only encourage people to go further and further down the rabbit hole, divorced from reality and human interaction. And I can think of a hundred more nefarious applications.

The trouble with these tech gods is that they only ask "Can I do this?" instead of "Should I do this?" They are as bad as the military who devise ever more deadly weapons to potentially wipe us all out 1000 times over.
harrylime
Posts: 254
Joined: Sat Feb 19, 2022 4:56 pm
Has thanked: 59 times
Been thanked: 58 times

Re: AI can simulate anyone’s voice with 3 seconds of audio

Post by harrylime »

IIRC, the movie Wag the Dog includes an example of a fake telephone conversation used in a political smear campaign. And that was done merely with clever editing. It's not far-fetched to imagine the serious implications a natural-sounding fake voice could have.
zappaf78
Posts: 240
Joined: Mon Aug 23, 2021 8:01 pm
Has thanked: 35 times
Been thanked: 11 times

Re: AI can simulate anyone’s voice with 3 seconds of audio

Post by zappaf78 »

Now I understand what Robin Williams did in his will:
https://www.theguardian.com/film/2015/m ... ng-us-laws
Lord Reith
Posts: 4602
Joined: Thu Feb 18, 2021 8:22 am
Location: BBC House
Has thanked: 139 times
Been thanked: 3963 times

Re: AI can simulate anyone’s voice with 3 seconds of audio

Post by Lord Reith »

My friend once caught me by surprise with something he said about the tech revolution. I commented in passing that a lot of the stuff in sci-fi novels 50 years ago was now coming true. He replied, "No, they're making it come true." I then thought of the "iPads" and touch screens seen in Star Trek: The Next Generation and how all that was ported over wholesale into Steve Jobs' devices. The same applies to so much other stuff today... they're literally nicking the ideas out of old movies and shows!

It makes me wonder what the "real" future would have been like if these guys hadn't been obsessed with turning TV shows into reality. I often feel today like I am living in Star Trek, except that we don't have starships. We're stuck on Earth, and we're making a right balls-up of it.

So if you want to know what the future will be like, just watch, say, Colossus: The Forbin Project from 1970, because I bet there are misguided people out there trying to make it come true.
Ken_peps
Posts: 561
Joined: Thu Mar 11, 2021 4:52 pm
Been thanked: 25 times

Re: AI can simulate anyone’s voice with 3 seconds of audio

Post by Ken_peps »

As a jazz musician of 40 years, and also music director of festivals and various recording projects over the years, I can attest that one can no longer tell how well a performer can actually play or sing unless you hear them up close, as absolutely everything (pitch, phrasing, timing, etc.) can be "fixed" in the studio, and with vocalists, even live on stage. You can't fight technology, and obviously it can be put to many interesting uses, as Lord Reith, for example, has shown us so magnificently time and time again, but it also allows for some pretty devious deceptions...
Lord Reith
Posts: 4602
Joined: Thu Feb 18, 2021 8:22 am
Location: BBC House
Has thanked: 139 times
Been thanked: 3963 times

Re: AI can simulate anyone’s voice with 3 seconds of audio

Post by Lord Reith »

Ken you make me think of those buskers who set up on a street corner with $30,000 worth of equipment. They've got backing tracks and reverbs and all sorts of stuff. How do you know which part of it is taped and which part is the actual busker? It kind of makes it all pointless. The kind of busker I like is just someone who can play their instrument really well by itself. At the other end of the scale I can recall a kid set up with a didgeridoo and a huge sound system playing techno rhythm accompaniment, and he had an absolutely huge crowd gathered around. But as I watched him from behind, I noticed that at one point he pulled slightly away from the didg and licked his dry lips... but the sound of the didg did not cut out! :lol: If he hadn't been 12 years old I would have outed him right there and then! :P

And for some reason this all makes me think of The Beatles, and how honesty and genuine... erm, ness is the best policy. Sometimes they sucked when they played live, but it didn't matter. They were what they were and it didn't matter because when they were good they were tremendous. You got what you saw. Now we are increasingly facing a world of smoke and mirrors, where nothing is as it seems and truth is seemingly an anachronism.
Ziggy C
Posts: 551
Joined: Thu Oct 14, 2021 12:10 am
Location: Woodland Hills, CA
Has thanked: 96 times
Been thanked: 125 times

Re: AI can simulate anyone’s voice with 3 seconds of audio

Post by Ziggy C »

One night, several years back, I went with some friends to The Canyon in Agoura to see, of all bands, King Crimson. They were excellent, of course. I was standing right up front, stage left, where Fripp was playing. I could read the settings on his pedalboard... that close. Then some clown busted out and lit up a cigar. That ended the show. Fripp immediately signaled the band into their final number, "Red." Bummer.

The next band was an '80s covers band called The Spazmatics. They are still around, I think. At first, they seemed to be a well-honed band covering all manner of '80s music, from The Clash to Dexys Midnight Runners. But what were they really? A group of charlatans. A shrewdly marketed soundalike band whose members had questionable talent, if any at all. They were the highlight of the night (not for me; for the club). A four-piece band: drums, guitar, vocals, and bass. But I could swear they were not only not playing their instruments, but not singing either. The whole point of people being on the stage pretending to be musicians was to have people on the stage pretending to be musicians. The music was all canned, triggered by the drummer, who had a Mac computer propped up next to his drums. Absolutely shameless. And yes, the toilet is much deeper than that, as we've seen up to this day.

On the other hand, at the local Thai sushi place my wife and I like to go to, Red Ginger, there's a quite talented guitarist and singer on weekends. He also plays to canned accompaniment and background tracks. But he's actually playing the guitar, soloing, and singing. And he's pretty damn good.

So there are two sides to the coin. For the last couple of decades I've watched whole bands disappear from venues and get replaced by a single musician playing to canned accompaniment. Or worse, a DJ with his rig, rendering absolutely everything he touches unlistenable.
Last edited by Ziggy C on Mon Jan 16, 2023 5:57 am, edited 1 time in total.