AI can simulate anyone’s voice with 3 seconds of audio

Tex · Post by **Tex** » Mon Jan 30, 2023 12:14 am

The real potential is restoration of low quality sources. Imagine hearing voices from the 1920s to 1940s in modern quality as if recorded yesterday.

Lord Reith · Post by **Lord Reith** » Mon Jan 30, 2023 8:18 am

Golem wrote: ↑Sun Jan 29, 2023 10:19 pm
Yeah, that's a very reasonable point; I don't think AI sentience is really a massive risk here. But we can only hope the power of AI to fight evil advances as fast as its ability to create it. Counter AIs do exist, to be fair: when you have an AI that creates something fake, you need another counter AI that judges whether it's actually any good or not, and so I suppose you could use a counter AI to figure out what's actually real. Plus, as good as the AIs are, they still have tells, so who knows if "perfect" fake audio/video is even achievable.

Yes, but I think the AI arms race is what will drive it forward, just as the Apollo missions only happened because of Cold War tensions. As white hats and black hats battle to outwit each other, the tech will get smarter and smarter. I believe that is what will drive it, not Apple trying to find new gimmicks to put in its phones.

Engonoceras wrote: ↑Mon Jan 30, 2023 12:14 am The real potential is restoration of low quality sources. Imagine hearing voices from the 1920s to 1940s in modern quality as if recorded yesterday.

Consider that the difference between what most people would call a "lofi" recording and a "hifi" one is but one octave of extra frequency response - between 5 and 10 kHz. And all of that just consists of overtones, not actual musical notes. The other six octaves of musical information below that are already present in just about any lofi recording. While there are some clever tools for synthesising this missing octave, what we need is software that can intelligently recreate it by referring to a similar-sounding performance. Something similar to the demixing algorithms, but with a different end goal. I would imagine this is not too far-fetched, but there probably isn't the demand for such a thing these days except among sound restorationists. Maybe that is why it hasn't happened yet, or is happening very slowly.
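The octave arithmetic above is easy to check, since each octave is a doubling of frequency. The sketch below is only an illustration: the function name `octaves_between` and the 80 Hz lower limit for a lofi recording are assumptions made for the example, not figures from the post.

```python
import math

def octaves_between(f_low_hz: float, f_high_hz: float) -> float:
    """Number of octaves between two frequencies (one octave = a doubling)."""
    if f_low_hz <= 0 or f_high_hz <= 0:
        raise ValueError("frequencies must be positive")
    return math.log2(f_high_hz / f_low_hz)

# The "missing" band quoted in the post: 5 kHz to 10 kHz.
top_band = octaves_between(5_000, 10_000)

# Everything below 5 kHz, assuming (hypothetically) that a lofi
# recording reaches down to about 80 Hz.
ASSUMED_LOFI_FLOOR_HZ = 80
lower_band = octaves_between(ASSUMED_LOFI_FLOOR_HZ, 5_000)

print(f"5-10 kHz spans {top_band:.2f} octave(s)")
print(f"{ASSUMED_LOFI_FLOOR_HZ} Hz-5 kHz spans {lower_band:.2f} octaves")
```

With that assumed floor, the lower band comes out just under six octaves, which is in line with the "other six octaves" figure; a different floor would shift the number slightly.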

Golem · Post by **Golem** » Mon Jan 30, 2023 8:51 am

I've always wondered whether an AI could just recreate performances from scratch. We can recreate other instruments digitally, so why couldn't we mimic vocal cords? Not even a deepfake, but a model of the physical muscles that make the sound. I don't know if that's the easiest way, but I've wondered about it ever since playing this one game a few years ago.

theboxinargentina · Post by **theboxinargentina** » Mon Jan 30, 2023 3:43 pm

Lord Reith wrote: ↑Mon Jan 30, 2023 8:18 am Consider that the difference between what most people would call a "lofi" recording and a "hifi" one is but one octave of extra frequency response - between 5 and 10 kHz. And all of that just consists of overtones, not actual musical notes. The other six octaves of musical information below that are already present in just about any lofi recording. While there are some clever tools for synthesising this missing octave, what we need is software that can intelligently recreate it by referring to a similar-sounding performance. Something similar to the demixing algorithms, but with a different end goal. I would imagine this is not too far-fetched, but there probably isn't the demand for such a thing these days except among sound restorationists. Maybe that is why it hasn't happened yet, or is happening very slowly.

Yes this is something I've imagined, a sort of "up-scaling" of less than perfect recordings. It will happen!

tdgrnwld · Post by **tdgrnwld** » Mon Jan 30, 2023 4:07 pm

theboxinargentina wrote: ↑Mon Jan 30, 2023 3:43 pm
Yes this is something I've imagined, a sort of "up-scaling" of less than perfect recordings. It will happen!

It's a reasonable application of neural networks, but doing it right requires a large dataset of paired hi-fi and lo-fi recordings, with the lo-fi versions recorded on the same rig you aim to upsample from. I suspect that the lack of such a dataset - due to the cost of producing one - is a major reason we haven't seen these systems already.

Golem · Post by **Golem** » Mon Jan 30, 2023 8:01 pm

tdgrnwld wrote: ↑Mon Jan 30, 2023 4:07 pm
It's a reasonable application of neural networks, but doing it right requires a large dataset of paired hi-fi and lo-fi recordings, with the lo-fi versions recorded on the same rig you aim to upsample from. I suspect that the lack of such a dataset - due to the cost of producing one - is a major reason we haven't seen these systems already.

I mean, you could just get a bunch of high-quality recordings and then convert them to a lower quality.
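That suggestion can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a restoration tool: the names `one_pole_lowpass` and `degrade`, the 5 kHz cutoff, the four filter passes and the noise level are all made up for the example, and a simple one-pole low-pass plus Gaussian noise stands in for whatever a real vintage recording chain does to the sound.

```python
import math
import random

def one_pole_lowpass(samples, sample_rate, cutoff_hz):
    """Simple one-pole (RC-style) low-pass filter; cutoff is approximate."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += a * (x - y)          # move the output a fraction toward the input
        out.append(y)
    return out

def degrade(hifi, sample_rate, cutoff_hz=5_000.0, noise_amp=0.01,
            passes=4, seed=0):
    """Return a band-limited, noisy copy of `hifi` (a list of floats)."""
    lofi = list(hifi)
    for _ in range(passes):       # repeated passes give a steeper roll-off
        lofi = one_pole_lowpass(lofi, sample_rate, cutoff_hz)
    rng = random.Random(seed)     # seeded so the example is repeatable
    return [s + rng.gauss(0.0, noise_amp) for s in lofi]

# Stand-in "hi-fi" signal: a 440 Hz tone plus a 15 kHz component.
sample_rate = 44_100
n = sample_rate // 10             # 0.1 seconds
hifi = [0.5 * math.sin(2 * math.pi * 440 * i / sample_rate)
        + 0.2 * math.sin(2 * math.pi * 15_000 * i / sample_rate)
        for i in range(n)]

lofi = degrade(hifi, sample_rate)
training_pair = (lofi, hifi)      # (input, target) for a hypothetical model
print(len(lofi), len(hifi))
```

Whether pairs made this way resemble recordings from a real period rig is exactly the open question in the quoted post; a synthetic degradation only approximates the real thing.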

Lord Reith · Post by **Lord Reith** » Mon Jan 30, 2023 9:08 pm

tdgrnwld wrote: ↑Mon Jan 30, 2023 4:07 pm It's a reasonable application of neural networks, but doing it right requires a large dataset of paired hi-fi and lo-fi recordings, with the lo-fi versions recorded on the same rig you aim to upsample from. I suspect that the lack of such a dataset - due to the cost of producing one - is a major reason we haven't seen these systems already.

I'm sure it could be doable but it doesn't attract the sort of people who could do it. There's no mass market application for it like there is with demixing. Back in the 80s and even 90s there were people working with old audio from the 20s and 30s, but now there is very little interest in that.

Ziggy C · Post by **Ziggy C** » Tue Jan 31, 2023 12:27 am

It would be nice if there were a colorization algorithm that could convert the varying degrees of gray in B/W film into the actual colors. This would certainly save the time of assigning colors based on known information and still photos.

And for that matter, there's the idea I posited back in the 80s: a video recorder that could plug straight into the wall and record the whole cable signal for, say, two hours. That recording could then be converted into AV for every channel broadcast during that span, so we could play it back and select any channel from that time. This business of DVRs only allowing six simultaneous recordings, as we have now, is just a stroke job. It follows my prediction from the 80s, but it's only six feeds. Clearly the technology exists.

And what does this have to do with AI? Wait and see.

Tex · Post by **Tex** » Tue Jan 31, 2023 1:55 am

Even with low-quality recordings you can easily recognize different voices, so what we recognize as a specific voice is not in the higher frequencies; that's just where the clarity is. The voice pattern is largely found in the middle and lower frequencies.

The test would be how much you can degrade a good voice recording and still reconstruct it digitally with AI to approximate the original.
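One rough way to score such a test is to compare the reconstruction with the original sample by sample. The sketch below uses a plain signal-to-noise ratio in decibels; the function name `snr_db` and the toy signals are assumptions for illustration. It measures waveform error only, so it says nothing directly about whether a listener would still recognize the voice.

```python
import math

def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB of `estimate` against `reference`."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if signal == 0:
        raise ValueError("reference signal is silent")
    if error == 0:
        return math.inf           # perfect reconstruction
    return 10.0 * math.log10(signal / error)

# Toy example: the "reconstruction" is the original scaled to 90% amplitude.
original = [math.sin(2 * math.pi * 220 * i / 8_000) for i in range(800)]
reconstruction = [0.9 * s for s in original]

print(f"SNR: {snr_db(original, reconstruction):.1f} dB")
```

A higher figure means the reconstruction is closer to the original waveform; a perceptual measure would be needed to judge voice identity itself.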

zaval80 · Post by **zaval80** » Tue Jan 31, 2023 7:01 am

tdgrnwld wrote: ↑Mon Jan 30, 2023 4:07 pm It's a reasonable application of neural networks, but doing it right requires a large dataset of paired hi-fi and lo-fi recordings, with the lo-fi versions recorded on the same rig you aim to upsample from. I suspect that the lack of such a dataset - due to the cost of producing one - is a major reason we haven't seen these systems already.

It would be kept under lock by governments.

beatlegdb.com
