AI can simulate anyone’s voice with 3 seconds of audio
- Tex
- Posts: 1202
- Joined: Fri Feb 19, 2021 1:12 am
- Location: Texas
- Has thanked: 6 times
- Been thanked: 638 times
Re: AI can simulate anyone’s voice with 3 seconds of audio
The real potential is restoration of low quality sources. Imagine hearing voices from the 1920s to 1940s in modern quality as if recorded yesterday.
- Lord Reith
- Posts: 4678
- Joined: Thu Feb 18, 2021 8:22 am
- Location: BBC House
- Has thanked: 145 times
- Been thanked: 4061 times
Re: AI can simulate anyone’s voice with 3 seconds of audio
Golem wrote: ↑Sun Jan 29, 2023 10:19 pm
Yeah that's a very reasonable point, like I don't think AI sentience is really a massive risk here. But we can only hope the power for AI to fight evil advances as fast as its ability to create it. Counter AIs do exist to be fair: when you have an AI that creates something fake, you need another counter AI that judges whether it's actually any good or not, and so I suppose you could use a counter AI to figure out what's actually real. Plus, as good as the AIs are, they still have tells, and so who knows if "perfect" fake audio/video is even achievable.
Yes, but I think the AI arms race is what will drive it forward, the same as the Apollo missions only happened because of Cold War tensions. As white hats and black hats battle to outwit each other, the tech will get smarter and smarter. I believe that is what will drive it, not Apple trying to find new gimmicks to put in its phones.
Engonoceras wrote: ↑Mon Jan 30, 2023 12:14 am
The real potential is restoration of low quality sources. Imagine hearing voices from the 1920s to 1940s in modern quality as if recorded yesterday.
Consider that the difference between what most people would call a "lofi" recording and a "hifi" one is but one octave of extra frequency response, between 5 and 10 kHz. And all of that just consists of overtones, not actual musical notes. The other six octaves of musical information below that are already present in just about any lofi recording. While there are some clever tools for synthesising this missing octave, what we need is software that can intelligently recreate it by referring to a similar sounding performance. Something similar to the demixing algorithms, but with a different end goal. I would imagine this is not too far fetched, but there probably isn't the demand for such a thing these days except among sound restorationists. Maybe that is why it hasn't happened yet, or is happening very slowly.
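To put numbers on the octave arithmetic above, here is a small sketch. The function name is made up for illustration; the only fact it relies on is that one octave is a doubling of frequency.

```python
import math

def octaves_between(f_low_hz, f_high_hz):
    """Number of octaves spanned by a frequency range (one octave = a doubling)."""
    return math.log2(f_high_hz / f_low_hz)

# The "missing" band described above: 5 kHz up to 10 kHz.
missing_band = octaves_between(5_000, 10_000)

# Six octaves below 5 kHz means halving the frequency six times.
bottom_of_six_octaves_hz = 5_000 / 2 ** 6

print(missing_band)              # 1.0
print(bottom_of_six_octaves_hz)  # 78.125
```

So the six octaves below 5 kHz reach down to roughly 78 Hz, and the 5 to 10 kHz band is exactly one more octave on top.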
Women there don't treat you mean, in Abilene
Re: AI can simulate anyone’s voice with 3 seconds of audio
I've always wondered whether an AI could just recreate performances from scratch. We can recreate other instruments digitally, so why couldn't we mimic vocal cords? Not even a deepfake, but the physical muscles that make the sound. I don't know if that's the easiest way, but I've wondered about it since playing this one game a few years ago.
- theboxinargentina
- Posts: 301
- Joined: Fri Oct 08, 2021 4:12 pm
- Has thanked: 187 times
- Been thanked: 97 times
Re: AI can simulate anyone’s voice with 3 seconds of audio
Lord Reith wrote: ↑Mon Jan 30, 2023 8:18 am
Consider that the difference between what most people would call a "lofi" recording and a "hifi" one is but one octave of extra frequency response, between 5 and 10 kHz. And all of that just consists of overtones, not actual musical notes. The other six octaves of musical information below that are already present in just about any lofi recording. While there are some clever tools for synthesising this missing octave, what we need is software that can intelligently recreate it by referring to a similar sounding performance. Something similar to the demixing algorithms, but with a different end goal. I would imagine this is not too far fetched, but there probably isn't the demand for such a thing these days except among sound restorationists. Maybe that is why it hasn't happened yet, or is happening very slowly.
Yes, this is something I've imagined, a sort of "up-scaling" of less than perfect recordings. It will happen!
Re: AI can simulate anyone’s voice with 3 seconds of audio
theboxinargentina wrote: ↑Mon Jan 30, 2023 3:43 pm
Yes this is something I've imagined, a sort of "up-scaling" of less than perfect recordings. It will happen!
It's a reasonable application of neural networks, but to do it right, it requires a large dataset of paired hi-fi and lo-fi (recorded on the same rig you aim to upsample from) recordings. I suspect that the lack of such a dataset - due to the cost of producing one - is a major reason we haven't seen these systems already.
Re: AI can simulate anyone’s voice with 3 seconds of audio
tdgrnwld wrote: ↑Mon Jan 30, 2023 4:07 pm
It's a reasonable application of neural networks, but to do it right, it requires a large dataset of paired hi-fi and lo-fi (recorded on the same rig you aim to upsample from) recordings. I suspect that the lack of such a dataset - due to the cost of producing one - is a major reason we haven't seen these systems already.
I mean, you could just get a bunch of high quality recordings, and then convert them to a lower quality.
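A rough sketch of that idea, assuming "lower quality" just means cutting everything above about 5 kHz. That is a big simplification: a real old rig also adds noise and distortion, which is the objection raised in the quoted post. All function names here are made up for illustration, and the filter is a plain windowed-sinc low-pass in pure Python, not anything a real restoration tool is known to use.

```python
import math

def lowpass_fir(cutoff_hz, sample_rate, num_taps=101):
    """Windowed-sinc low-pass filter coefficients (Hamming window)."""
    fc = cutoff_hz / sample_rate          # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]      # normalise to unity gain at 0 Hz

def degrade(hifi, sample_rate, cutoff_hz=5_000):
    """Return a band-limited ("lo-fi") copy of the same length as the input."""
    taps = lowpass_fir(cutoff_hz, sample_rate)
    half = len(taps) // 2
    lofi = []
    for i in range(len(hifi)):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + half - k              # centred, so the output is not delayed
            if 0 <= j < len(hifi):
                acc += t * hifi[j]
        lofi.append(acc)
    return lofi

def tone(freq_hz, sample_rate, num_samples):
    """Unit-amplitude sine wave, used here as stand-in audio."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(num_samples)]

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

SAMPLE_RATE = 44_100
# Stand-in "hi-fi" signal: a 1 kHz tone plus an 8 kHz overtone.
hifi = [a + b for a, b in zip(tone(1_000, SAMPLE_RATE, 2_000), tone(8_000, SAMPLE_RATE, 2_000))]
lofi = degrade(hifi, SAMPLE_RATE)
training_pair = (lofi, hifi)              # (model input, model target)
```

The 1 kHz part survives almost untouched while the 8 kHz part is filtered out, so each pair shows a model what was lost. Whether a model trained on pairs like these would carry over to real 1920s recordings is exactly the open question.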
- Lord Reith
- Posts: 4678
- Joined: Thu Feb 18, 2021 8:22 am
- Location: BBC House
- Has thanked: 145 times
- Been thanked: 4061 times
Re: AI can simulate anyone’s voice with 3 seconds of audio
tdgrnwld wrote: ↑Mon Jan 30, 2023 4:07 pm
It's a reasonable application of neural networks, but to do it right, it requires a large dataset of paired hi-fi and lo-fi (recorded on the same rig you aim to upsample from) recordings. I suspect that the lack of such a dataset - due to the cost of producing one - is a major reason we haven't seen these systems already.
I'm sure it could be doable, but it doesn't attract the sort of people who could do it. There's no mass market application for it like there is with demixing. Back in the 80s and even 90s there were people working with old audio from the 20s and 30s, but now there is very little interest in that.
- Ziggy C
- Posts: 556
- Joined: Thu Oct 14, 2021 12:10 am
- Location: Woodland Hills, CA
- Has thanked: 97 times
- Been thanked: 126 times
Re: AI can simulate anyone’s voice with 3 seconds of audio
It would be nice if there were a colorization algorithm that could convert the varying degrees of gray in B/W film into the actual colors. This would certainly save the time of assigning colors based on known information and still photos.
And for that matter, there's the idea I posited back in the '80s: a video recorder that could plug straight into the wall and record the cable signal for, say, two hours. That could then be converted into AV from every channel broadcast during that span, so we could play it back and select the channel at that time. This business of DVRs only allowing for six simultaneous recordings, as we have now, is just a stroke job. It follows my prediction from the '80s, but it's only six feeds. Clearly the technology exists.
And what does this have to do with AI? Wait and see.
- Tex
- Posts: 1202
- Joined: Fri Feb 19, 2021 1:12 am
- Location: Texas
- Has thanked: 6 times
- Been thanked: 638 times
Re: AI can simulate anyone’s voice with 3 seconds of audio
Even with low quality recordings you can easily recognize different voices, so what we recognize as a specific voice is not even in the higher frequencies; that's just where the clarity is. The voice pattern is largely found in the middle and lower frequencies.
The test would be how much you can degrade a good voice recording and still reconstruct it digitally with AI to approximate the original.
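One way to score that test, as a sketch only: compare the reconstruction to the original sample by sample and report a signal-to-noise ratio in dB. The function name is made up for illustration, and a raw waveform measure like this is an assumption on my part; it says nothing about how convincing the result sounds to a listener.

```python
import math

def snr_db(original, reconstructed):
    """Signal-to-noise ratio in dB of a reconstruction against the original.

    Higher is closer; a perfect reconstruction gives infinity.
    """
    if len(original) != len(reconstructed):
        raise ValueError("signals must have the same length")
    signal_power = sum(x * x for x in original)
    noise_power = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    if noise_power == 0:
        return math.inf
    return 10 * math.log10(signal_power / noise_power)

# Stand-in data: a 440 Hz tone, and a "reconstruction" that is 10% too quiet.
original = [math.sin(2 * math.pi * 440 * i / 44_100) for i in range(1_000)]
too_quiet = [0.9 * x for x in original]

print(snr_db(original, too_quiet))   # about 20 dB
print(snr_db(original, original))    # inf
```

Run the same measurement at harsher and harsher levels of degradation and you get a curve showing where the reconstruction stops approximating the original.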
Re: AI can simulate anyone’s voice with 3 seconds of audio
tdgrnwld wrote: ↑Mon Jan 30, 2023 4:07 pm
It's a reasonable application of neural networks, but to do it right, it requires a large dataset of paired hi-fi and lo-fi (recorded on the same rig you aim to upsample from) recordings. I suspect that the lack of such a dataset - due to the cost of producing one - is a major reason we haven't seen these systems already.
Would be under lock by the governments.