Excuse us as we dip into a little Black Mirror-esque near-future alarmism, but it is for a good cause.
Your daughter is leaving on her first school trip and is excited as you drop her off.
It has been a few hours, you are back home, and your phone rings; it’s your daughter! A little surprised, you answer the phone.
You’re taken aback by a gruff, rude voice on the other end: “We have your daughter.” You can hear her crying for help in the background.
Disbelief turns into shock, and then the demand comes: pay $50,000. Your mind goes blank.
Charlie Brooker can handle the rest, but it ends with you transferring the money, only to find that your daughter, Casey, is safe and happy, enjoying her trip.
Welcome to the world of virtual kidnapping. Scammers have just duped you.
Wait, What About Casey’s Voice Then? It Sounded So Real
Welcome to the world where anyone’s voice, whether they are dead or alive, can be generated in seconds.
The technology already exists for different use cases. Take Podcast.ai, which generates podcasts with artificial intelligence (AI): the host and guests are virtual and can be anyone. Live on air right now are Joe Rogan and Steve Jobs.
Meanwhile, Spotify is translating its most popular podcasts into other languages, with the original podcasters’ voices intact.
According to Subbarao Kambhampati, a computer science professor at Arizona State University specializing in AI, the voice cloning capabilities of AI have been rapidly improving.
“In the beginning, it would require a larger amount of samples. Now there are ways in which you can do this with just three seconds of your voice. Three seconds. And with the three seconds, it can come close to how exactly you sound.
“Most of the voice cloning actually captures the inflection as well as the emotion. The larger the sample, the better off you are in capturing those.”
And if the tale we told you at the top of the article sounds too hypothetical to be true, consider that the quotes above come from a news story in which an Arizona mother received a voice-clone call demanding a million-dollar ransom for her daughter’s return, while her daughter was happily at dance class.
How Does Voice Cloning Work?
If you are familiar with video deepfakes, think of AI voice cloning software as the auditory counterpart. With just a snippet of recorded speech, developers can assemble an audio dataset and use it to train an AI voice model capable of replicating the target voice.
These models rely on deep learning, a technique loosely inspired by how the brain learns, and they are remarkably efficient at discerning patterns within data. While there are various approaches to applying deep learning to synthetic voices, they generally improve word pronunciation and the nuanced aspects of speech, such as speed and intonation, resulting in more lifelike, human-sounding voices.
According to Dan Mayo, a special agent with the FBI, scammers find their prey on social media. Indeed, social media is full of video and audio clips, and their owners are sitting ducks.
Mayo suggested, “You’ve got to keep that stuff locked down. The problem is, if you have it public, you’re allowing yourself to be scammed by people like this because they’re going to be looking for public profiles that have as much information as possible on you, and when they get a hold of that, they’re going to dig into you.”
The rapid improvement of AI voice cloning reflects the breakneck speed of AI development and raises questions of ethics, privacy, and security.
Undoubtedly, AI is at the center of a growing threat to our privacy and security, and there is no solution yet.
One thing we can do is be more careful. We may have sleepwalked into a world where we happily expose our private lives through social media.
We can do little to slow the advancement of AI, but we can certainly think twice before sharing our private information.
Scammers will always find new ways, so we have to stay vigilant.