AI Voice Cloning: An Alarming Trend Set to Explode


In an unsettling revelation, a BBC presenter’s voice was cloned without her permission to promote a product, clearly showing the potential and risks of AI voice cloning technology.

This event not only demonstrates the capabilities of artificial intelligence (AI) but also raises significant ethical concerns and potential for misuse. As we face the consequences of deepfake technologies, from bringing back the voices of the dead to impersonating public figures, there is a pressing need for strong legal measures to responsibly manage these advancements.

This article explores how AI voice cloning works, its significant impact in various fields, and the developing legal actions intended to protect individual and public interests.

Key Takeaways

  • AI technologies, especially voice cloning, are increasingly misused, posing significant ethical and legal dilemmas.
  • A notable case involved BBC presenter Liz Bonnin, whose voice was cloned without consent for misleading advertisements.
  • The urgency to regulate AI deepfakes has led to proposed laws like the NO FAKES Act in the US, aiming to protect the public and artists alike.
  • Despite risks, voice cloning holds great potential for accessibility, entertainment, and personalized communication.
  • Advancing AI technology calls for a balanced approach to harness its benefits while guarding against its ethical risks.

The Misuse of AI Voice Technology

BBC Presenter Liz Bonnin’s Voice is Cloned

Recently, AI-generated voice technology was misused in a high-profile case involving BBC presenter Liz Bonnin. Her voice was copied without her permission and used in an ad campaign for insect repellent.

What makes this different from some of the cheap adverts you see around the web, where celebrities' faces are used to promote dubious products (often investment scams), is that in this instance the company behind the advert, Incognito, was also duped into thinking the celebrity was on board.

Bonnin, known for her hosting roles on “Bang Goes the Theory” and “Our Changing Planet”, told The Guardian:


“It does feel like a violation and it’s not a pleasant thing. Thank goodness it was just an insect repellant spray and that I wasn’t supposedly advertising something really horrid!”

Scammers sent a forged voice message, supposedly from Bonnin, giving her consent to appear in adverts for insect repellent. The message initially mimicked Bonnin’s voice but gradually shifted in accent, raising suspicions about its authenticity.

Howard Carter, CEO of Incognito — the company behind the advertisement — initially believed he was in direct communication with Bonnin. This belief was based on several voice messages convincing him of her endorsement.

The individual posing as Bonnin provided Carter with a phone number and email address, along with contact details supposedly from the Wildlife Trust, where Bonnin serves as president.

Negotiations took place via WhatsApp and email, and experts believe AI was used to create a digital voice likeness of Bonnin.

On March 13, Carter received an email with a contract, which he believed was signed by Bonnin. As shown in bank statements, the company transferred £20,000 to an account linked to a digital bank on March 15.

Images of Bonnin for the campaign were sent five days later, but Incognito’s subsequent emails went unanswered.

The campaign launched using quotes and images provided by the scammers, and the scam was only uncovered after Bonnin publicly declared she had not consented to participate.

Bonnin said:

“I’m very sorry for what the company has gone through. It’s not fun for them at all, but it’s a violation on both our parts. It is a reminder that, if it looks too good to be true and too easy, or a little bit strange, triple check or quadruple check.”

The Rise of Deepfake Clones

This incident is not unique; similar misuses of AI have affected other public figures, making digital impersonation a widespread issue.

Deepfake technology was used to create fake audio of London Mayor Sadiq Khan making controversial comments just before Armistice Day. An audio deepfake of Philippine President Ferdinand Marcos Jr directing his military to act against China has also emerged, causing serious concern among government officials in Manila.

Furthermore, audio deepfakes are actively being used in scams and to break into voice-authenticated accounts. For example, a Vice journalist successfully accessed his own bank account using an AI replica of his voice.

These examples demonstrate how AI tools such as Microsoft’s VASA-1 and OpenAI’s Voice Engine could produce convincing fake content. Although these tools have not been openly released to the public, the research behind them shows that VASA-1 is capable of creating highly realistic deepfake videos and voices from just a single photo and a short audio clip. Similarly, Voice Engine can mimic a voice from merely a 15-second sound recording.

Legitimate Uses and Benefits of Voice Cloning

Although voice cloning technology carries risks, it’s crucial to recognize that its responsible applications can be highly beneficial. These capabilities can transform challenges into opportunities:

  • Accessibility

Voice cloning aids people who have lost their ability to speak due to illnesses or accidents by recreating their voice for communication devices, keeping their vocal identity intact. For example, breakthroughs in brain-computer interfaces (BCIs), known as “neuroprosthetics,” have empowered individuals with severe paralysis to speak again. These devices read brain activity related to speech and translate it into audible speech through AI. One significant case involved a woman named Ann, who, after a major stroke, used a BCI to transform her brain signals into a computer-generated voice trained to sound like her prior to the incident.

  • Entertainment and Media

Voice cloning technology significantly enhances dialogue in video games and films, reducing the need for ongoing recordings from voice actors. A compelling example of this is in the video game “Cyberpunk 2077,” especially in its DLC, Phantom Liberty. Following the death of Miłogost “Miłek” Reczek, the Polish voice actor for the character Viktor Vektor, the developers of the game chose to use voice cloning technology to preserve Reczek’s portrayal, rather than replacing him with a new actor. This decision was made to maintain character continuity and honor the late actor’s legacy. This was done with the endorsement and support from Reczek’s family.

Star Wars also used similar digital recreation technology to bring actor Peter Cushing back decades after his death, and to de-age Carrie Fisher and Mark Hamill. We also saw a younger Harrison Ford in the recent Indiana Jones and the Dial of Destiny.

  • Personalized Marketing

Companies use voice cloning to create unique customer service by mimicking the voices of well-known personalities or a brand’s distinct voice. An example is KFC Canada’s project where they used AWS AI to mimic the voice of their founder, Colonel Sanders, for an Alexa skill. This lets customers talk to the Colonel to order food, making the process engaging and retaining his iconic character for customer interactions.

  • Educational Tools

Voice cloning transforms educational materials by making them more interactive with the voices of historical figures. A standout use is the “Ask Dalí” exhibit at the Dalí Museum in Florida, where an AI, trained on Salvador Dalí’s interviews, answers visitors in his style, enriching the educational experience.

Understanding and managing the risks alongside these benefits allows us to use voice cloning technology ethically and effectively, improving both digital and real-world interactions.

How AI Voice Cloning Works

AI voice cloning uses complex machine learning and deep learning algorithms to create a synthetic version of a person’s voice from audio samples. The steps involved are as follows:

  • Data Collection

This initial step involves gathering numerous audio samples of the target voice. These recordings should include a variety of speech sounds to ensure that the AI can learn to reproduce all the different sounds of the voice across various emotions and tones. Typically, this involves recording the person speaking different sentences to capture various speaking styles and emotional states.

  • Preprocessing and Feature Analysis

After collecting the audio data, it’s processed to remove any background noise and normalize the volume. Feature analysis then focuses on identifying important voice characteristics such as pitch (how high or low the voice is), tone (the quality of the sound), cadence (the rhythm and speed of speaking), and timbre (the unique texture of the voice). These features are crucial for understanding and replicating the nuances of the voice.

  • Neural Network Training
    • Deep Learning Models: At the heart of voice cloning are deep learning models like Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs), which are trained on the extracted voice features. These models learn to predict subsequent sounds, enabling them to generate speech that mimics the original voice’s characteristics.
    • Text-to-Speech (TTS) Synthesis: This process converts text into spoken words. Advanced TTS systems use these trained neural networks to produce speech that not only sounds natural but also carries the right emotion and intonation based on the text input.
    • Generative Adversarial Networks (GANs): GANs are used to enhance the realism of the cloned voice. They consist of two parts:
      • Generator: This component creates the voice samples based on its training.
      • Discriminator: This component judges how authentic the generated voice samples sound compared to the original voice recordings. It provides feedback to the generator, helping it improve the quality and realism of the synthetic voice.
  • Postprocessing: The generated voice might go through further refinement to improve clarity, adjust speed, and make the voice sound as natural as possible. This could include audio effects like equalization and compression to enhance the overall sound quality.
  • Testing and Tuning: The last stage is extensive testing with various texts to ensure the AI performs well with any speech input. This testing helps to find and correct any issues with phonetics or unnatural speech patterns by further adjusting the models.
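To make the feature-analysis step concrete, here is a minimal sketch of one classic technique behind pitch extraction: estimating the fundamental frequency of a signal by autocorrelation. The function name and the synthetic sine-wave “voice” are illustrative assumptions, not part of any specific voice-cloning product:

```python
import math

def estimate_pitch(samples, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate fundamental frequency (Hz) via autocorrelation.

    Tries every lag corresponding to a pitch between fmin and fmax
    and returns the frequency whose lag maximizes the correlation
    of the signal with a delayed copy of itself.
    """
    lag_min = int(sample_rate / fmax)   # shortest period to test
    lag_max = int(sample_rate / fmin)   # longest period to test
    best_lag, best_corr = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        corr = sum(samples[i] * samples[i + lag]
                   for i in range(len(samples) - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag

# Synthetic stand-in for a voice: a 220 Hz sine wave, 0.1 s at 16 kHz.
sr = 16000
tone = [math.sin(2 * math.pi * 220 * n / sr) for n in range(int(0.1 * sr))]
f0 = estimate_pitch(tone, sr)  # close to 220 Hz
```

Real systems extract far richer features (mel spectrograms, spectral envelopes, timbre embeddings), but the idea is the same: turn raw audio samples into numbers that characterize the voice.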

Through these steps, AI voice cloning technologies are capable of producing highly realistic and dynamic synthetic voices that closely resemble the original. These technologies continue to advance, incorporating the latest AI developments for better accuracy and versatility.
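The generator/discriminator feedback loop described above can be sketched with a deliberately tiny, hypothetical example: the “real” data is a single number, the generator is one parameter, and the discriminator is a single logistic unit. This is a toy illustration of the adversarial training dynamic, not a real voice model:

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

# Toy setup (all values hypothetical): the "real" voice feature is 5.0,
# the generator is one parameter `theta` trying to imitate it, and the
# discriminator is a logistic unit D(x) = sigmoid(w*x + b).
real = 5.0
theta = 0.0          # generator's output
w, b = 0.0, 0.0      # discriminator parameters
lr_d, lr_g = 0.05, 0.1

for _ in range(2000):
    # Discriminator step: raise D(real), lower D(fake).
    s_r = sigmoid(w * real + b)    # score for the real sample
    s_f = sigmoid(w * theta + b)   # score for the generated sample
    # Gradient descent on the loss -log D(real) - log(1 - D(fake)):
    w -= lr_d * (-(1 - s_r) * real + s_f * theta)
    b -= lr_d * (-(1 - s_r) + s_f)

    # Generator step: move theta so the discriminator scores it higher,
    # i.e. gradient descent on -log D(fake) with respect to theta.
    s_f = sigmoid(w * theta + b)
    theta -= lr_g * (-(1 - s_f) * w)

# After training, theta has drifted from 0 toward the real value 5.0.
```

Production systems replace the single parameter with a deep generator network and the logistic unit with a deep discriminator, but the loop is the same: the discriminator’s judgment becomes the generator’s training signal.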

Ethical and Legal Implications

Recent statistics show a dramatic increase in deepfakes, highlighting the risks of AI-powered fraud. Between 2022 and 2023, there was a tenfold rise in detected deepfakes globally across various industries, based on an analysis of over two million identity fraud attempts.

For instance, deepfake-related identity fraud cases in the Philippines surged by 4,500%, followed by Vietnam at 3,050%, the US at 3,000%, and Belgium at 2,950%.

The NO FAKES Act

In the US, the urgency to address AI-generated deepfakes has been a significant topic of discussion in the Senate. The proposed NO FAKES Act aims to hold individuals and platforms accountable for creating or distributing unauthorized digital replicas. This federal law is designed to protect not just celebrities but the general public from the misuse of their digital likeness.

During a Senate Judiciary Committee hearing, professionals from the industry, including singer-songwriter FKA Twigs, supported the Act, emphasizing the need to protect artists and the public from exploitation without hindering artistic creativity or the legitimate uses of AI technologies.

This bill seeks to balance the encouragement of artistic creativity with the protection of individual rights. Figures like Robert Kyncl, CEO of Warner Music Group, have endorsed the bill, noting the importance of protecting artists’ rights alongside fostering creativity. The discussions also stressed the need to clearly define what constitutes a “digital replica” to ensure the law does not restrict free expression.

The Bottom Line

As AI voice cloning technology advances, it offers substantial benefits in various fields, yet it also poses significant ethical risks. Striking a balance between leveraging the technology’s advantages and minimizing its threats requires careful regulatory oversight.

With a sharp rise in deepfake incidents and their potential harm, comprehensive legislation like the proposed NO FAKES Act is crucial. This legislation seeks to protect individual rights without hindering technological progress and creativity.

As we approach this new frontier in AI, it is essential to proceed with both caution and foresight, ensuring that the technology benefits humanity without compromising our ethical standards or legal protections.


Maria Webb
Technology Journalist

Maria is a technology journalist with over five years of experience and a deep interest in AI and machine learning. She excels in data-driven journalism, making complex topics both accessible and engaging for her audience. Her work is prominently featured on Techopedia, Business2Community, and Eurostat, where she provides creative technical writing. She holds a Bachelor of Arts Honours in English and a Master of Science in Strategic Management and Digital Marketing from the University of Malta. Maria's background includes journalism for Newsbook.com.mt, covering a range of topics from local events to international tech trends.