Stars from Hollywood’s golden age are being reborn through celebrity estate AI voice cloning deals, a sign of how some of the “Wild West” concerns about unauthorized AI impersonation are being addressed by new business models.
ElevenLabs, an audio technology startup funded by venture capital firms including Andreessen Horowitz and Sequoia, has signed multiple deals with the estates of legendary actors for its IconicVoices tool, which lets users have text read to them by AI-generated voices in an audiobook app. The stars include Burt Reynolds, Judy Garland, James Dean and Sir Laurence Olivier.
ElevenLabs, which launched in 2023, creates audio for books and news articles, video game characters, film pre-production, and social media and advertising. The company already works with publishers including the New York Times and Washington Post, and earlier this year it was selected by Disney to join its accelerator program.
“You need around 30 minutes of high-quality audio to create a professional voice clone,” said Sam Sklar, a member of ElevenLabs’ growth team. The voices are generated from the celebrity’s catalog. Once created, a voice can be called upon to read text (articles, PDFs, ePubs, newsletters, or other text content). However, neither the voice nor the content can be exported; all listening takes place within the reading app.
A user could, for instance, have articles narrated to them by James Dean within the app, but users cannot access the voices for any content not already in the app.
These kinds of deals could help set the boundaries for a future in which AI-generated voice content is less contentious and more of a controlled, curated terrain. Google Play and Apple Books utilize AI-generated voices to some extent already, though there are high hurdles to recreating human voice pacing, intonation and emotion.
The AI industry has been plagued by concerns about the use of celebrity voices, with OpenAI doing an about-face in May after actress Scarlett Johansson accused the company of ripping off her voice following her rejection of offers to license it.
“We’re very alive to the risks associated with synthetic media and take the safe use of our tools incredibly seriously,” Sklar said. Safeguards include active moderation of content, accountability enforceable with bans, and special provisions to limit the impact of AI voice on the 2024 election.
Among the current generation of actors, there remains significant anxiety surrounding the use of AI in generating voice content. Voice actors for video games have raised concerns, and last year’s film and television strike had significant roots in anxieties over the use of AI. The use of iconic voices sold by estates is a market niche that potentially avoids these pitfalls, representing a new income stream from AI rather than a lost income stream because of AI.
The use of soundalike celebrity voices is an issue that predates AI. Examples include the 1988 case of Frito-Lay using a Tom Waits soundalike in its ads, and another Waits case in 2007, after Waits himself had long refused advertising deals. AI presents an easier path to creating soundalikes, and recent lawsuits filed against AI startup Lovo for allegedly inappropriate and uncompensated use of voice actors in generating its AI voices are a reminder that the world of AI voice generation is likely to remain, to some degree, a complicated and litigious one. (Lovo has denied the claims in the suit and also pointed to a revenue-sharing model it offers actors for cloned voices.)
It’s difficult to assess the protections in place without reviewing the specific language of the IconicVoices contracts, said Steve Cohen, a partner at Pollock & Cohen who is representing voice actors in an unrelated lawsuit alleging cloning of voices without permission.
ElevenLabs points to the way that its IconicVoices tool obtains permissions and curates usage of the voices.
“Giving permission for using one’s voice is one of the basics,” Cohen said. “I think the key factors are permission, compensation, and control.”
New, clearer laws may also be a disincentive to people tempted to improperly appropriate a voice, “not for hardcore bad guys, but for edge cases,” Cohen said. But quoting Bette Davis in “All About Eve,” he added, “‘Buckle your seatbelts; it’s going to be a bumpy ride.’”
How realistic cloned voices sound is also an evolving issue. Many experts say that because AI doesn’t “know” what it’s saying, performance quality is limited. Sklar said ElevenLabs’ latest level of speech quality is indistinguishable from real human speech. “The text-to-speech tools from ElevenLabs can understand the context of the words,” he said.
AI is only as good as the models on which it is trained, and the actors’ voice datasets become part of the process.
“Neural models derive their capabilities from mimicking/memorizing nuances and patterns present in their training data,” said Nauman Dawalatabad, a postdoctoral associate at the MIT Computer Science and Artificial Intelligence Laboratory with extensive research in AI voice generation. “The quality and diversity of training data significantly influence the model’s performance.”
The vocal delivery of movie stars could add to the AI mimicry and learning by providing the kind of “high-quality voice datasets for training and fine-tuning large models” that Dawalatabad said is essential to the process. But he expressed reservations about “sounding human” as being the right test for the AI voice field, as that could reinforce an antagonistic relationship between human and synthetic voicings.
Voice actors remain divided on the technology, with some refusing to consider any deals but others saying opportunities to clone their voices for speedier, cheaper production on some forms of audiobooks can’t be ignored. “AI technology can help workflows. AI is not a new tool for voice talent, producers, and publishers, many of whom use it to improve their quality control in post-production,” Michele Cobb, executive director of the Audio Publishers Association, told CNBC last year.
Recent generative models have shown substantial advancements compared to earlier iterations, making it increasingly difficult to distinguish between fake and authentic voices by ear alone, according to Dawalatabad. AI voice licensing could alleviate workload for voice actors, he added, without supplanting them, as they “intercede in the process by focusing on offering correction or enhancement to ineffable aspects such as intonation, warmth, and emphasis, which still present challenges.”