Scarlett Johansson attends the 2020 Vanity Fair Oscar Party hosted by Radhika Jones at Wallis Annenberg Center for the Performing Arts on February 09, 2020 in Beverly Hills, California.
Karwai Tang | Getty Images Entertainment | Getty Images
OpenAI announced it would pull one of the ChatGPT voices named “Sky” after it created controversy for its resemblance to the voice of actress Scarlett Johansson in “Her,” a movie about artificial intelligence.
“We’ve heard questions about how we chose the voices in ChatGPT, especially Sky,” the Microsoft-backed company posted on X. “We are working to pause the use of Sky while we address them.”
The 2013 sci-fi film “Her” is about a man who falls in love with an artificial intelligence system named Samantha, voiced by Johansson.
The news comes one week after OpenAI debuted a range of audio voices for ChatGPT, its viral chatbot, a new AI model called GPT-4o, and a desktop version of ChatGPT.
Users watching the live demonstration of ChatGPT’s audio capabilities immediately began to post on social media that the “Sky” voice sounded like Johansson in the movie. OpenAI CEO Sam Altman seemingly referenced the film in a post on X, simply writing “her.”
In a Sunday blog post, OpenAI wrote that the chatbot’s five voices — Breeze, Cove, Ember, Juniper and Sky — were selected through a casting and recording process that spanned five months. Casting professionals received about 400 submissions from voice and screen actors and whittled that number down to 14, according to the company. Then an internal team selected the final five.
“Sky’s voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice,” the company wrote. “To protect their privacy, we cannot share the names of our voice talents.”
OpenAI plans to test Voice Mode in the coming weeks, with early access for paid subscribers to ChatGPT Plus, according to recent blog posts, and it also plans to add new voices. OpenAI also said the new model can respond to users’ audio prompts “in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation.”
The company, founded in 2015, has been valued at more than $80 billion by investors. It’s under pressure to lead the generative AI market while finding ways to make money as it spends massive sums on processors and infrastructure to build and train its models.
OpenAI, Microsoft and Google are at the helm of a generative AI gold rush as companies in seemingly every industry race to add AI-powered chatbots and agents to avoid being left behind by competitors. Earlier this month, OpenAI rival Anthropic announced its first enterprise offering and a free iPhone app.
A record $29.1 billion was invested across nearly 700 generative AI deals in 2023, an increase of more than 260% from the prior year, according to PitchBook. The market is predicted to top $1 trillion in revenue within a decade.
In last week’s live presentation, OpenAI team members demonstrated ChatGPT’s audio capabilities. For example, the chatbot was asked to help calm someone before a public speech.
OpenAI researcher Mark Chen demonstrated the model’s ability to tell a bedtime story and asked it to change the tone of its voice to be more dramatic or robotic. He even asked it to sing the story. The team also asked it to analyze a user’s facial expression to comment on the emotions the person may be experiencing.
“Hey there, what’s up? How can I brighten your day today?” ChatGPT’s audio mode said when a user greeted it.