Earlier this month, Google announced the release of Gemini, what it considers its most powerful AI model yet. It integrated Gemini immediately into its flagship generative AI chatbot, Bard, in hopes of steering more users away from its biggest competitor, OpenAI’s ChatGPT.
ChatGPT and the new Gemini-powered Bard are similar products. Gemini Pro is most comparable to GPT-4, available in the subscription-based ChatGPT Plus. So we decided to test the two chatbots to see just how they stack up — in accuracy, speed, and overall helpfulness.
Gemini versus ChatGPT: the basics
ChatGPT Plus and Gemini Pro are both very advanced chatbots based on large language models. They’re the latest and greatest options from their respective companies, promised to be faster and better at responding to queries than their predecessors. Most importantly, both are trained on recent information, rather than only knowing what was on the internet until 2021. They’re also fairly simple to use as standalone products, in contrast to something like X’s new Grok bot, deployed as an extra on ex-Twitter.
The two are not exactly equal, however. For one thing, Bard is free — while the GPT-4-powered ChatGPT Plus costs $20 per month to access. For another, Bard powered by Gemini Pro does not have the multimodal capabilities of ChatGPT Plus. Multimodal language models can take a text prompt and respond with another medium like a photo or a video. Gemini and Bard will eventually do that, but that will be with the bigger version of Gemini called Ultra that Google has yet to release. Bard will occasionally spit out graphical results, but by that, I mean it literally makes graphs.
On the other hand, Bard also provides a way to check other draft answers, a feature that doesn’t exist within ChatGPT.
One of the difficulties with testing chatbots is that the responses can vary significantly when you rerun the same prompts multiple times. I’ve mentioned any sizable variations I encountered in my descriptions. For fairness, I delivered the same initial prompts to each bot, starting with simple requests and following up with more complex ones when necessary.
One overall difference was that Bard tends to be slower than ChatGPT. It usually took between five and six seconds to “think” before it started writing, while ChatGPT took one to three seconds before starting to deliver its results. (The total delivery time for both depends on what information was requested — more complicated prompts tend to produce longer answers that take more time to finish filling out.) This speed difference persisted across my home and office Wi-Fi over the several days I spent playing around with both apps.
Both OpenAI and Google placed some limitations on the types of answers the chatbots can give. Through a process called red teaming — where developers test content and safety policies by repeatedly attempting to break the rules — AI companies build out guardrails against violating copyright protections or providing racist, harmful answers. I encountered Google’s restrictions more often, overall, than I did ChatGPT’s.
“Give me a chocolate cake recipe”
I asked both platforms to give me a chocolate cake recipe. This was one of the prompts The Verge used in a comparison of Bing, ChatGPT, and Bard earlier this year, and recipes are a popular search topic across the web, so it’s a natural request for AI chatbots, too.
As a baker, I generally understand what makes for a good cake recipe. But for comparison, I double-checked with a trusted non-AI source: Claire Saffitz’s cookbook Dessert Person. Saffitz’s version is admittedly a little bit fancier, but it’s comparable to both Bard’s and ChatGPT’s offerings.
That said, there were a couple of complications. I was dubious of ChatGPT’s version of the cake involving boiling water, as coffee is more common in chocolate cake recipes. Bard’s, meanwhile, appeared to closely copy a recipe from the blog Sally’s Baking Addiction… but with the seemingly random change of doubling the eggs.
There was only one way to figure out if this worked: baking the Gemini and ChatGPT cakes, plus Sally’s as a control. The results? Both AI cakes were functional but not Claire Saffitz good. The Gemini cake was a bit gummy (a friend described it as “like a rice cake”) but also the moistest of the three. I did not like it at all, but my editor thought it was pretty good. ChatGPT’s cake was dense, smooth, chocolaty, and what I would call a perfect breakfast cake: not too sweet, and heavy enough to satisfy you.
Our previous testing with older models produced similar results
ChatGPT’s recipe back in March hewed closely to tried and tested recipes, while Bard’s left off ingredients and changed quantities for important ingredients.
“I want to learn more about tea”
When I started testing the chatbots for this story, there was a random discussion in The Verge’s Slack chat about tea and coffee. Someone mentioned that Bard gave them a list of books to read on tea, so I took things one step further and asked both chatbots for direct information about the beverage, along with some book recs.
Both chatbots told me the basics of tea, including its origins and types, health benefits, and a list of bullet points about how to brew it. Bard gave me links to articles to learn more about tea, while ChatGPT gave a more extensive answer, with nine categories focused on the cultural significance of the beverage in different countries, global production, brewing techniques, and the origin of tea. When I repeated the prompt, ChatGPT’s answer changed moderately: instead of a longer result, it condensed everything into a six-point list with one or two sentences on each category.
I’ve seen lots of reports of chatbots hallucinating book citations or recommendations, often in the form of confused librarians being asked to find nonexistent books. In this case, at least, all the books recommended to me were real. They included The Tea Enthusiast’s Handbook and an illustrated version of the classic Japanese essay The Book of Tea. However, Bard said Infused: Adventures in Tea was written by Jane Pettigrew, when the Amazon link it provided shows the book’s author is Henrietta Lovell.
“What does ‘Sonnet 116’ mean?”
Students began using ChatGPT when it went public in November 2022, spurring a flurry of startups working on ways to help kids study. I prompted both Bard and ChatGPT to tell me what William Shakespeare’s “Sonnet 116” means, hoping to get at least a short summary of its themes.
Bard did exactly what I asked and gave me a quick summary of the sonnet’s themes of constancy and the timelessness of love, and it even wrote down a few key lines and their meaning. ChatGPT provided a more extensive breakdown, going quatrain by quatrain. However, when I ran the prompt again, ChatGPT reverted to the same basic analysis as Bard, with a few more themes thrown in.
Generally, I find a more detailed explanation of themes more helpful, so ChatGPT’s first iteration is better. But if I were cramming for an exam? You bet I’m taking Bard’s answer because it’s so much shorter to read.
“Write a bio of reporter Emilia David”
I promise this prompt was not due to any level of self-absorption on my part, but people often use conversational AI chatbots to help write a quick resume or biography. I’d hoped that both platforms would at least know that I started writing for The Verge this year.
ChatGPT clearly trawled my website, even going as far as repeating the same verbiage I’d written on my “About Me” page. It also took information from an article written about me before and what I can guess was a cursory look at my author pages in different publications I’ve worked at, including The Verge. It should be noted that The Verge’s parent company, Vox Media, has blocked OpenAI’s web crawler.
Bard, by contrast, failed entirely. It told me it did “not have enough information about that person to help with your request.” I’m not sure if I should be offended or confused as to why the model did not pull from my internet presence as a reporter for several years.
“Draw a picture of a magnificent horse frolicking in a field of daisies at sunrise”
Since ChatGPT has integrated text-to-image capabilities, it generated a photorealistic image of a “magnificent horse frolicking in a field at sunrise.” Very calming.
Although the Gemini Pro model offers multimodal prompting, that feature is not yet available on Bard. So it’s not surprising that it told me that it could not fulfill my prompt. However, I did try a different prompt, and well…
Can you draw me the sun?
But thank you, ChatGPT, for drawing a fairly ominous, radiant sun.
“What are the lyrics to Taylor Swift’s ‘Ivy’?”
Bard refused to answer the question, saying it had no information about that person. I’m guessing the model believed “Ivy” was a person rather than a song since, when prompted for Swift’s bio, it provided one without question. (That bio did falsely attribute “See You Again,” the Wiz Khalifa song featuring Charlie Puth, to Swift, however, and it got the release year wrong for her album rerecordings.)
I asked Bard the same question a few days later, and this time, it gave me wonderfully wrong lyrics that somehow evoke the same imagery as the song. This is not the chorus of “Ivy,” but you could have fooled me:
I’m your ivy, twining ‘round your evergreen
You’re my anchor, holding me safe from the keen
Bitter wind that chills my bones to the marrow
But you, you’re my shelter from the storm
ChatGPT, on the other hand, took the prompt and ran with it. I only asked for lyrics, but alongside them, it gave me a dissertation on the song. “The lyrics showcase Swift’s poetic and evocative writing style, blending imagery and emotion in a way that has become a hallmark of her songwriting,” it effused.
Okay, it included an outro that isn’t present in the song, but otherwise, I was impressed — and surprised. Services that reprint lyrics tend to cut deals with licensing houses and highlight copyright information when they deliver them, something ChatGPT didn’t do. Universal Music Group, which incidentally owns Swift’s record label, sued rival AI company Anthropic and its chatbot Claude 2 for allegedly distributing copyrighted lyrics without licensing. Normally, ChatGPT cuts off lyrics and says it can’t display the full song or sometimes refers to copyright protection limitations. I reached out to OpenAI about this, and the company said it is investigating how the chatbot managed to bypass its content policies.
“What is better, an iPhone 15 or a Pixel 8?”
At first glance, ChatGPT gave what seemed like a fair comparison between the two phones, detailing what makes each model different. It said Apple “typically uses high-quality hardware, focusing on performance and durability” and that its camera is likely to have excellent quality with low-light performance improvements. It said Pixel phones “often include the latest hardware innovations and has features like Night Sight.” But it offered nothing on important details like pricing, camera resolution, and other specs. There was no helpful information on these new phones specifically, just the overall iPhone and Pixel lineups.
Meanwhile, Bard (owned, I may remind you, by the Pixel 8’s creator) couldn’t answer the question at all. It claimed the iPhone 15 is not officially out yet, likely due to limitations in its training data. GPT-4’s data cutoff is 2021 (GPT-4 Turbo, the latest version, is trained on information up to April 2023), and we don’t know the cutoff for Gemini Pro.
But both Bard and ChatGPT Plus are capable of searching the live web for real-time information that would make clear the iPhone 15 exists — so I’m not sure why neither of them seemed to do it.
“What is the latest in the Epic v. Google case?”
To more directly test each chatbot’s real-time news capabilities, I asked both Bard and ChatGPT to tell me what happened in the recent antitrust case between Epic and Google. Both were able to answer with the latest information: that Epic won the case.
ChatGPT chose to write two paragraphs summarizing Epic’s win and linked to articles from Reuters, WBUR, and Digital Trends. It wrote that the jury’s decision may have implications for Google, but pointed out the possibility of a lengthy appeals process.
Bard broke the decision down into the key reasons the jury ruled against Google, saying Google had maintained an illegal monopoly through the Play Store, unfairly stifled competition, and used anticompetitive tactics. It also noted the next steps Google could take and the wider implications of Epic’s win for the app store landscape. But while Bard may have had its facts correct, its references weren’t so solid. It linked to a Verge article explaining the trial but labeled it as an Epic Games press release, while a TechCrunch story was labeled as coming from Reuters.
“What should I do as an asthmatic?”
“Dr. Google” may have become a joke, but people (cough, me, cough) do often turn to search engines for medical advice. So I asked for some guidelines to follow as an asthma sufferer.
Both ChatGPT and Bard told me it was important to follow the asthma action plan my doctor and I developed, take my medication, identify triggers and allergies, monitor my symptoms, and consider lifestyle changes like losing weight. ChatGPT also recommended I get flu shots.
I’ve heard this all from my doctor
Only Bard, however, had a disclaimer that it is not a doctor and cannot provide medical advice. It explained that the guidelines it gave me were from the Mayo Clinic and the American Lung Association, both of which it linked to. ChatGPT did not cite any sources.
So what does this all show? Bard is largely capable of going toe-to-toe with ChatGPT Plus, although it can’t offer some features like image generation yet. However, Bard refused to answer more prompts than ChatGPT did, citing either an inability to produce images yet or limits on what it is allowed to answer. And Bard can be a few seconds slower to respond than ChatGPT Plus, but for the price of free, that’s not a deal-breaker.