Geologists have raised concerns about potential Chinese censorship and bias in a chatbot being developed with the backing of the International Union of Geological Sciences (IUGS), one of the world’s largest scientific organisations and a Unesco partner.
The GeoGPT chatbot is aimed at geoscientists and researchers, particularly in the global south, to help them develop their understanding of earth sciences by drawing on swaths of data and research on billions of years of the planet’s history.
It is an initiative from Deep-time Digital Earth (DDE), a largely Chinese-funded programme founded in 2019 to enhance international scientific cooperation and help countries to realise the UN’s sustainable development goals.
Part of the underlying AI for GeoGPT is Qwen, a large language model built by the Chinese tech company Alibaba. Prof Paul Cleverley, a geologist and computer scientist who tested a pre-release version of the chatbot, claimed in a recent article in Geoscientist, the magazine of the Geological Society, the UK’s professional association for geologists, that GeoGPT had “serious issues around a lack of transparency, state censorship, and potential copyright infringement”.
Responding to the article, DDE representatives Michael Stephenson, Hans Thybo, Chengshan Wang and Natarajan Ishwaran said the chatbot also used Meta’s Llama, another large language model, and that during testing they had not noticed any state censorship, which they said was “unlikely” given that the system was “based entirely in geoscience information”.
The DDE academics said: “Problems with GeoGPT have been largely solved, but the team will be working to improve the system even more. It must be stressed that at present GeoGPT has not been released and is not in the public domain.”
David Giles, a professional geoscientist, said it was “blatantly untrue” that a system based on geoscience data could be free of sensitive information.
Tests on Qwen, part of GeoGPT’s underlying AI, reveal geoscience-related questions can produce answers that appear to be influenced by narratives set by the Chinese Communist party.
For example, when asked how many people have died in a mining operation in Ghana run by the Shaanxi Mining Company, Qwen says: “I’m unable to provide current or specific information about events, including mining accidents, as my knowledge is based on data up until 2021 and I don’t have real-time access to news updates.”
The same question posed to ChatGPT, the chatbot developed by the US company OpenAI, produces the answer: “The Shaanxi Mining Company in Ghana has experienced multiple fatal incidents, resulting in a total of 61 deaths since 2013. This includes a significant explosion in January 2019 that alone claimed 16 lives.”
It is not clear what kind of answer GeoGPT, which is still in development, would give to this question.
Dr Natarajan Ishwaran, the head of international relations for DDE, said: “The team building GeoGPT has full independence. We can assure you that GeoGPT – currently in an exploratory phase and not yet open to the public – will not be affected by any state censorship.”
He added that users would be able to choose between using Alibaba’s Qwen or Meta’s Llama as the model for GeoGPT.
Geoscientific research and data include commercially and strategically valuable information about deposits of natural resources such as lithium, which are vital for the green transition.
Giles said there was a risk that a Chinese-developed platform could “filter” information to withhold content that was useful for “mineral reconnaissance”.
He added: “China is very aggressively looking for minerals across the globe. There is a strategic advantage and an economic advantage in looking for mineral reserves.”
An article published in 2020 by Chen Jun, an academic at the Chinese Academy of Sciences, said DDE, the scientific programme that created GeoGPT, would “help enhance China’s detection and security capabilities in global resources and energy”.
Stephenson, Thybo, Wang and Ishwaran, from DDE, said the 2020 article aimed “to encourage Chinese scientists to get involved in international science programmes” and was “purely the opinion of the author”, not of DDE or the Chinese Academy of Sciences.
Mohammad Hoque, a senior lecturer in hydrogeology and environmental geoscience at the University of Portsmouth, said “one danger” of using a Chinese-developed language model for academic research was that “there will be some bias, because they have to obey local laws”.
GeoGPT’s terms of use state that prompting the chatbot to generate content that “undermines national security” and “incites subversion of state power” is prohibited. The terms of use also state that it is governed by the laws of China.
Hoque said GeoGPT had a greater obligation of transparency because it was developed under the auspices of an international research collaboration. “The most important thing would be to know what data they use to fine-tune and train [GeoGPT]. We have an expectation to know under IUGS.”
John Ludden, the president of the IUGS, said the GeoGPT database would be made public “only if the IUGS is satisfied that the appropriate governance is in place”.
Ishwaran said when GeoGPT opened to the public its training database would be made available “to those who wish to have it”.
Geologists interviewed by the Guardian said the extent of DDE’s links to China was not widely known among professionals. According to a planning document published in 2021, the multimillion-pound project is “almost 99%” funded by sources in China.
The programme is part of the IUGS, an international NGO representing more than 1 million geoscientists in 121 countries, including the UK’s Geological Society. Its secretariat is based in Beijing and receives “tremendous” financial and logistical support from the Chinese government, according to the organisation’s 2023 annual report.
Ludden said: “The best thing for science is to be open and share data. DDE does this for geological data if openly available [and] will lead to inward investment in any nation … [and] discoveries in research.”