(NewsNation) — Since the consumer AI app ChatGPT came online, academics have been developing ways to detect students using it to produce term papers and college admissions essays. But nobody was examining how it could be used to cheat on multiple-choice tests — until now.
A professor at Florida State University discovered that on multiple-choice tests, where one answer is empirically correct and the other three are empirically wrong, ChatGPT often chooses incorrectly.
ChatGPT “can generate content, but it doesn’t necessarily generate correct content,” Dr. Kenneth Hanson, an associate professor in the FSU Department of Chemistry and Biochemistry, told Spectrum, the alumni magazine of FSU’s College of Arts and Sciences.
“It’s simply an answer generator. It’s trying to look like it knows the answer, and to someone who doesn’t understand the material, it probably does look like a correct answer.”
Hanson, along with machine learning engineer Ben Sorenson, repeatedly tested ChatGPT with the same exams that chemistry students took over five semesters.
The human students followed a general pattern: the good ones answered nearly all the questions correctly, the average students got most easy questions and some of the difficult ones correct, and poor students usually could answer only the easy questions correctly.
The app did nearly the opposite. ChatGPT often answered every easy question incorrectly and every hard question correctly. Armed with that information, Hanson and Sorenson were able to detect the use of ChatGPT with nearly 100% accuracy.
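The article does not describe the researchers' actual method, so the following is only a minimal sketch of how such an inverted pattern could be spotted in principle. It assumes each answer sheet is a list of 1s (correct) and 0s (incorrect), estimates how easy each question is from the class as a whole, and flags sheets whose correct answers run against that easiness. The function names, the sample data and the zero threshold are all hypothetical.

```python
from statistics import mean

def item_easiness(responses):
    """Fraction of test-takers who answered each question correctly."""
    n_items = len(responses[0])
    return [mean(row[j] for row in responses) for j in range(n_items)]

def pattern_score(answers, easiness):
    """Pearson correlation between one answer sheet and question easiness.

    Positive: the easy questions tend to be the ones answered correctly.
    Negative: the hard questions tend to be the ones answered correctly.
    """
    mean_a, mean_e = mean(answers), mean(easiness)
    cov = sum((a - mean_a) * (e - mean_e) for a, e in zip(answers, easiness))
    var_a = sum((a - mean_a) ** 2 for a in answers)
    var_e = sum((e - mean_e) ** 2 for e in easiness)
    if var_a == 0 or var_e == 0:
        return 0.0  # all right or all wrong: no pattern to measure
    return cov / (var_a * var_e) ** 0.5

def flag_suspicious(sheets, easiness, threshold=0.0):
    """Names of sheets whose score falls below the (assumed) threshold."""
    return [name for name, answers in sheets.items()
            if pattern_score(answers, easiness) < threshold]

# Made-up class data: rows are students, columns are questions.
class_responses = [
    [1, 1, 1, 1, 1, 1],  # strong student
    [1, 1, 1, 1, 0, 0],  # average student
    [1, 1, 1, 0, 1, 0],  # average student
    [1, 1, 0, 0, 0, 0],  # weak student
]
easiness = item_easiness(class_responses)

# Two made-up sheets to check: one typical, one with the inverted pattern.
sheets = {
    "typical": [1, 1, 1, 1, 0, 0],
    "inverted": [0, 0, 0, 1, 1, 1],
}
flagged = flag_suspicious(sheets, easiness)
print(flagged)  # only the inverted sheet is flagged
```

A score like this would at most be supporting evidence, in line with Sorenson's description of the work, since a real student can also produce an unusual pattern by chance.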
“Our work is a great way to provide supporting evidence when educators might already suspect that cheating may be happening,” Sorenson said. “What we didn’t expect was that the patterns of artificial intelligence would be so easy to identify.”