ChatGPT Writes Scientific Abstracts That Can Fool Experts
The release of the AI-powered ChatGPT in late 2022 piqued the interest of millions of users around the world, who used the tool to instantly generate everything from Shakespearean sonnets to lines of programming code.
But ChatGPT’s versatility in producing eerily human-like text has also inspired anxiety in academic circles, with many educators fearing that it might be misused by students looking for an easy way to write convincing, college-level essays to pass off as their own work.
That concern is echoed by scientific researchers, who worry that powerful large language models (LLMs) like ChatGPT may be used to write research papers, some of which might be good enough to fool other scientists and academic institutions.
Unfortunately, as recent research from Northwestern University and the University of Chicago shows, ChatGPT is in fact quite good at writing believable scientific abstracts: so good that they fooled not only human experts, but also traditional plagiarism-detection tools.
A scientific abstract is an overview of a scientific paper that is usually included at the beginning, giving a brief summary of the background information, methods, results and discussion surrounding a study. Abstracts are meant to give the reader a condensed version of the work and are a vital part of any research paper.
In this case, the preprint study used ChatGPT to generate 50 medical research abstracts, alongside a set of genuine abstracts written by humans. Each human reviewer was then given a mix of 25 machine- and human-authored abstracts and asked to assign each one a binary score indicating whether they thought it was real or fake.
The Real Thing
To make the machine-generated text more plausible, the researchers prompted the chatbot by asking it to generate text based on recent titles taken from well-known scientific journals like JAMA, The New England Journal of Medicine, The BMJ, The Lancet and Nature Medicine. In addition to testing both sets of machine- and human-generated abstracts on human reviewers, the abstracts were also run through a conventional plagiarism detector, and an AI-output detector.
“Our reviewers knew that some of the abstracts they were being given were fake, so they were very suspicious,” said the study’s lead author, Catherine Gao, in a statement.
“This is not someone reading an abstract in the wild. The fact that our reviewers still missed the AI-generated ones 32% of the time means these abstracts are really good. I suspect that if someone just came across one of these generated abstracts, they wouldn’t necessarily be able to identify it as being written by AI.”
In generating the abstracts, ChatGPT was able to get some superficial details right, which helped lend an air of authenticity to the text. For instance, in creating a fake study on hypertension, ChatGPT included details like tens of thousands of patients in the cohort, while a monkeypox study was described as having far fewer subjects. Even though the numbers themselves were fabricated, there was nevertheless enough consistency to deceive human readers.
The machine-generated text also fooled conventional plagiarism-detection tools, which rated the ChatGPT abstracts as having a median originality score of 100%. However, the AI-output detector was able to identify 66% of the AI-generated abstracts — not too bad, but not too great either.
The human reviewers, on the other hand, did only slightly better: they were able to pick out just 68% of the fake abstracts, while correctly identifying 86% of the texts that were written by humans, meaning they mistook 14% of the genuine abstracts for fakes.
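Framed as a binary classification task, the reviewers' reported rates map onto standard detection metrics. The short sketch below works only from the percentages quoted in the article (not the study's raw per-abstract counts, which are not given here), so the derived figures are illustrative arithmetic rather than results from the paper itself.

```python
# Reviewer performance framed as binary classification, using the
# rates reported in the article (not the study's raw counts).
sensitivity = 0.68  # fraction of AI-generated abstracts correctly flagged as fake
specificity = 0.86  # fraction of human-written abstracts correctly judged real

false_positive_rate = 1 - specificity           # genuine abstracts flagged as fake
balanced_accuracy = (sensitivity + specificity) / 2

print(f"False-positive rate: {false_positive_rate:.0%}")  # 14%
print(f"Balanced accuracy:   {balanced_accuracy:.0%}")    # 77%
```

The 14% false-positive rate is simply the complement of the 86% figure, which is why the article can report both numbers from the same underlying judgments.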
The results suggest that AI-detection tools, while not foolproof, need to become part of the review process, Gao said. “We found that an AI output detector was pretty good at detecting output from ChatGPT, and suggest that it be included in the scientific editorial process as a screening process, to protect from targeting by organizations such as paper mills that may try to submit purely generated data.”
Though ChatGPT might be used for nefarious ends, the team believes there are also potential advantages to the technology.
“We anticipate that this technology could be used in both an ethical and unethical way. Given its ability to generate abstracts with believable numbers, it could be used to entirely falsify research,” said the team.
“On the other hand, the technology may be used in conjunction with a researcher’s own scientific knowledge as a tool to decrease the burden of writing and formatting. It could be used by scientists publishing in a language that is not their native language, to improve equity.”