NPJ Digit Med. 2023 Apr 26;6(1):75.
doi: 10.1038/s41746-023-00819-6.

Comparing scientific abstracts generated by ChatGPT to real abstracts with detectors and blinded human reviewers

Catherine A Gao et al. NPJ Digit Med.

Abstract

Large language models such as ChatGPT can produce increasingly realistic text, but the accuracy and integrity of using these models in scientific writing remain unknown. We gathered fifty research abstracts from five high-impact factor medical journals and asked ChatGPT to generate research abstracts based on their titles and journals. Most generated abstracts were detected by an AI output detector, 'GPT-2 Output Detector': % 'fake' scores (higher meaning more likely to be generated) had a median [interquartile range] of 99.98% [12.73%, 99.98%] for generated abstracts, compared with a median of 0.02% [IQR 0.02%, 0.09%] for the original abstracts. The AUROC of the AI output detector was 0.94. Generated abstracts scored lower than original abstracts when run through a plagiarism detector website and iThenticate (higher scores meaning more matching text found). When given a mixture of original and generated abstracts, blinded human reviewers correctly identified 68% of generated abstracts as being generated by ChatGPT, but incorrectly identified 14% of original abstracts as being generated. Reviewers indicated that it was surprisingly difficult to differentiate between the two, though abstracts they suspected were generated were vaguer and more formulaic. ChatGPT writes believable scientific abstracts, though with completely generated data. Depending on publisher-specific guidelines, AI output detectors may serve as an editorial tool to help maintain scientific standards. The boundaries of ethical and acceptable use of large language models to help scientific writing are still being discussed, and different journals and conferences are adopting varying policies.
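
The summary statistics reported above (median, interquartile range, and AUROC of detector scores) can be illustrated with a minimal sketch. This is not the authors' analysis code: the helper names `median_iqr` and `auroc` are hypothetical, the quartile method is an assumption, and the toy scores are invented for illustration only and are not data from the study. The AUROC is computed here as the probability that a randomly chosen generated abstract receives a higher score than a randomly chosen original abstract, with ties counted as one half.

```python
from statistics import quantiles


def median_iqr(scores):
    """Return (median, (Q1, Q3)) using inclusive quartiles (an assumed convention)."""
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    return q2, (q1, q3)


def auroc(negatives, positives):
    """Pairwise AUROC: fraction of (positive, negative) pairs in which the
    positive scores higher; tied pairs contribute 0.5."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy scores in % 'fake'. These are NOT values from the study.
original_scores = [0.02, 0.02, 0.05, 0.09, 1.50]
generated_scores = [0.80, 12.73, 99.98, 99.98, 99.98]

print("original :", median_iqr(original_scores))
print("generated:", median_iqr(generated_scores))
print("AUROC    :", auroc(original_scores, generated_scores))
```

An AUROC of 1.0 would mean every generated abstract scored above every original one, and 0.5 would mean the detector does no better than chance. The pairwise loop is quadratic in the number of abstracts, which is fine at the scale of a few dozen per group.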

Conflict of interest statement

A.T.P. reports no competing interests for this work, and reports personal fees from Prelude Therapeutics Advisory Board, Elevar Advisory Board, AbbVie consulting, Ayala Advisory Board, and Privo Therapeutics, all outside of submitted work. The remaining authors declare no competing interests.

Figures

Fig. 1. Generated abstracts have a similar patient cohort size as original abstracts.
Cohort sizes from original abstracts (x-axis) and generated abstracts (y-axis), plotted on a log10 scale.
Fig. 2. Many generated abstracts can be detected using an AI output detector.
(a) AI detection scores [% ‘fake’] from GPT-2 Output Detector for original abstracts and generated abstracts. A higher score indicates that the text is more likely to have been generated by AI. (b) The AI output detector ROC curve for discriminating between original and generated abstracts, with an AUROC of 0.94.
Fig. 3. Generated abstracts score lower than original abstracts on plagiarism detectors.
(a) Plagiarism scores from the plagiarism detector website, with a higher % ‘plagiarized’ score indicating that more matching text was found. (b) iThenticate Similarity Index [%] for original abstracts and generated abstracts, with a higher value meaning more similar text was found.
Fig. 4. Reviewers use different criteria from the AI output detector for flagging abstracts as either generated or original.
The AI detection scores for generated abstracts were not significantly different (p = 0.45) between abstracts that human reviewers identified as generated and those that they failed to identify as generated.
