Eur Radiol. 2025 Apr;35(4):2030-2042.
doi: 10.1007/s00330-024-11122-7. Epub 2024 Oct 15.

ChatGPT as an effective tool for quality evaluation of radiomics research

Ismail Mese et al. Eur Radiol. 2025 Apr.

Abstract

Objectives: This study aimed to evaluate the effectiveness of ChatGPT-4o in assessing the methodological quality of radiomics research using the radiomics quality score (RQS) compared to human experts.

Methods: Open-access, peer-reviewed radiomics research articles published under a Creative Commons Attribution (CC-BY) license in European Radiology, European Radiology Experimental, and Insights into Imaging between 2023 and 2024 were included in this study. Preprints from medRxiv were also included to evaluate potential peer-review bias. Using the RQS, each study was independently assessed twice by ChatGPT-4o and by two radiologists in consensus.

Results: In total, 52 open-access and peer-reviewed articles were included in this study. Both ChatGPT-4o evaluation (average of two readings) and human experts had a median RQS of 14.5 (40.3% percentage score) (p > 0.05). Pairwise comparisons revealed no statistically significant difference between the readings of ChatGPT and human experts (corrected p > 0.05). The intraclass correlation coefficient for intra-rater reliability of ChatGPT-4o was 0.905 (95% CI: 0.840-0.944), and those for inter-rater reliability with human experts for each evaluation of ChatGPT-4o were 0.859 (95% CI: 0.756-0.919) and 0.914 (95% CI: 0.855-0.949), corresponding to good to excellent reliability for all. The evaluation by ChatGPT-4o took less time (2.9-3.5 min per article) compared to human experts (13.9 min per article by one reader). Item-wise reliability analysis showed ChatGPT-4o maintained consistently high reliability across almost all RQS items.
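The arithmetic behind the figures reported above can be checked with a short Python sketch. It rests on assumptions not stated in the abstract: that the percentage score divides the raw RQS total by the original RQS maximum of 36 points, and that the "good to excellent" labels follow the commonly used ICC cut-offs of 0.5, 0.75, and 0.9. The function names are hypothetical and are not taken from the study.

```python
# Illustrative check of the reported numbers; not the authors' analysis code.

RQS_MAX_POINTS = 36  # assumed maximum total of the original RQS


def rqs_percentage(score, max_points=RQS_MAX_POINTS):
    """Express a raw RQS total as a percentage of the assumed maximum.

    Negative totals are floored at 0%.
    """
    return max(score, 0) / max_points * 100


def icc_category(icc):
    """Map an ICC value to a reliability label using assumed cut-offs."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.9:
        return "good"
    return "excellent"


if __name__ == "__main__":
    # Median RQS of 14.5 expressed as a percentage score.
    print(round(rqs_percentage(14.5), 1))

    # Point estimates reported for intra-rater and inter-rater reliability.
    for icc in (0.905, 0.859, 0.914):
        print(icc, icc_category(icc))

    # Reading time of one human reader relative to ChatGPT-4o (min per article).
    human_minutes = 13.9
    for chatgpt_minutes in (2.9, 3.5):
        print(round(human_minutes / chatgpt_minutes, 1))
```

Under these assumptions, the median of 14.5 corresponds to the reported 40.3%, and the three ICC point estimates fall into the "good" and "excellent" bands. The time comparison uses the single-reader human figure, so it does not account for the consensus step between the two radiologists.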

Conclusion: ChatGPT-4o provides reliable and efficient assessments of radiomics research quality. Its evaluations closely align with those of human experts and reduce evaluation time.

Key points:
Question: Is ChatGPT effective and reliable in evaluating radiomics research quality based on RQS?
Findings: ChatGPT-4o showed high reliability and efficiency, with evaluations closely matching human experts. It can considerably reduce the time required for radiomics research quality assessment.
Clinical relevance: ChatGPT-4o offers a quick and reliable automated alternative for evaluating the quality of radiomics research, with the potential to assess radiomics research at a large scale in the future.

Keywords: Artificial intelligence; Large language models; Machine learning; Radiomics; Texture analysis.

Conflict of interest statement

Compliance with ethical standards. Guarantor: The scientific guarantor of this publication is Burak Kocak, MD. Conflict of interest: B.K. is on the editorial board of European Radiology (section editor: Imaging Informatics and Artificial Intelligence). He has taken no part in this article’s peer review or selection. The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article. Statistics and biometry: No complex statistical methods were necessary for this paper. Informed consent: Non-applicable. Ethical approval: Non-applicable. Study subjects or cohorts overlap: None. Methodology: Experimental
