ChatGPT as an effective tool for quality evaluation of radiomics research

Ismail Mese¹, Burak Kocak²

Affiliations

¹ Department of Radiology, Erenkoy Mental Health and Neurology Training and Research Hospital, University of Health Sciences, Istanbul, Turkey.
² Department of Radiology, Basaksehir Cam and Sakura City Hospital, University of Health Sciences, Istanbul, Turkey. drburakkocak@gmail.com.

PMID: 39406959
DOI: 10.1007/s00330-024-11122-7

Ismail Mese et al. Eur Radiol. 2025 Apr.

Eur Radiol. 2025 Apr;35(4):2030-2042.

doi: 10.1007/s00330-024-11122-7. Epub 2024 Oct 15.

Abstract

Objectives: This study aimed to evaluate the effectiveness of ChatGPT-4o in assessing the methodological quality of radiomics research using the radiomics quality score (RQS) compared to human experts.

Methods: Open-access, peer-reviewed radiomics research articles published in European Radiology, European Radiology Experimental, and Insights into Imaging between 2023 and 2024 under a Creative Commons Attribution (CC-BY) license were included in this study. Preprints from medRxiv were also included to evaluate potential peer-review bias. Using the RQS, each study was assessed twice independently by ChatGPT-4o and by two radiologists in consensus.

Results: In total, 52 open-access and peer-reviewed articles were included in this study. Both the ChatGPT-4o evaluation (average of two readings) and the human experts yielded a median RQS of 14.5 (percentage score, 40.3%) (p > 0.05). Pairwise comparisons revealed no statistically significant difference between the readings of ChatGPT-4o and human experts (corrected p > 0.05). The intraclass correlation coefficient for intra-rater reliability of ChatGPT-4o was 0.905 (95% CI: 0.840-0.944), and those for inter-rater reliability between human experts and each of the two ChatGPT-4o evaluations were 0.859 (95% CI: 0.756-0.919) and 0.914 (95% CI: 0.855-0.949), corresponding to good to excellent reliability for all. Evaluation by ChatGPT-4o took less time (2.9-3.5 min per article) than evaluation by human experts (13.9 min per article by one reader). Item-wise reliability analysis showed that ChatGPT-4o maintained consistently high reliability across almost all RQS items.
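The arithmetic behind two of the reported quantities can be sketched as follows. This is an illustrative sketch, not the authors' analysis code. It assumes a maximum RQS total of 36 points (under which 14.5 corresponds to 40.3%), and it assumes a two-way random-effects, absolute-agreement, single-measurement ICC, often written ICC(2,1); the abstract does not state which ICC form was used. The function names and the example scores are hypothetical.

```python
from typing import Sequence

RQS_MAX = 36  # assumed maximum RQS total; not stated in the abstract


def rqs_percentage(score: float, max_score: int = RQS_MAX) -> float:
    """Express a raw RQS total as a percentage of the maximum total."""
    return score / max_score * 100.0


def icc2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rating.

    `ratings` has one row per article and one column per rater (or reading).
    The ICC form is an assumption made for this sketch.
    """
    n = len(ratings)      # number of articles
    k = len(ratings[0])   # number of raters / readings
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


if __name__ == "__main__":
    # Median RQS reported in the abstract, converted to a percentage score.
    print(f"{rqs_percentage(14.5):.1f}%")

    # Hypothetical RQS totals for five articles: (ChatGPT reading, human consensus).
    example = [(14, 15), (10, 9), (20, 21), (7, 8), (16, 16)]
    print(f"ICC(2,1) = {icc2_1(example):.3f}")
```

For a real analysis, an established statistics package that also reports confidence intervals would be preferable to this hand-rolled function.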

Conclusion: ChatGPT-4o provides reliable and efficient assessments of radiomics research quality. Its evaluations closely align with those of human experts and reduce evaluation time.

Key points:
Question: Is ChatGPT effective and reliable in evaluating radiomics research quality based on RQS?
Findings: ChatGPT-4o showed high reliability and efficiency, with evaluations closely matching human experts. It can considerably reduce the time required for radiomics research quality assessment.
Clinical relevance: ChatGPT-4o offers a quick and reliable automated alternative for evaluating the quality of radiomics research, with the potential to assess radiomics research at a large scale in the future.

Keywords: Artificial intelligence; Large language models; Machine learning; Radiomics; Texture analysis.

Conflict of interest statement

Compliance with ethical standards. Guarantor: The scientific guarantor of this publication is Burak Kocak, MD. Conflict of interest: B.K. is on the editorial board of European Radiology (section editor: Imaging Informatics and Artificial Intelligence). He has taken no part in this article’s peer review or selection. The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article. Statistics and biometry: No complex statistical methods were necessary for this paper. Informed consent: Not applicable. Ethical approval: Not applicable. Study subjects or cohorts overlap: None. Methodology: Experimental.
