ChatGPT as an effective tool for quality evaluation of radiomics research
- PMID: 39406959
- DOI: 10.1007/s00330-024-11122-7
Abstract
Objectives: This study aimed to evaluate the effectiveness of ChatGPT-4o in assessing the methodological quality of radiomics research using the radiomics quality score (RQS) compared to human experts.
Methods: Open-access, peer-reviewed radiomics research articles published under a Creative Commons Attribution (CC BY) license in European Radiology, European Radiology Experimental, and Insights into Imaging between 2023 and 2024 were included in this study. Preprints from medRxiv were also included to evaluate potential peer-review bias. Using the RQS, each study was assessed independently by ChatGPT-4o (twice) and by two radiologists in consensus.
Results: In total, 52 open-access and peer-reviewed articles were included in this study. Both the ChatGPT-4o evaluation (average of two readings) and the human experts had a median RQS of 14.5 (percentage score, 40.3%) (p > 0.05). Pairwise comparisons revealed no statistically significant difference between the readings of ChatGPT-4o and those of the human experts (corrected p > 0.05). The intraclass correlation coefficient (ICC) for intra-rater reliability of ChatGPT-4o was 0.905 (95% CI: 0.840-0.944). The ICCs for inter-rater reliability between the human experts and each of the two ChatGPT-4o evaluations were 0.859 (95% CI: 0.756-0.919) and 0.914 (95% CI: 0.855-0.949). All three values correspond to good to excellent reliability. The evaluation by ChatGPT-4o took less time (2.9-3.5 min per article) than the evaluation by human experts (13.9 min per article by one reader). Item-wise reliability analysis showed that ChatGPT-4o maintained consistently high reliability across almost all RQS items.
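As an illustration of the two quantities reported above, the following minimal Python sketch converts a raw RQS into a percentage score and computes an intraclass correlation coefficient from a table of ratings. It is not the authors' analysis code. The maximum score of 36, the flooring of negative totals at 0%, and the choice of the ICC(2,1) form (two-way random effects, absolute agreement, single rater) are assumptions made here for illustration; the abstract does not state which ICC form was used. The function names rqs_percentage and icc_2_1 are hypothetical.

```python
import numpy as np

# Assumption: 36 is the maximum attainable total in the radiomics quality score.
RQS_MAX = 36


def rqs_percentage(score, max_score=RQS_MAX):
    """Convert a raw RQS total into a percentage of the maximum.

    Assumption: negative totals are floored at 0 before conversion.
    """
    return max(score, 0) / max_score * 100


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: array-like of shape (n_subjects, k_raters), e.g. one row per
    article and one column per reading.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * ((x.mean(axis=1) - grand_mean) ** 2).sum()  # between subjects
    ss_cols = n * ((x.mean(axis=0) - grand_mean) ** 2).sum()  # between raters
    ss_total = ((x - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # The median RQS of 14.5 reported in the abstract, as a percentage.
    print(f"{rqs_percentage(14.5):.1f}%")

    # Hypothetical RQS totals for five articles read by two raters.
    demo = [[14, 15], [20, 19], [8, 10], [25, 24], [12, 12]]
    print(f"ICC(2,1) = {icc_2_1(demo):.3f}")
```

Under these assumptions, a raw score of 14.5 out of 36 gives the 40.3% reported in the abstract. The demo ratings are invented and serve only to show the expected input shape.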
Conclusion: ChatGPT-4o provides reliable and efficient assessments of radiomics research quality. Its evaluations closely align with those of human experts and reduce evaluation time.
Key points:
Question: Is ChatGPT effective and reliable in evaluating radiomics research quality based on RQS?
Findings: ChatGPT-4o showed high reliability and efficiency, with evaluations closely matching human experts. It can considerably reduce the time required for radiomics research quality assessment.
Clinical relevance: ChatGPT-4o offers a quick and reliable automated alternative for evaluating the quality of radiomics research, with the potential to assess radiomics research at a large scale in the future.
Keywords: Artificial intelligence; Large language models; Machine learning; Radiomics; Texture analysis.
© 2024. The Author(s), under exclusive licence to European Society of Radiology.
Conflict of interest statement
Compliance with ethical standards.
Guarantor: The scientific guarantor of this publication is Burak Kocak, MD.
Conflict of interest: B.K. is on the editorial board of European Radiology (section editor: Imaging Informatics and Artificial Intelligence). He has taken no part in this article’s peer review or selection. The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.
Statistics and biometry: No complex statistical methods were necessary for this paper.
Informed consent: Non-applicable.
Ethical approval: Non-applicable.
Study subjects or cohorts overlap: None.
Methodology: Experimental.