Automated confidence ranked classification of randomized controlled trial articles: an aid to evidence-based medicine
- PMID: 25656516
- PMCID: PMC4457112
- DOI: 10.1093/jamia/ocu025
Abstract
Objective: For many literature review tasks, including systematic review (SR) and other aspects of evidence-based medicine, it is important to know whether an article describes a randomized controlled trial (RCT). Current manual annotation is not complete or flexible enough for the SR process. In this work, highly accurate machine learning models were built that predict whether an article is an RCT and attach a confidence estimate to each prediction.
Materials and methods: The LibSVM classifier was used with forward selection of potential feature sets on a large human-related subset of MEDLINE to create a classification model requiring only the citation, abstract, and MeSH terms for each article.
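The abstract names forward selection of feature sets but gives no details. The following is a minimal sketch of greedy forward selection over named feature sets, assuming a caller-supplied scoring function (for instance, a cross-validated AUC from a LibSVM model, which is not shown here). The feature-set names and the additive toy score are hypothetical and are not the study's actual features.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple


def forward_select(
    candidates: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], float]:
    """Greedy forward selection over whole feature sets.

    Each round tries adding every remaining candidate set to the current
    selection, keeps the addition with the highest score, and stops as soon
    as no addition beats the best score found so far.
    """
    selected: List[str] = []
    best_score = float("-inf")
    remaining = list(candidates)
    while remaining:
        # Score every one-set extension of the current selection.
        trials = [(score(frozenset(selected + [c])), c) for c in remaining]
        trial_score, trial_set = max(trials)
        if trial_score <= best_score:
            break  # no remaining feature set improves the score
        selected.append(trial_set)
        best_score = trial_score
        remaining.remove(trial_set)
    return selected, best_score


# Hypothetical feature-set names and a toy additive score standing in for,
# e.g., a cross-validated AUC of an SVM trained on the chosen sets.
TOY_GAINS: Dict[str, float] = {
    "title_words": 0.80,
    "abstract_words": 0.10,
    "mesh_terms": 0.05,
    "author_count": -0.02,
}


def toy_score(feature_sets: FrozenSet[str]) -> float:
    return sum(TOY_GAINS[name] for name in feature_sets)


selected, best = forward_select(list(TOY_GAINS), toy_score)
print(selected, round(best, 2))
```

With this toy score the procedure picks title_words, then abstract_words, then mesh_terms, and stops because adding author_count would lower the score. A real run would replace `toy_score` with model training and evaluation, which is far more expensive per call.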
Results: The model achieved an area under the receiver operating characteristic curve of 0.973 and a mean squared error of 0.013 on the held-out 2011 data. Accurate confidence estimates were confirmed on a manually reviewed set of test articles. A second model that does not require MeSH terms was also created and performs almost as well.
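Both reported metrics can be computed from a list of 0/1 labels and model confidences. The following is a minimal standard-library sketch, assuming binary labels (1 = RCT) and confidences in [0, 1]. The function names are illustrative, and the toy values are made up rather than taken from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via its rank interpretation: the probability
    that a randomly chosen positive scores higher than a randomly chosen
    negative, counting ties as one half. O(P*N); fine for a sketch."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_squared_error(labels: Sequence[int], confidences: Sequence[float]) -> float:
    """Mean squared difference between predicted confidence and the 0/1 label."""
    if len(labels) != len(confidences) or not labels:
        raise ValueError("labels and confidences must be non-empty and equal length")
    return sum((c - y) ** 2 for y, c in zip(labels, confidences)) / len(labels)


# Toy example: 1 = RCT, 0 = not an RCT (made-up values, not data from the paper).
labels = [1, 0, 1, 0, 0]
confidences = [0.9, 0.2, 0.6, 0.7, 0.1]
print(roc_auc(labels, confidences))  # 5 of 6 positive/negative pairs are ordered correctly
print(mean_squared_error(labels, confidences))
```

AUC measures only how well the confidences rank articles. Mean squared error also penalizes confidences that sit far from the true 0/1 label, which is why the two metrics are reported together.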
Discussion: Both models accurately rank articles and predict their RCT confidence. Using the model and the manually reviewed samples, it is estimated that about 8000 (3%) additional RCTs can be identified in MEDLINE, and that 5% of articles tagged as RCTs in MEDLINE may not be identified.
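The abstract does not say how these counts were derived. If the confidences are calibrated probabilities, one standard way to estimate such disagreements with existing publication-type tags is to sum, over articles, the probability that each article disagrees with its tag. The sketch below illustrates that idea only; it is not necessarily the authors' procedure, and the records are hypothetical.

```python
from typing import Iterable, Tuple


def expected_disagreements(records: Iterable[Tuple[bool, float]]) -> Tuple[float, float]:
    """Return (expected RCTs among untagged articles,
               expected non-RCTs among articles tagged as RCTs).

    Each record is (tagged_as_rct, confidence). Assumes the confidence is a
    calibrated probability that the article is an RCT, so each article
    contributes its probability of disagreeing with its current tag.
    """
    missed = 0.0
    mistagged = 0.0
    for tagged, conf in records:
        if tagged:
            mistagged += 1.0 - conf  # chance a tagged article is not an RCT
        else:
            missed += conf  # chance an untagged article is an RCT
    return missed, mistagged


# Toy records (hypothetical values, not MEDLINE data).
records = [(True, 0.95), (True, 0.40), (False, 0.80), (False, 0.05), (False, 0.02)]
missed, mistagged = expected_disagreements(records)
print(f"expected untagged RCTs: {missed:.2f}, expected mistagged: {mistagged:.2f}")
```

Such estimates are only as good as the calibration of the confidences. This is why checking them against manually reviewed samples, as the abstract describes, matters.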
Conclusion: Retagging human-related studies with a continuously valued RCT confidence is potentially more useful for article ranking and review than a simple yes/no prediction. The automated RCT tagging tool should offer significant savings of time and effort during the process of writing SRs, and is a key component of a multistep text mining pipeline that we are building to streamline SR workflow. In addition, the model may be useful for identifying errors in MEDLINE publication types. The RCT confidence predictions described here have been made available to users as a web service with a user query form front end at: http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/RCT_Tagger.cgi.
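The advantage of a continuous confidence over a yes/no tag can be shown in a few lines. The following is a minimal sketch, assuming each article is represented as a hypothetical (id, confidence) pair and that a yes/no tagger would apply a fixed 0.5 threshold. Neither the representation nor the threshold is taken from the paper or from the RCT Tagger web service.

```python
from typing import List, Sequence, Tuple

Article = Tuple[str, float]  # (hypothetical article id, RCT confidence)


def rank_by_confidence(articles: Sequence[Article]) -> List[Article]:
    """Order articles from most to least likely to be an RCT."""
    return sorted(articles, key=lambda a: a[1], reverse=True)


def yes_no_tags(articles: Sequence[Article], threshold: float = 0.5) -> List[Tuple[str, bool]]:
    """Collapse each confidence to a yes/no RCT tag at a fixed threshold."""
    return [(article_id, conf >= threshold) for article_id, conf in articles]


articles = [("A", 0.55), ("B", 0.98), ("C", 0.30), ("D", 0.51)]
print([aid for aid, _ in rank_by_confidence(articles)])  # ['B', 'A', 'D', 'C']
print(yes_no_tags(articles))  # A, B and D are all tagged True
```

The ranked list lets a reviewer screen the most likely RCTs first and choose a cutoff suited to the review. The yes/no tags treat B (0.98) and D (0.51) identically.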
Keywords: Evidence-Based Medicine; Information Retrieval; Natural Language Processing; Randomized Controlled Trials as Topic; Support Vector Machines; Systematic Reviews.
© The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Similar articles
- A quantitative model for linking two disparate sets of articles in MEDLINE. Bioinformatics. 2007 Jul 1;23(13):1658-65. doi: 10.1093/bioinformatics/btm161. Epub 2007 Apr 26. PMID: 17463015.
- A probabilistic automated tagger to identify human-related publications. Database (Oxford). 2018 Jan 1;2018:1-8. doi: 10.1093/database/bay079. PMID: 30184195.
- Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach. J Am Med Inform Assoc. 2017 Nov 1;24(6):1165-1168. doi: 10.1093/jamia/ocx053. PMID: 28541493.
- The pathway to RCTs: how many roads are there? Examining the homogeneity of RCT justification. Trials. 2017 Feb 2;18(1):51. doi: 10.1186/s13063-017-1804-z. PMID: 28148278. Review.
- Iterative guided machine learning-assisted systematic literature reviews: a diabetes case study. Syst Rev. 2021 Apr 2;10(1):97. doi: 10.1186/s13643-021-01640-6. PMID: 33810798.
Cited by
- Automation of Article Selection Process in Systematic Reviews Through Artificial Neural Network Modeling and Machine Learning: Protocol for an Article Selection Model. JMIR Res Protoc. 2021 Jun 15;10(6):e26448. doi: 10.2196/26448. PMID: 34128820.
- Automation of systematic reviews of biomedical literature: a scoping review of studies indexed in PubMed. Syst Rev. 2024 Jul 8;13(1):174. doi: 10.1186/s13643-024-02592-3. PMID: 38978132.
- Still moving toward automation of the systematic review process: a summary of discussions at the third meeting of the International Collaboration for Automation of Systematic Reviews (ICASR). Syst Rev. 2019 Feb 20;8(1):57. doi: 10.1186/s13643-019-0975-y. PMID: 30786933.
- Evaluation of publication type tagging as a strategy to screen randomized controlled trial articles in preparing systematic reviews. JAMIA Open. 2022 Mar 30;5(1):ooac015. doi: 10.1093/jamiaopen/ooac015. eCollection 2022 Apr. PMID: 35571360.
- Insights into the nutritional prevention of macular degeneration based on a comparative topic modeling approach. PeerJ Comput Sci. 2024 Mar 20;10:e1940. doi: 10.7717/peerj-cs.1940. eCollection 2024. PMID: 38660183.