Automated confidence ranked classification of randomized controlled trial articles: an aid to evidence-based medicine

Aaron M Cohen¹, Neil R Smalheiser², Marian S McDonagh³, Clement Yu⁴, Clive E Adams⁵, John M Davis², Philip S Yu⁴

Affiliations

¹ Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239 USA cohenaa@ohsu.edu.
² Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 60612 USA.
³ Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239 USA.
⁴ Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60612 USA.
⁵ Division of Psychiatry, University of Nottingham, Nottingham, UK.

PMID: 25656516
PMCID: PMC4457112
DOI: 10.1093/jamia/ocu025

Aaron M Cohen et al. J Am Med Inform Assoc. 2015 May;22(3):707-17. doi: 10.1093/jamia/ocu025. Epub 2015 Feb 5.

Abstract

Objective: For many literature review tasks, including systematic review (SR) and other aspects of evidence-based medicine, it is important to know whether an article describes a randomized controlled trial (RCT). Current manual annotation is not complete or flexible enough for the SR process. In this work, highly accurate machine learning predictive models were built that include confidence predictions of whether an article is an RCT.

Materials and methods: The LibSVM classifier was used with forward selection of potential feature sets on a large human-related subset of MEDLINE to create a classification model requiring only the citation, abstract, and MeSH terms for each article.
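Forward selection of feature sets can be illustrated with a small greedy loop. This is a minimal sketch, not the authors' implementation: the feature set names and the `toy_score` function are hypothetical, and in practice the scoring callable would train and evaluate a classifier (for example an SVM scored by cross-validated AUC) on each candidate combination.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple


def forward_select(
    candidates: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
    min_gain: float = 1e-4,
) -> Tuple[List[str], float]:
    """Greedily add the feature set that most improves the score.

    Stops when no remaining candidate improves the score by at least min_gain.
    """
    remaining = list(candidates)
    selected: List[str] = []
    best = float("-inf")
    while remaining:
        # Score every remaining candidate added to the current selection.
        trials = [(score(frozenset(selected + [c])), c) for c in remaining]
        top_score, top_name = max(trials)
        if top_score - best < min_gain:
            break  # no candidate helps enough; keep the current selection
        selected.append(top_name)
        remaining.remove(top_name)
        best = top_score
    return selected, best


# Hypothetical, made-up utilities standing in for a real evaluation metric.
TOY_GAIN = {
    "citation_words": 0.80,
    "abstract_words": 0.10,
    "mesh_terms": 0.05,
    "noise": 0.0,
}


def toy_score(features: FrozenSet[str]) -> float:
    """Placeholder scorer: sums the made-up utility of each feature set."""
    return sum(TOY_GAIN[f] for f in features)


if __name__ == "__main__":
    chosen, final_score = forward_select(TOY_GAIN.keys(), toy_score)
    print(chosen, round(final_score, 3))
```

The greedy loop evaluates only one added feature set per round, so it is much cheaper than testing every combination, at the cost of possibly missing sets that help only in combination.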

Results: The model achieved an area under the receiver operating characteristic curve of 0.973 and a mean squared error of 0.013 on the held-out year 2011 data. Accurate confidence estimates were confirmed on a manually reviewed set of test articles. A second model not requiring MeSH terms was also created and performed almost as well.
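The two reported metrics measure different things: the area under the ROC curve reflects how well the confidences rank RCTs above non-RCTs, while the mean squared error reflects how close each confidence is to the true 0/1 label. The sketch below computes both in plain Python on a small made-up example; the labels and scores are illustrative only and are not data from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_squared_error(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average squared gap between predicted confidence and the 0/1 label."""
    return sum((s - y) ** 2 for y, s in zip(labels, scores)) / len(labels)


if __name__ == "__main__":
    # Made-up example: 1 = RCT, 0 = not an RCT.
    toy_labels = [1, 1, 0, 1, 0, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1]
    print(roc_auc(toy_labels, toy_scores))
    print(mean_squared_error(toy_labels, toy_scores))
```

The pairwise AUC computation is quadratic in the number of articles, which is fine for illustration but would be replaced by a rank-based formula on a MEDLINE-scale test set.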

Discussion: Both models accurately rank and predict article RCT confidence. Using the model and the manually reviewed samples, it is estimated that about 8000 (3%) additional RCTs can be identified in MEDLINE, and that 5% of articles tagged as RCTs in MEDLINE may not be identified.

Conclusion: Retagging human-related studies with a continuously valued RCT confidence is potentially more useful for article ranking and review than a simple yes/no prediction. The automated RCT tagging tool should offer significant savings of time and effort during the process of writing SRs, and is a key component of a multistep text mining pipeline that we are building to streamline SR workflow. In addition, the model may be useful for identifying errors in MEDLINE publication types. The RCT confidence predictions described here have been made available to users as a web service with a user query form front end at: http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/RCT_Tagger.cgi.

Keywords: Evidence-Based Medicine; Information Retrieval; Natural Language Processing; Randomized Controlled Trials as Topic; Support Vector Machines; Systematic Reviews.

Figures

**Figure 1:**
This graph shows the correspondence between the predicted RCT confidence, centered at each 0.10-wide range between 0.0 and 1.0, and the prevalence of articles determined to describe RCTs by manual review. Samples were chosen randomly across four searches corresponding to Cochrane topics, where none of the chosen articles were tagged in MEDLINE with the “Randomized Controlled Trial” publication type (RCT_PT). The estimated prevalence is slightly below the predicted confidence, likely for two reasons. First, to keep the manual review task modest, both the binning used to group the confidence ranges and the number of samples in each bin are somewhat coarse. Second, and more importantly, the manually reviewed samples do not represent a uniform random sample from MEDLINE: they were specifically chosen to lack the MEDLINE RCT_PT. Since all of these articles had previously been reviewed by MEDLINE annotators and not tagged with this publication type, it is reasonable to expect that they would have a somewhat lower than predicted chance of being RCTs. Still, among the articles with high predicted confidence, a large fraction were designated as RCTs by the reviewer.
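The comparison in this figure amounts to a calibration check: group articles by predicted confidence, then compare each group's bin center with the fraction judged to be RCTs on manual review. The following is a minimal sketch assuming ten equal-width bins; the function name and the sample values are hypothetical and are not the study's review data.

```python
from typing import List, Optional, Sequence, Tuple


def calibration_bins(
    confidences: Sequence[float],
    is_rct: Sequence[int],
    n_bins: int = 10,
) -> List[Tuple[float, int, Optional[float]]]:
    """Return (bin center, sample count, observed RCT prevalence) per bin.

    Prevalence is None for bins that received no samples.
    """
    counts = [0] * n_bins
    hits = [0] * n_bins
    for conf, label in zip(confidences, is_rct):
        # Confidence 1.0 falls into the last bin rather than off the end.
        idx = min(int(conf * n_bins), n_bins - 1)
        counts[idx] += 1
        hits[idx] += label
    rows = []
    for i in range(n_bins):
        center = (i + 0.5) / n_bins
        prevalence = hits[i] / counts[i] if counts[i] else None
        rows.append((center, counts[i], prevalence))
    return rows


if __name__ == "__main__":
    # Made-up confidences and manual review outcomes (1 = judged an RCT).
    toy_conf = [0.05, 0.12, 0.18, 0.55, 0.58, 0.91, 0.95, 0.97]
    toy_review = [0, 0, 1, 1, 0, 1, 1, 0]
    for center, n, prevalence in calibration_bins(toy_conf, toy_review):
        if n:
            print(f"{center:.2f}  n={n}  prevalence={prevalence:.2f}")
```

With few samples per bin, as the caption notes for the manual review, each prevalence estimate is coarse, so small gaps between bin center and observed prevalence should be read cautiously.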
