J Am Med Inform Assoc. 2015 May;22(3):707-17.
doi: 10.1093/jamia/ocu025. Epub 2015 Feb 5.

Automated confidence ranked classification of randomized controlled trial articles: an aid to evidence-based medicine

Aaron M Cohen et al. J Am Med Inform Assoc. 2015 May.

Abstract

Objective: For many literature review tasks, including systematic review (SR) and other aspects of evidence-based medicine, it is important to know whether an article describes a randomized controlled trial (RCT). Current manual annotation is not complete or flexible enough for the SR process. In this work, highly accurate machine learning predictive models were built that include confidence predictions of whether an article is an RCT.

Materials and methods: The LibSVM classifier was used with forward selection of potential feature sets on a large human-related subset of MEDLINE to create a classification model requiring only the citation, abstract, and MeSH terms for each article.
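As a rough illustration of forward selection over feature sets, the sketch below greedily adds whichever candidate set most improves a score and stops when nothing helps. It is not the authors' code: the `evaluate` callback stands in for training a classifier such as LibSVM and scoring it on validation data, the feature-set names are placeholders, and the scores in `toy_scores` are invented purely to make the example runnable.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def forward_select(
    candidates: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], float],
    min_gain: float = 0.0,
) -> Tuple[List[str], float]:
    """Greedy forward selection: add one feature set per round while the score improves."""
    selected: List[str] = []
    best_score = float("-inf")
    remaining = list(candidates)
    while remaining:
        # Score every one-step extension of the current selection.
        scored = [(evaluate(frozenset(selected + [c])), c) for c in remaining]
        round_score, round_best = max(scored, key=lambda pair: pair[0])
        # Stop when the best extension does not beat the current score by more than min_gain.
        if round_score - best_score <= min_gain:
            break
        selected.append(round_best)
        remaining.remove(round_best)
        best_score = round_score
    return selected, best_score


# Invented validation scores for hypothetical feature sets (assumption, not from the paper).
toy_scores = {
    frozenset({"citation"}): 0.80,
    frozenset({"abstract"}): 0.90,
    frozenset({"mesh"}): 0.85,
    frozenset({"abstract", "citation"}): 0.91,
    frozenset({"abstract", "mesh"}): 0.95,
    frozenset({"abstract", "mesh", "citation"}): 0.95,
}

chosen, score = forward_select(["citation", "abstract", "mesh"], toy_scores.__getitem__)
print(chosen, score)
```

In practice the callback would wrap a full train-and-validate cycle, so each round costs one model fit per remaining candidate.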

Results: The model achieved an area under the receiver operating characteristic curve of 0.973 and a mean squared error of 0.013 on the held-out year 2011 data. Accurate confidence estimates were confirmed on a manually reviewed set of test articles. A second model not requiring MeSH terms was also created and performs almost as well.
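For readers unfamiliar with the two metrics, the following minimal sketch computes them for a set of confidence predictions. It assumes binary labels (1 = RCT, 0 = not RCT) and confidences in [0, 1]. The function names and the toy data are illustrative assumptions, not the authors' evaluation code, and the quadratic pairwise AUC is only practical for small samples.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_squared_error(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean squared difference between predicted confidence and the 0/1 label."""
    return sum((s - y) ** 2 for y, s in zip(labels, scores)) / len(labels)


# Invented toy data (assumption): five articles with manual labels and model confidences.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1]

auc = roc_auc(labels, scores)
mse = mean_squared_error(labels, scores)
print(f"AUC={auc:.3f}  MSE={mse:.3f}")
```

AUC measures only how well the confidences rank RCTs above non-RCTs, while the mean squared error also penalizes confidences that are poorly calibrated, which is why the two are reported together.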

Discussion: Both models accurately rank and predict article RCT confidence. Using the model and the manually reviewed samples, it is estimated that about 8000 (3%) additional RCTs can be identified in MEDLINE, and that 5% of articles tagged as RCTs in MEDLINE may not be identified.

Conclusion: Retagging human-related studies with a continuously valued RCT confidence is potentially more useful for article ranking and review than a simple yes/no prediction. The automated RCT tagging tool should offer significant savings of time and effort during the process of writing SRs, and is a key component of a multistep text mining pipeline that we are building to streamline SR workflow. In addition, the model may be useful for identifying errors in MEDLINE publication types. The RCT confidence predictions described here have been made available to users as a web service with a user query form front end at: http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/RCT_Tagger.cgi.

Keywords: Evidence-Based Medicine; Information Retrieval; Natural Language Processing; Randomized Controlled Trials as Topic; Support Vector Machines; Systematic Reviews.

Figures

Figure 1:
This graph shows the correspondence between the predicted RCT confidence, centered on each 0.10-wide range between 0.0 and 1.0, and the prevalence of articles determined to describe RCTs by manual review. Samples were chosen randomly across four searches corresponding to Cochrane topics, where none of the chosen articles were tagged in MEDLINE with the “Randomized Controlled Trial” publication type (RCT_PT). The estimated prevalence is slightly below the predicted confidence, likely for two reasons. First, to keep the manual review task modest, both the binning used to group the confidence ranges and the number of samples in each bin are somewhat coarse. Second, and more importantly, the manually reviewed samples do not represent a uniform random sample from MEDLINE: they were specifically chosen to lack the MEDLINE RCT_PT. Since all of these articles had been previously reviewed by MEDLINE annotators and not tagged with this publication type, it is reasonable to expect that they would have a somewhat lower than predicted chance of being RCTs. Still, among the articles with high predicted confidence, a large fraction were designated as RCTs by the reviewer.
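The comparison in the figure can be sketched as a simple calibration table: group articles into 0.10-wide confidence bins and compute, for each bin, the fraction judged to be RCTs on manual review. The code below is a minimal sketch under that reading. The function name and the toy data are invented for illustration and do not reproduce the paper's reviewed sample.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple


def calibration_table(
    confidences: Sequence[float], is_rct: Sequence[int], n_bins: int = 10
) -> List[Tuple[float, int, float]]:
    """Return (bin center, article count, observed RCT prevalence) for each non-empty bin."""
    counts = defaultdict(lambda: [0, 0])  # bin index -> [n_articles, n_rcts]
    for conf, label in zip(confidences, is_rct):
        # A confidence of exactly 1.0 is placed in the top bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        counts[idx][0] += 1
        counts[idx][1] += int(label)
    rows = []
    for idx in sorted(counts):
        n_articles, n_rcts = counts[idx]
        center = (idx + 0.5) / n_bins
        rows.append((center, n_articles, n_rcts / n_articles))
    return rows


# Invented toy data (assumption): predicted confidences and manual review outcomes.
confidences = [0.05, 0.12, 0.18, 0.52, 0.55, 0.91, 0.93, 0.97]
is_rct = [0, 0, 0, 1, 0, 1, 1, 0]

table = calibration_table(confidences, is_rct)
for center, n_articles, prevalence in table:
    print(f"bin center {center:.2f}: n={n_articles}, observed prevalence={prevalence:.2f}")
```

A well-calibrated model would show observed prevalence close to each bin center. With few samples per bin, as in the caption's coarse binning, the per-bin estimates are noisy.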
