Comparative Study
J Am Med Inform Assoc. 2005 Mar-Apr;12(2):207-16.
doi: 10.1197/jamia.M1641. Epub 2004 Nov 23.

Text categorization models for high-quality article retrieval in internal medicine

Yindalon Aphinyanaphongs et al. J Am Med Inform Assoc. 2005 Mar-Apr.

Abstract

OBJECTIVE: Finding the best scientific evidence that applies to a patient problem is becoming exceedingly difficult because of the exponential growth of medical publications. The objective of this study was to apply machine learning techniques to automatically identify high-quality, content-specific articles for one time period in internal medicine and to compare their performance with the earlier Boolean-based PubMed clinical query filters of Haynes et al.

DESIGN: The selection criteria of the ACP Journal Club for articles in internal medicine were the basis for identifying high-quality articles in the areas of etiology, prognosis, diagnosis, and treatment. Naive Bayes, a specialized AdaBoost algorithm, and linear and polynomial support vector machines were applied to identify these articles.

MEASUREMENTS: The machine learning models were compared in each category with each other and with the clinical query filters using area under the receiver operating characteristic curve, 11-point average recall precision, and a sensitivity/specificity match method.

RESULTS: In most categories, the data-induced models have sensitivity, specificity, and precision better than or comparable to those of the clinical query filters. The polynomial support vector machine models perform best among all learning methods in ranking the articles, as evaluated by area under the receiver operating characteristic curve and 11-point average recall precision.

CONCLUSION: This research shows that machine learning methods can automatically build models for retrieving high-quality, content-specific articles in internal medicine for a given time period, using inclusion or citation by the ACP Journal Club as a gold standard, and that these models perform better than the 1994 PubMed clinical query filters.
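The two ranking measures named in the abstract can be illustrated with a small sketch. It assumes binary labels (1 = article meets the gold standard, 0 = it does not) and a real-valued classifier score per article. The function names and the interpolation rule (maximum precision at any recall at or above each level) are assumptions chosen for illustration, not necessarily the exact procedure used in the study.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def eleven_point_average_precision(
    labels: Sequence[int], scores: Sequence[float]
) -> float:
    """Mean interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank articles from highest to lowest score (ties keep input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points: List[Tuple[float, float]] = []  # (recall, precision) per rank
    hits = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        hits += label
        points.append((hits / total_pos, hits / rank))
    interpolated = []
    for i in range(11):
        level = i / 10
        # Interpolated precision: best precision at any recall >= level.
        reachable = [p for r, p in points if r >= level - 1e-12]
        interpolated.append(max(reachable) if reachable else 0.0)
    return sum(interpolated) / 11


if __name__ == "__main__":
    # Hypothetical toy data: five articles, two of them high quality.
    toy_labels = [1, 0, 1, 0, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
    print(roc_auc(toy_labels, toy_scores))
    print(eleven_point_average_precision(toy_labels, toy_scores))
```

Both measures depend only on the ordering of the scores, which is why they suit a comparison of how well different learners rank articles rather than how they perform at a single cutoff.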

Figures

Figure 1.
Receiver operating characteristic (ROC) curves for each category. X is clinical query filter performance at optimized sensitivity (right-most x) and specificity (left-most x). SVM = support vector machine.
Figure 2.
Eleven-point precision-recall curves compared with optimized sensitivity and specificity clinical query filters. X is clinical query filter performance at optimized sensitivity (right-most x) and specificity (left-most x). DIAG = diagnosis; ETIO = etiology; PPV = positive predictive value; PROG = prognosis; SVM = support vector machine.
Figure 3.
Title + abstract (TA) vs. Title + abstract + MeSH + publication types (TAM) performance comparisons. X is clinical query filter performance at optimized sensitivity (right-most x) and specificity (left-most x). ROC = receiver operating characteristic. SVM = support vector machine.
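
The figures place each clinical query filter at a fixed operating point, whereas a learned model produces a score that can be thresholded anywhere along its curve. One plausible reading of a sensitivity/specificity match is sketched below: pick the highest score threshold at which the model reaches the filter's sensitivity, then report the model's specificity there. The function name, the toy data, and the target sensitivity are hypothetical, and the study's exact matching procedure may differ.

```python
from typing import Sequence, Tuple


def specificity_at_matched_sensitivity(
    labels: Sequence[int],
    scores: Sequence[float],
    target_sensitivity: float,
) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) for the highest
    threshold whose sensitivity is at least target_sensitivity.

    An article is predicted positive when its score >= threshold.
    """
    if not 0.0 <= target_sensitivity <= 1.0:
        raise ValueError("target_sensitivity must lie in [0, 1]")
    total_pos = sum(labels)
    total_neg = len(labels) - total_pos
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    # Try each distinct score as a threshold, strictest first.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
        sensitivity = tp / total_pos
        if sensitivity >= target_sensitivity:
            return threshold, sensitivity, 1.0 - fp / total_neg
    # Unreachable for valid input: the lowest threshold yields sensitivity 1.0.
    raise RuntimeError("no threshold reached the target sensitivity")


if __name__ == "__main__":
    # Hypothetical toy data and a hypothetical filter sensitivity of 0.95.
    toy_labels = [1, 0, 1, 0, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
    print(specificity_at_matched_sensitivity(toy_labels, toy_scores, 0.95))
```

The same idea can be applied in the other direction by fixing specificity to the filter's value and comparing sensitivity.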


Publication types