Genet Med. 2012 Jul;14(7):663-9.
doi: 10.1038/gim.2012.7.

Toward modernizing the systematic review pipeline in genetics: efficient updating via data mining

Free PMC article


Byron C Wallace et al. Genet Med. 2012 Jul.

Abstract

Purpose: The aim of this study was to demonstrate that modern data mining tools can be used as one step in reducing the labor necessary to produce and maintain systematic reviews.

Methods: We used four continuously updated, manually curated resources that summarize MEDLINE-indexed articles in entire fields using systematic review methods (PDGene, AlzGene, and SzGene for genetic determinants of Parkinson disease, Alzheimer disease, and schizophrenia, respectively; and the Tufts Cost-Effectiveness Analysis (CEA) Registry for cost-effectiveness analyses). In each data set, we trained a classification model on citations screened up until 2009. We then evaluated the ability of the model to classify citations published in 2010 as "relevant" or "irrelevant" using human screening as the gold standard.
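The evaluation protocol described above (train on earlier screening decisions, then test on 2010 citations against human judgments) can be sketched as follows. This is a minimal illustration, not the authors' code: the `Citation` record, its fields, and the helper names are hypothetical, and the classifier itself is omitted, so its predictions are simply passed in as booleans.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Citation:
    # Hypothetical record: a single year field and the human screener's decision.
    year: int
    relevant: bool  # human screening decision, used as the gold standard


def temporal_split(
    citations: Sequence[Citation], last_train_year: int = 2009, test_year: int = 2010
) -> Tuple[List[Citation], List[Citation]]:
    """Train on citations up to last_train_year; test on those from test_year."""
    train = [c for c in citations if c.year <= last_train_year]
    test = [c for c in citations if c.year == test_year]
    return train, test


def sensitivity_specificity(
    gold: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Compare model predictions with human screening decisions.

    Assumes the test set contains at least one relevant and one irrelevant citation.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(g and p for g, p in pairs)              # relevant, flagged relevant
    fn = sum(g and not p for g, p in pairs)          # relevant, missed
    tn = sum((not g) and (not p) for g, p in pairs)  # irrelevant, correctly excluded
    fp = sum((not g) and p for g, p in pairs)        # irrelevant, flagged anyway
    return tp / (tp + fn), tn / (tn + fp)
```

Splitting by year rather than at random mirrors the updating scenario: the model only ever sees decisions made before the citations it is asked to screen.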

Results: Classification models did not miss any of the 104, 65, and 179 eligible citations in PDGene, AlzGene, and SzGene, respectively, and missed only 1 of 79 in the CEA Registry (sensitivity of 100% for the first three and 99% for the fourth). The respective specificities were 90, 93, 90, and 73%. Had the semiautomated system been used in 2010, a human would have needed to read only 605 of 5,616 citations (11%) to update the PDGene registry, and 555 of 7,298 (8%), 717 of 5,381 (13%), and 334 of 1,015 (33%) to update AlzGene, SzGene, and the CEA Registry, respectively.
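The sensitivities and screening shares in the Results can be recomputed from the reported counts. The sketch below uses only the numbers stated above; the dictionary layout and function names are illustrative choices, not part of the study.

```python
# Counts reported in the Results paragraph. Each entry:
# (eligible citations, eligible citations missed,
#  citations a human would have read, total citations for 2010)
RESULTS = {
    "PDGene": (104, 0, 605, 5616),
    "AlzGene": (65, 0, 555, 7298),
    "SzGene": (179, 0, 717, 5381),
    "CEA Registry": (79, 1, 334, 1015),
}


def sensitivity(eligible: int, missed: int) -> float:
    """Fraction of eligible citations the classifier flagged as relevant."""
    return (eligible - missed) / eligible


def screening_fraction(read: int, total: int) -> float:
    """Fraction of all 2010 citations a human would still have had to read."""
    return read / total


def summarize(results):
    """Return {name: (sensitivity %, share read %)}, rounded to whole percents."""
    return {
        name: (round(100 * sensitivity(e, m)), round(100 * screening_fraction(r, t)))
        for name, (e, m, r, t) in results.items()
    }


if __name__ == "__main__":
    for name, (sens, share) in summarize(RESULTS).items():
        print(f"{name}: sensitivity {sens}%, read {share}% of citations")
```

For example, the CEA Registry's 78 of 79 eligible citations found gives a sensitivity of about 98.7%, which rounds to the reported 99%.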

Conclusion: Data mining methodologies can reduce the burden of updating systematic reviews, without missing more papers than humans.


Figures

Figure 1
Outline of the classification method. The title, abstract, and MeSH term components of each citation are encoded as separate vectors of 0's and 1's (i.e., as separate “bag-of-words” representations; see Methods section). We used an ensemble of 11 base classifiers (squares), each comprising three Support Vector Machines (SVMs, circles), one per encoded component. Open and black circles denote SVMs that classify their encoded component as relevant and irrelevant, respectively. If at least one of its SVMs suggests that the citation is relevant, a base classifier casts a relevant vote (white squares); otherwise, it casts an irrelevant vote (black squares). The overall disposition is given by the majority vote of the 11 base classifiers (here, relevant, with 7 vs. 4 votes); this ensemble approach is called “bagging”. The proportion of votes for the “winning” disposition serves as a proxy for the classifier's confidence in its final decision (here 7/11, or 0.64). MeSH, medical subject heading.
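The encoding and voting scheme in the Figure 1 caption can be sketched as below. This is a hedged illustration, not the authors' implementation: training the SVMs (and drawing the resamples each base classifier is trained on) is omitted, so each SVM's output is supplied as a boolean, and the names `binary_bow`, `base_vote`, and `ensemble_decision` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

COMPONENTS = ("title", "abstract", "mesh")  # one SVM per encoded component


def binary_bow(tokens: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """Encode tokens as a 0/1 vector: 1 if the vocabulary term occurs at all."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]


def base_vote(component_predictions: Dict[str, bool]) -> bool:
    """A base classifier votes 'relevant' if any of its three SVMs says relevant."""
    return any(component_predictions[c] for c in COMPONENTS)


def ensemble_decision(base_votes: Sequence[bool]) -> Tuple[str, float]:
    """Majority vote over base classifiers; confidence = share of winning votes.

    With an odd ensemble size such as 11, ties cannot occur.
    """
    n_relevant = sum(base_votes)
    n_irrelevant = len(base_votes) - n_relevant
    if n_relevant > n_irrelevant:
        return "relevant", n_relevant / len(base_votes)
    return "irrelevant", n_irrelevant / len(base_votes)


if __name__ == "__main__":
    vocab = ["apoe", "allele", "cost", "association"]
    print(binary_bow("apoe allele association study".split(), vocab))  # [1, 1, 0, 1]

    # Hypothetical SVM outputs reproducing the figure's example: 7 base
    # classifiers with at least one SVM voting relevant, 4 with none.
    relevant_pattern = {"title": False, "abstract": True, "mesh": False}
    irrelevant_pattern = {c: False for c in COMPONENTS}
    votes = [base_vote(relevant_pattern)] * 7 + [base_vote(irrelevant_pattern)] * 4
    label, confidence = ensemble_decision(votes)
    print(label, round(confidence, 2))  # relevant 0.64
```

The "any SVM says relevant" rule inside each base classifier biases the system toward flagging citations, which suits a screening task where missing an eligible paper costs more than reading an extra one.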

