J Am Med Inform Assoc. 2005 Mar-Apr;12(2):121-9.
doi: 10.1197/jamia.M1640. Epub 2004 Nov 23.

A statistical approach to scanning the biomedical literature for pharmacogenetics knowledge

Daniel L Rubin et al. J Am Med Inform Assoc. 2005 Mar-Apr.

Erratum in

  • J Am Med Inform Assoc. 2005 May-Jun;12(3):364

Abstract

Objective: Biomedical databases summarize current scientific knowledge, but building them generally requires years of laborious curation, much of it spent identifying pertinent articles and data within the voluminous biomedical literature. Because it is difficult to extract useful information from such large volumes of text manually, automated intelligent text-analysis tools are becoming increasingly essential to assist curators. The authors' goal was to develop an automated method for identifying Medline citations that contain pharmacogenetics data on gene–drug relationships.

Design: The authors built and evaluated several candidate statistical models that characterize pharmacogenetics articles in terms of word usage and the profile of Medical Subject Headings (MeSH) used in those articles. The best-performing model was used to scan the entire Medline article database (11 million articles) to identify candidate pharmacogenetics articles.
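The abstract does not name the candidate statistical models. As one hypothetical illustration of a model that characterizes articles by word usage and MeSH profile, the sketch below trains a multinomial naive Bayes classifier over a single bag of tokens, prefixing MeSH headings with `mesh:` so they count as features distinct from title and abstract words. The `featurize` helper, the `NaiveBayes` class, the feature scheme, and the toy training documents are all assumptions for illustration, not the authors' method or data.

```python
import math
import re
from collections import Counter, defaultdict


def featurize(text, mesh_terms):
    """Hypothetical feature scheme: lowercase word tokens plus 'mesh:'-prefixed headings."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return words + ["mesh:" + m.lower() for m in mesh_terms]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        self.totals = {c: sum(cnt.values()) for c, cnt in self.token_counts.items()}
        return self

    def log_scores(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.class_counts:
            s = math.log(self.class_counts[c] / n_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # ignore tokens never seen in training
                    s += math.log((self.token_counts[c][t] + 1) / (self.totals[c] + v))
            scores[c] = s
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Toy labeled corpus (hypothetical examples, not the authors' training set).
train = [
    (featurize("CYP2D6 polymorphism alters debrisoquine metabolism",
               ["Cytochrome P-450 CYP2D6", "Polymorphism, Genetic"]), "pgx"),
    (featurize("TPMT genotype predicts thiopurine toxicity",
               ["Methyltransferases", "Pharmacogenetics"]), "pgx"),
    (featurize("Hospital staffing levels and patient outcomes",
               ["Personnel Staffing and Scheduling"]), "other"),
    (featurize("Survey of nursing education programs",
               ["Education, Nursing"]), "other"),
]
model = NaiveBayes().fit([d for d, _ in train], [y for _, y in train])
print(model.predict(featurize("CYP2D6 genotype and drug metabolism", ["Pharmacogenetics"])))
# → pgx
```

Combining words and MeSH headings in one token space is only one design choice; separate per-source models, or weighting MeSH features differently, would be equally plausible readings of "word usage and the profile of MeSH".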

Results: A pharmacologist reviewed a sample of the articles identified by the Medline scan to assess the method's precision. The authors' approach identified 4,892 pharmacogenetics articles in the literature with 92% precision, and it acquired them in a fraction of the time that manual accumulation would be expected to take. The authors have built a Web resource (http://pharmdemo.stanford.edu/pharmdb/main.spy) that provides access to their results.
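Precision is the fraction of retrieved articles that are truly relevant, TP / (TP + FP), and here it is estimated from the pharmacologist's reviewed sample. The minimal sketch below shows the definition and what 92% precision implies for the 4,892 retrieved articles, assuming the sample estimate carries over to the whole set. The sample counts (92 of 100) are hypothetical, since the abstract does not report the sample size, and the function names are illustrative.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of retrieved items that are relevant: TP / (TP + FP)."""
    retrieved = true_positives + false_positives
    if retrieved == 0:
        raise ValueError("precision is undefined when nothing was retrieved")
    return true_positives / retrieved


def expected_relevant(n_retrieved: int, est_precision: float) -> float:
    """Expected number of relevant items among n_retrieved, assuming the
    sample precision applies to the full retrieved set."""
    return n_retrieved * est_precision


# Hypothetical reviewed sample: 92 of 100 sampled articles judged relevant.
p = precision(92, 8)
print(p)                                  # 0.92
print(round(expected_relevant(4892, p)))  # 4501
```

Precision says nothing about recall, that is, how many pharmacogenetics articles in Medline the scan missed; estimating recall would require a separately labeled sample of non-retrieved articles.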

Conclusion: A statistical classification approach can screen the primary literature for pharmacogenetics articles with high precision. Such methods may help curators acquire pertinent literature when building biomedical databases.


Figures

Figure 1.
Medical Subject Headings (MeSH terms) associated with an article. Some MeSH terms are designated “supplementary headings” and identified as such in separate MeSH documentation. Subheadings are additional terms that can be associated with MeSH headings, separated from the main heading by a slash. Major topics (main heading or subheading) are indicated with an asterisk.
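The heading notation described in this caption can be parsed mechanically. The sketch below assumes entries are written as `Heading/Subheading`, with an asterisk (leading or trailing) on whichever part is a major topic, as in the hypothetical string `Cytochrome P-450 CYP2D6/*genetics/metabolism`. This input format is modeled on the caption, not on an official MEDLINE export format, and the `MeshTerm` type and `parse_mesh` function are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MeshTerm:
    heading: str
    heading_major: bool
    # (subheading, is_major) pairs
    subheadings: List[Tuple[str, bool]] = field(default_factory=list)


def _strip_star(part: str) -> Tuple[str, bool]:
    """Remove a major-topic asterisk (leading or trailing) and report whether it was present."""
    part = part.strip()
    major = part.startswith("*") or part.endswith("*")
    return part.strip("*").strip(), major


def parse_mesh(entry: str) -> MeshTerm:
    """Parse 'Heading/Sub1/*Sub2'-style entries; '*' marks a major topic."""
    head, *subs = entry.split("/")
    name, major = _strip_star(head)
    return MeshTerm(name, major, [_strip_star(s) for s in subs])


term = parse_mesh("Cytochrome P-450 CYP2D6/*genetics/metabolism")
print(term.heading, term.heading_major, term.subheadings)
# → Cytochrome P-450 CYP2D6 False [('genetics', True), ('metabolism', False)]
```

Keeping the major-topic flag per part, rather than per entry, matches the caption's note that either the main heading or a subheading can be the major topic.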
Figure 2.
Diagram of the flow of experiments. A training corpus of labeled articles was used to evaluate different classification methods. The two best methods were subsequently tested on an unlabeled subset of Medline in conjunction with gene–drug filtering to select the best method to be used in scanning all of Medline. The results of applying this method to all of Medline were then validated manually.
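The two selection steps in this flow can be sketched under stated assumptions. Methods are ranked by a single evaluation score on the labeled training corpus; the metric, method names, and scores below are hypothetical placeholders, not reported results. The gene–drug filter is modeled as requiring that an article mention at least one gene and at least one drug from lexicons, which are also hypothetical; the authors' actual filter may work differently.

```python
import re
from typing import Dict, Iterable, List, Set


def top_k_methods(scores: Dict[str, float], k: int = 2) -> List[str]:
    """Return the k method names with the highest evaluation score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def mentions_any(text: str, lexicon: Set[str]) -> bool:
    """True if any lexicon entry appears as a whole word (case-insensitive)."""
    return any(re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
               for term in lexicon)


def gene_drug_filter(texts: Iterable[str], genes: Set[str], drugs: Set[str]) -> List[str]:
    """Keep only texts that mention at least one gene AND at least one drug."""
    return [t for t in texts if mentions_any(t, genes) and mentions_any(t, drugs)]


# Placeholder scores for hypothetical methods (not results from the paper).
scores = {"method_a": 0.81, "method_b": 0.88, "method_c": 0.74, "method_d": 0.85}
print(top_k_methods(scores))  # ['method_b', 'method_d']

genes = {"CYP2D6", "TPMT"}           # hypothetical gene lexicon
drugs = {"codeine", "azathioprine"}  # hypothetical drug lexicon
candidates = [
    "CYP2D6 poor metabolizers show reduced codeine analgesia",
    "TPMT expression in liver tissue",
    "Codeine use in emergency departments",
]
print(gene_drug_filter(candidates, genes, drugs))
# → ['CYP2D6 poor metabolizers show reduced codeine analgesia']
```

Applying the filter after classification trades some recall for precision: articles the classifier accepts but that name no recognizable gene–drug pair are discarded.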
Figure 3.
Web resource storing gene–drug relationships and pharmacogenetics literature supporting these relationships. The number of articles supporting each relationship can be a useful indicator of the strength of the association.
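The per-relationship article count described in this caption is a simple aggregation. The sketch below assumes the resource stores hypothetical (gene, drug, PMID) link triples and that each article should count once per relationship even if it is linked more than once. The PMIDs are made-up placeholders and `support_counts` is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def support_counts(links: Iterable[Tuple[str, str, int]]) -> Dict[Tuple[str, str], int]:
    """Count distinct supporting articles (PMIDs) for each (gene, drug) pair."""
    articles: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
    for gene, drug, pmid in links:
        articles[(gene, drug)].add(pmid)
    return {pair: len(pmids) for pair, pmids in articles.items()}


links = [
    ("CYP2D6", "codeine", 1001),
    ("CYP2D6", "codeine", 1002),
    ("CYP2D6", "codeine", 1002),  # duplicate link to the same article
    ("TPMT", "azathioprine", 2001),
]
counts = support_counts(links)
# List relationships from most to least supported.
for pair, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(pair, n)
```

Counting distinct PMIDs rather than raw links keeps a doubly linked article from inflating a relationship's apparent support; the caption presents the count only as an indicator of association strength.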
