Review
Biomed Res Int. 2014;2014:253128.
doi: 10.1155/2014/253128. Epub 2014 Apr 16.

A knowledge-driven approach to extract disease-related biomarkers from the literature

À Bravo et al. Biomed Res Int. 2014.

Abstract

The biomedical literature represents a rich source of biomarker information. However, both the size of literature databases and their lack of standardization hamper the automatic exploitation of the information contained in these resources. Text mining approaches have proven useful for exploiting the information contained in scientific publications. Here, we show that a knowledge-driven text mining approach can exploit a large literature database to extract a dataset of biomarkers related to diseases covering all therapeutic areas. Our methodology takes advantage of the annotation of MEDLINE publications pertaining to biomarkers with MeSH terms, narrowing the search to specific publications and, therefore, minimizing the false positive rate. It is based on a dictionary-based named entity recognition system and a relation extraction module. The application of this methodology resulted in the identification of 131,012 disease-biomarker associations between 2,803 genes and 2,751 diseases, and represents a valuable knowledge base for those interested in disease-related biomarkers. Additionally, we present a bibliometric analysis of the journals reporting biomarker-related information during the last 40 years.
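
The three steps named in the abstract (MeSH-based filtering, dictionary-based named entity recognition, and relation extraction) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy dictionaries, the record layout with a `mesh_terms` field, the MeSH heading used for filtering, and the use of sentence-level co-occurrence as the relation extraction step are all assumptions made for illustration.

```python
import re
from itertools import product

# Hypothetical toy dictionaries mapping surface terms to normalized names.
# Real dictionaries would be built from curated gene and disease vocabularies.
GENE_TERMS = {"brca1": "BRCA1", "p53": "TP53", "tp53": "TP53"}
DISEASE_TERMS = {
    "breast cancer": "Breast Neoplasms",
    "lung cancer": "Lung Neoplasms",
}


def is_biomarker_publication(record, mesh_heading="Biomarkers"):
    """Keep only records annotated with the given MeSH heading.

    The heading name and the 'mesh_terms' field are assumptions.
    """
    return mesh_heading in record.get("mesh_terms", [])


def find_entities(sentence, dictionary):
    """Return normalized names whose terms occur in the sentence as whole words."""
    lowered = sentence.lower()
    found = set()
    for term, name in dictionary.items():
        # Lookarounds prevent matching a term inside a longer token.
        if re.search(r"(?<!\w)" + re.escape(term) + r"(?!\w)", lowered):
            found.add(name)
    return found


def extract_associations(abstract_text):
    """Pair every gene and disease mentioned in the same sentence.

    Sentence-level co-occurrence is a stand-in for a relation extraction module.
    """
    pairs = set()
    for sentence in re.split(r"(?<=[.!?])\s+", abstract_text):
        genes = find_entities(sentence, GENE_TERMS)
        diseases = find_entities(sentence, DISEASE_TERMS)
        pairs.update(product(genes, diseases))
    return pairs


def mine(records):
    """Run filtering, entity recognition, and pairing over a list of records."""
    associations = set()
    for record in records:
        if is_biomarker_publication(record):
            associations |= extract_associations(record["abstract"])
    return associations
```

Filtering on the MeSH annotation before any text is scanned is what narrows the search to biomarker publications: a record without the heading contributes no associations, even if a gene and a disease are mentioned together in its text.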

Figures

Figure 1. Text mining workflow.
Figure 2. An example of the variability in terminology for genes depending on the primary sources.
Figure 3. Number of publications (bars) and number of journals (line) by year.
Figure 4. The top 10 journals sorted by unique disease-biomarker cooccurrences identified.
Figure 5. Associations analysis. (a) Boxplot showing the score versus number of publications supporting each disease-biomarker association. (b) Distribution of associations based on the number of publications that support each association. The fraction of the associations that were reported in the last three years is highlighted as dark grey bars.
Figure 6. Distribution of the number of associated biomarkers (for diseases, (a)) and diseases (for biomarkers, (b)). Gene symbols from HGNC are used for the biomarkers.
