Springerplus. 2012 Apr 23;1:4.
doi: 10.1186/2193-1801-1-4. eCollection 2012.

The age-phenome database

Nophar Geifman et al. Springerplus.

Abstract

Data linking specific ages or age ranges with disease are abundant in the biomedical literature. However, these data are organized in a way that makes searching for age-phenotype relationships difficult. Recently, we described the Age-Phenome Knowledge-base (APK), a computational platform for storage and retrieval of information concerning age-related phenotypic patterns. Here, we report that data derived from over 1.5 million human-related PubMed abstracts have been added to APK. Using a text-mining pipeline, 35,683 entries that describe relationships between age and phenotype (such as disease) have been introduced into the database. Comparison with the results obtained by a human reader indicates that the overall accuracy of these entries exceeds an estimated 80%. The usefulness of these data for obtaining new insight into age-disease relationships is demonstrated using clustering analysis, which is shown to capture obvious, as well as potentially interesting, relationships between diseases. In addition, a new tool for browsing and searching the APK database is presented. We thus present a unique resource and a new framework for studying age-disease relationships and other phenotypic processes.

Keywords: Age; Knowledgebase; Phenotype; Text-mining.

Figures

Figure 1
Data-mining pipeline. The process of mining PubMed abstracts can be divided into 4 main steps: 1) finding age-related abstracts, 2) determining the age-phenotype relationship type for those age-related abstracts, 3) generating a textual snippet which describes the most important information given in the abstract regarding the captured age and 4) mapping the text snippets to phenotypes from the DO and a subset of the UMLS. Instances for which an age-phenotype relationship could not be determined or no phenotype was found were not used to populate the database.
Figure 2
Database contents summary. A. Instances per relationship type. B. Instances per Age Ontology age class. An instance is assigned the most specific Age Ontology class which contains the whole of the age range to which that instance is linked. The age ranges of each class are as follows: Infant new born: 0-1 month; Infant: 0-2 years; Child: 2-12 years; Preschool child: 2-6 years; Adolescent: 12-18 years; Adult: 18-120 years; Young adult: 18-24 years; Middle aged: 45-64 years; Aged: 64-120 years; 80 years or over: 80-120.
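The class-assignment rule in this caption can be sketched in a few lines of Python. This is only an illustration, not the APK implementation: the function name `assign_age_class` is hypothetical, "most specific" is assumed to mean the narrowest class whose range contains the whole instance range, ages are expressed in years (so 1 month is taken as 1/12 year), and returning `None` when no class contains the range is an assumption about a case the caption does not describe.

```python
from typing import Optional

# Age ranges in years, copied from the Figure 2 caption.
# 1 month is approximated as 1/12 year (assumption).
AGE_CLASSES = {
    "Infant new born": (0.0, 1.0 / 12.0),
    "Infant": (0.0, 2.0),
    "Child": (2.0, 12.0),
    "Preschool child": (2.0, 6.0),
    "Adolescent": (12.0, 18.0),
    "Adult": (18.0, 120.0),
    "Young adult": (18.0, 24.0),
    "Middle aged": (45.0, 64.0),
    "Aged": (64.0, 120.0),
    "80 years or over": (80.0, 120.0),
}


def assign_age_class(low: float, high: float) -> Optional[str]:
    """Return the narrowest class whose range contains [low, high].

    Hypothetical helper: "most specific" is interpreted as the
    smallest range width among all containing classes.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    # Collect (width, name) for every class that fully contains the range.
    candidates = [
        (hi - lo, name)
        for name, (lo, hi) in AGE_CLASSES.items()
        if lo <= low and high <= hi
    ]
    if not candidates:
        return None  # no single class covers the range (assumed behaviour)
    return min(candidates)[1]
```

Under these assumptions, an instance linked to ages 3-5 falls under 'Preschool child' rather than the broader 'Child', and an instance linked to ages 85-90 falls under '80 years or over' rather than 'Aged' or 'Adult'.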
Figure 3
Hierarchical clustering analysis results: two examples. A) A phylogram of cluster no. 1, in which many sexually transmitted diseases (STDs) are clustered. B) A phylogram of cluster no. 2, in which several types of cancer and cardiovascular diseases are clustered. C) Graphical representation of clusters 1 and 2. For each cluster, the average number of instances per age per disease was plotted. For cluster 1, literature reports peak in the late teens to early 20s, while for cluster 2, literature reports peak at around age 60.
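The per-cluster curve described for panel C (average number of instances per age per disease) can be computed as in the sketch below. It is a minimal illustration, not the authors' code: the function names `cluster_age_profile` and `peak_age` are hypothetical, and the input is assumed to be one list of per-age instance counts per disease, all covering the same ages.

```python
from typing import Dict, List


def cluster_age_profile(counts: Dict[str, List[int]]) -> List[float]:
    """Average the per-age instance counts over all diseases in a cluster.

    `counts` maps a disease name to its instance counts, one entry per
    age; every list is assumed to cover the same sequence of ages.
    """
    if not counts:
        raise ValueError("cluster must contain at least one disease")
    if len({len(series) for series in counts.values()}) != 1:
        raise ValueError("all diseases must cover the same ages")
    n_diseases = len(counts)
    # zip(*...) walks the ages column by column across diseases.
    return [sum(column) / n_diseases for column in zip(*counts.values())]


def peak_age(profile: List[float], first_age: int = 0) -> int:
    """Return the age at which the averaged profile is highest."""
    best_index = max(range(len(profile)), key=profile.__getitem__)
    return first_age + best_index


# Invented toy counts for ages 18-21, purely for illustration;
# these are not values from the APK database.
toy_cluster = {
    "disease_a": [0, 2, 4, 2],
    "disease_b": [0, 4, 6, 0],
}
toy_profile = cluster_age_profile(toy_cluster)
toy_peak = peak_age(toy_profile, first_age=18)
```

With the toy counts above, the averaged profile is highest at the third age in the series, so `toy_peak` is 20.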
Figure 4
The APK data browser. A) The APK data browser search form. In this example, the selected search type is by age and phenotype, the 'Adolescent' age class is selected and the phenotype used for this query is 'Leukemia'. B) Results for the search for evidence linked to the age class 'Adolescent' and the disease 'Leukemia'. Results are displayed according to the type of the assigned age-phenotype relationship.
