BMC Genomics. 2009 Jul 7;10 Suppl 1(Suppl 1):S6.
doi: 10.1186/1471-2164-10-S1-S6.

Annotating the human genome with Disease Ontology

John D Osborne et al. BMC Genomics. 2009.

Abstract

Background: The human genome has been extensively annotated with Gene Ontology for biological functions, but has received only minimal computational annotation for diseases.

Results: We used the Unified Medical Language System (UMLS) MetaMap Transfer tool (MMTx) to discover gene-disease relationships from the GeneRIF database. We utilized a comprehensive subset of UMLS, which is disease-focused and structured as a directed acyclic graph (the Disease Ontology), to filter and interpret results from MMTx. The results were validated against the Homayouni gene collection using recall and precision measurements. We compared our results with the widely used Online Mendelian Inheritance in Man (OMIM) annotations.
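The filtering step described above can be illustrated with a minimal sketch. It assumes the concept-mapping output has already been reduced to (gene ID, UMLS concept ID) pairs and that the Disease Ontology is available as a lookup from concept ID to DOID. The function name, the data layout, and the placeholder concept IDs are hypothetical and are not taken from the paper or from MMTx; only the gene ID 7040 and DOID:2585 pairing comes from the Figure 1 example.

```python
from collections import defaultdict

def filter_by_disease_ontology(concept_hits, cui_to_doid):
    """Keep only hits whose concept ID is present in the Disease Ontology lookup.

    concept_hits: iterable of (gene_id, concept_id) pairs (hypothetical layout).
    cui_to_doid:  dict mapping a concept ID to a DOID (hypothetical layout).
    Returns a dict mapping gene_id to the set of DOIDs annotated to it.
    """
    annotations = defaultdict(set)
    for gene_id, concept_id in concept_hits:
        doid = cui_to_doid.get(concept_id)
        if doid is not None:  # concept is disease-related, so keep it
            annotations[gene_id].add(doid)
    return dict(annotations)

# Toy inputs: the concept IDs are placeholders, not real UMLS identifiers.
concept_hits = [
    (7040, "CUI_DISEASE_X"),      # maps to a DO term
    (7040, "CUI_NOT_A_DISEASE"),  # filtered out
    (540, "CUI_NOT_A_DISEASE"),   # filtered out, so gene 540 gets no entry here
]
cui_to_doid = {"CUI_DISEASE_X": "DOID:2585"}

annotations = filter_by_disease_ontology(concept_hits, cui_to_doid)
print(annotations)
```

Using a dictionary lookup keeps the filter independent of how the concept pairs were produced, so the same function could in principle be applied to hits derived from other text sources.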

Conclusion: The validation data set suggests a 91% recall rate and 97% precision rate of disease annotation using GeneRIF, in contrast with a 22% recall and 98% precision using OMIM. Our thesaurus-based approach allows comparisons to be made between disease-containing databases and increases accuracy in disease identification through synonym matching. The much higher recall rate of our approach demonstrates that annotating the human genome with Disease Ontology and GeneRIF dramatically increases the coverage of disease annotation of the human genome.
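For readers less familiar with the two measures, the following sketch shows how precision and recall are conventionally computed when predicted gene-disease pairs are compared with a reference set. The pairs below are invented toy data, not the Homayouni collection, and the function is an illustration of the standard definitions rather than the authors' evaluation code.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for two sets of (gene, disease) pairs."""
    true_positives = len(predicted & reference)
    # Precision: fraction of predicted pairs that appear in the reference set.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of reference pairs that were recovered by the prediction.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall

# Toy data with placeholder gene and disease labels.
predicted = {("G1", "D1"), ("G1", "D2"), ("G2", "D3")}
reference = {("G1", "D1"), ("G2", "D3"), ("G3", "D4"), ("G3", "D5")}

precision, recall = precision_recall(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the three predicted pairs are correct and two of the four reference pairs are found, which mirrors the pattern reported for OMIM: few wrong annotations, but many missed ones.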

Figures

Figure 1
Diagram of Disease Ontology annotation of the human genome. A) MMTx was used to annotate GeneRIFs with the Disease Ontology (DO). B) An example GeneRIF suggests that Gene ID: 7040 is annotated with DOID:2585.
Figure 2
Example Gene Annotation by DO, OMIM and GO. An example gene annotation is provided for ATP7B. The gene description, DOID, OMIM, and GO annotation descriptions are provided.
Gene: ATP7B (ATPase, Cu++ transporting, beta polypeptide). GeneID: 540.
Description: This gene is a member of the P-type cation transport ATPase family and encodes a protein with several membrane-spanning domains, an ATPase consensus sequence, a hinge domain, a phosphorylation site, and at least 2 putative copper-binding sites. This protein functions as a monomer, exporting copper out of the cells, such as the efflux of hepatic copper into the bile. Alternate transcriptional splice variants, encoding different isoforms with distinct cellular localizations, have been characterized. Mutations in this gene have been associated with Wilson disease (WD).
DOID: Breast Carcinoma; Carcinoma; Congenital Abnormality; Disorder of copper metabolism; Esophageal carcinoma; Hepatolenticular Degeneration; Liver diseases; Malignant neoplasm of ovary; Primary carcinoma of the liver cells; Stomach Carcinoma.
OMIM: Wilson disease.
GO: ATP binding; ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism; Component; Golgi apparatus; Process; cellular copper ion homeostasis; cellular zinc ion homeostasis; colocalizes_with basolateral plasma membrane; colocalizes_with cytoplasmic membrane-bounded vesicle; colocalizes_with perinuclear region of cytoplasm; colocalizes_with trans-Golgi network; copper ion binding; copper ion import; copper ion transmembrane transporter activity; copper ion transport; copper-exporting ATPase activity; cytoplasm; hydrolase activity; hydrolase activity, acting on acid anhydrides, catalyzing transmembrane movement of substances; integral to membrane; integral to plasma membrane; intracellular copper ion transport; ion transport; lactation; late endosome; magnesium ion binding; membrane; membrane fraction; metabolic process; metal ion binding; metal ion transmembrane transporter activity; metal ion transport; mitochondrion; nucleotide binding; protein binding; response to copper ion; sequestering of calcium ion; transport.
Figure 3
Comparison of DO and OMIM Annotation. A) The number of diseases per gene is plotted for the Disease Ontology (DO) analysis and OMIM. B) The number of genes per disease is plotted for the Disease Ontology (DO) analysis and OMIM.
Figure 4
Genes linked to different types of cancers. Ovarian cancer, breast cancer, neuroblastoma and multiple myeloma are represented by large grey dots. Genes annotated to each of these diseases are represented by smaller grey dots, with 357 genes annotated to ovarian cancer, 199 genes annotated to breast cancer, 156 genes annotated to neuroblastoma, and 135 genes annotated to multiple myeloma. The 11 genes (MMP2, MYC, BCL2, KIT, WT1, CXCL12, CDKN1B, IGF1, CCND1, BIRC5 and SKP2) related to all four diseases are highlighted in the shaded circle at the center.
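The overlap highlighted in Figure 4 amounts to a set intersection over per-disease gene lists. The sketch below shows that operation on a small invented example; the gene sets are toy data mixing three of the gene symbols named in the caption with hypothetical placeholders (GENE_A to GENE_D), and they do not reproduce the actual annotation counts.

```python
def shared_genes(disease_to_genes):
    """Return the genes annotated to every disease in the mapping."""
    gene_sets = [set(genes) for genes in disease_to_genes.values()]
    if not gene_sets:
        return set()
    # Intersect all per-disease gene sets at once.
    return set.intersection(*gene_sets)

# Toy mapping: placeholder genes are hypothetical, not real annotations.
disease_to_genes = {
    "ovarian cancer": {"MYC", "BCL2", "KIT", "GENE_A"},
    "breast cancer": {"MYC", "BCL2", "KIT", "GENE_B"},
    "neuroblastoma": {"MYC", "BCL2", "KIT", "GENE_C"},
    "multiple myeloma": {"MYC", "BCL2", "KIT", "GENE_A", "GENE_D"},
}

common = shared_genes(disease_to_genes)
print(sorted(common))
```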

