BMC Genomics. 2006 Oct 12:7:257.
doi: 10.1186/1471-2164-7-257.

Integration of curated databases to identify genotype-phenotype associations

Chern-Sing Goh et al. BMC Genomics.

Abstract

Background: The ability to rapidly characterize an unknown microorganism is critical both in responding to infectious disease and in biodefense. To do this, we need some way of anticipating an organism's phenotype based on the molecules encoded by its genome. However, the link between molecular composition (i.e., genotype) and phenotype for microbes is not obvious. While several studies have addressed this challenge, none has yet proposed a large-scale method integrating curated biological information. Here we use a systematic approach to discover genotype-phenotype associations that combines phenotypic information from a biomedical informatics database, GIDEON, with the molecular information contained in the National Center for Biotechnology Information's Clusters of Orthologous Groups database (NCBI COGs).

Results: Integrating the information in the two databases, we are able to correlate the presence or absence of a given protein in a microbe with its phenotype as measured by certain morphological characteristics or survival in a particular growth medium. With a 0.8 correlation score threshold, 66% of the associations found were confirmed by the literature, and at a 0.9 correlation threshold, 86% were positively verified.

Conclusion: Our results suggest possible phenotypic manifestations for proteins biochemically associated with sugar metabolism and electron transport. Moreover, we believe our approach can be extended to linking pathogenic phenotypes with functionally related proteins.

Figures

Figure 1
Diagram of correlation analysis for associating COGs to lab condition phenotypes. The correlation analysis measures the association between a COG's organism profile (presence or absence of the COG in each organism) and a lab condition's organism survival profile. Organisms that have a COG (red) are mapped to the organism's response to adverse growth conditions (blue), creating two vectors that are used for the correlation calculation.
Figure 2
Number of COG-phenotype associated pairs in each subset of the 0.8 and 0.9 threshold correlation score data sets. The resulting data sets of the (a) 0.8 correlation threshold and the (b) 0.9 correlation threshold are broken down into four different subsets. Total number (dark blue) is the total number of COG-phenotype associated pairs found at the 0.8 and 0.9 thresholds, respectively. Characterized (light purple) refers to those pairs where the COG has a known function. Annotated (blue-green) are those pairs that were selected for literature verification. Finally, confirmed (light blue) are the associations that were validated in the literature. This breakdown is shown for each lab condition, indicated by its GIDEON identifier.
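
The correlation analysis described for Figure 1 can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which correlation measure was used, so the code assumes a Pearson correlation over 0/1 vectors (the phi coefficient) and assumes that a pair is kept when its score is at or above the threshold. The function names, the COG and phenotype labels, and the toy data are all hypothetical.

```python
from math import sqrt


def binary_correlation(x, y):
    """Pearson correlation of two equal-length 0/1 vectors (phi coefficient).

    Assumption: the source does not name its correlation measure.
    """
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and cover the same organisms")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant profile (all present or all absent) carries no signal.
        return 0.0
    return cov / sqrt(var_x * var_y)


def associated_pairs(cog_profiles, phenotype_profiles, threshold=0.8):
    """Return (cog, phenotype, score) for every pair scoring >= threshold."""
    pairs = []
    for cog, cog_vector in cog_profiles.items():
        for phenotype, phenotype_vector in phenotype_profiles.items():
            score = binary_correlation(cog_vector, phenotype_vector)
            if score >= threshold:
                pairs.append((cog, phenotype, score))
    return pairs


# Hypothetical toy data: six organisms, listed in the same order in every vector.
# 1 = COG present / organism survives the condition, 0 = absent / does not survive.
cog_profiles = {
    "COG_A": [1, 1, 1, 0, 0, 0],
    "COG_B": [1, 0, 1, 0, 1, 0],
}
phenotype_profiles = {
    "condition_X": [1, 1, 1, 0, 0, 0],
}

hits = associated_pairs(cog_profiles, phenotype_profiles, threshold=0.8)
for cog, phenotype, score in hits:
    print(f"{cog} ~ {phenotype}: {score:.2f}")
```

In this toy data only COG_A passes the 0.8 threshold, because its presence profile matches the survival profile exactly, while COG_B agrees with it in only four of six organisms.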
