Genome Res. 2004 Jan;14(1):109-15.
doi: 10.1101/gr.1586704.

A cross-genomic approach for systematic mapping of phenotypic traits to genes

Kam Jim et al. Genome Res. 2004 Jan.

Abstract

We present a computational method for de novo identification of gene function using only cross-organismal distribution of phenotypic traits. Our approach assumes that proteins necessary for a set of phenotypic traits are preferentially conserved among organisms that share those traits. This method combines organism-to-phenotype associations, along with phylogenetic profiles, to identify proteins that have high propensities for the query phenotype; it does not require the use of any functional annotations for any proteins. We first present the statistical foundations of this approach and then apply it to a range of phenotypes to assess how its performance depends on the frequency and specificity of the phenotype. Our analysis shows that statistically significant associations are possible as long as the phenotype is neither extremely rare nor extremely common; results on the flagella, pili, thermophily, and respiratory tract tropism phenotypes suggest that reliable associations can be inferred when the phenotype does not arise from many alternate mechanisms.
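As a rough illustration of the idea in the abstract, the sketch below scores one gene against one phenotype from two binary vectors: gene presence across genomes (a phylogenetic profile) and trait presence across the same genomes. The abstract does not give the formulas, so both statistics here are assumptions rather than the authors' definitions: the propensity is taken to be a log2 enrichment ratio, and the P-value is a one-sided hypergeometric tail with a Bonferroni correction. The function names and the toy vectors are hypothetical; only N = 86 genomes and X = 4000 genes come from the Figure 1 caption.

```python
from math import comb, log2


def hypergeom_tail(N, K, n, k):
    """P(overlap >= k) when a gene present in n of N genomes is placed at
    random and K genomes show the phenotype (assumed null model)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def score_gene(profile, phenotype):
    """profile, phenotype: equal-length lists of 0/1, one entry per genome.
    Returns (propensity, p_value) under the assumptions stated above."""
    N = len(profile)
    K = sum(phenotype)                                    # genomes with the trait
    n = sum(profile)                                      # genomes with the gene
    k = sum(1 for g, f in zip(profile, phenotype) if g and f)  # both
    # Assumed propensity: how much more often the gene occurs among
    # trait-positive genomes than among all genomes, on a log2 scale.
    propensity = log2((k / K) / (n / N)) if k else float("-inf")
    return propensity, hypergeom_tail(N, K, n, k)


def best_case_p(N, f):
    """Smallest P-value reachable under this null: the gene is present in
    exactly the f trait-positive genomes and nowhere else."""
    return 1 / comb(N, f)


if __name__ == "__main__":
    phenotype = [1, 1, 1, 0, 0, 0, 0, 0]   # toy data, 8 genomes
    profile = [1, 1, 1, 0, 0, 0, 0, 0]     # gene tracks the trait perfectly
    print(score_gene(profile, phenotype))

    N, X = 86, 4000                         # values from the Figure 1 caption
    for f in (1, 2, 3, 43, 84):
        corrected = min(1.0, X * best_case_p(N, f))  # Bonferroni, capped at 1
        print(f, corrected)
```

Under these assumptions, a phenotype present in only two of the 86 genomes cannot reach corrected significance even for a perfectly matching gene, and the same holds by symmetry for a phenotype present in all but two. That is one way to read the abstract's remark that the phenotype must be neither extremely rare nor extremely common.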

Figures

Figure 1
Relationship between maximum propensity Φ and minimum estimated formula image as a function of the number of organisms exhibiting phenotype f, given that there are N = 86 total genomes and that we are testing X = 4000 genes.
Figure 2
Receiver Operating Characteristic (ROC) curves comparing our approach with the approaches of Levesque et al. (2003) and Pellegrini et al. (1999) on the same flagella data set. Each ROC curve for our approach is obtained by keeping all genes with propensity scores greater than a fixed cutoff and varying the P-value cutoffs. The ROC curve for the Levesque et al. (2003) approach is obtained by varying the similarity threshold cutoff. The ROC curve for the Pellegrini et al. (1999) approach is obtained by comparing phylogenetic profiles against the FlgL gene (used in their study) and varying the Manhattan distance cutoff.
Figure 3
Average phylogenetic distances between E. coli proteins at each flagellar propensity level.
Figure 4
Hierarchical clustering (average-linkage) of the top proteins associated with flagella and thermophily (see Tables 2 and 4), on the basis of their phylogenetic profiles. Genomes are on the x-axis, and genes are on the y-axis. Gray coloring indicates the presence of a gene in a genome.
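
The profile comparison described in the Figure 2 caption (Manhattan distance to a reference gene's phylogenetic profile, with a ROC curve traced by varying the distance cutoff) can be sketched as follows. This is a minimal illustration, not the code used in the study: the gene names, profiles, and true/false labels are invented toy data, and the function names are hypothetical.

```python
def manhattan(a, b):
    """Manhattan distance between two equal-length 0/1 phylogenetic profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def roc_points(distances, is_positive):
    """Sweep a distance cutoff and return a list of (FPR, TPR) pairs.

    distances:   dict gene -> distance to the reference profile
    is_positive: dict gene -> True if the gene is a known member of the
                 trait's machinery (needs at least one True and one False)
    """
    n_pos = sum(1 for v in is_positive.values() if v)
    n_neg = len(is_positive) - n_pos
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(distances.values())):
        # Genes at or below the cutoff are predicted to belong to the trait.
        called = [g for g, d in distances.items() if d <= cutoff]
        tp = sum(1 for g in called if is_positive[g])
        fp = len(called) - tp
        points.append((fp / n_neg, tp / n_pos))
    return points


if __name__ == "__main__":
    reference = [1, 1, 0, 0, 1]            # toy profile of the reference gene
    profiles = {                            # toy profiles, 5 genomes each
        "geneA": [1, 1, 0, 0, 1],
        "geneB": [1, 1, 0, 1, 1],
        "geneC": [0, 0, 1, 1, 0],
        "geneD": [1, 0, 0, 0, 0],
    }
    labels = {"geneA": True, "geneB": True, "geneC": False, "geneD": False}
    dists = {g: manhattan(p, reference) for g, p in profiles.items()}
    for fpr, tpr in roc_points(dists, labels):
        print(fpr, tpr)
```

Each distinct distance value is used as a cutoff once, so the curve has at most one point per distinct distance plus the origin.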

References

    1. Altschul, S., Gish, W., Miller, W., Myers, E., and Lipman, D. 1990. Basic local alignment search tool. J. Mol. Biol. 215: 403-410.
    2. Bao, Q., Tian, Y., Li, W., Xu, Z., Xuan, Z., Hu, S., Dong, W., Yang, J., Chen, Y., Xue, Y., et al. 2002. A complete sequence of the T. tengcongensis genome. Genome Res. 12: 689-700.
    3. Borges, K.M., Brummet, S.R., Bogert, A., Davis, M.C., Hujer, K.M., Domke, S.T., Szasz, J., Ravell, J., DiRuggiero, J., Fuller, C., et al. 1996. A survey of the genome of the hyperthermophilic archaeon, Pyrococcus furiosus. Genome Sci. Technol. 1: 37-46.
    4. Bork, P., Dandekar, T., Diaz-Lazcoz, Y., Eisenhaber, F., Huynen, M., and Yuan, Y. 1998. Predicting function: From genes to genomes and back. J. Mol. Biol. 283: 707-725.
    5. Enright, A.J., Iliopoulos, I., and Kyrpides, N.C. 1999. Protein interaction maps for complete genomes based on gene fusion events. Nature 402: 86-90.

WEB SITE REFERENCES

    1. http://www.ncbi.nlm.nih.gov/COG/; COGs database.
    2. http://www.ncbi.nih.gov/BLAST/blast_databases.html; NCBI non-redundant peptide sequence database.
    3. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed; PubMed.