PLoS One. 2007 Mar 28;2(3):e324.
doi: 10.1371/journal.pone.0000324.

Large-scale comparative genomic ranking of taxonomically restricted genes (TRGs) in bacterial and archaeal genomes

Gareth A Wilson et al. PLoS One. 2007.

Abstract

Background: Lineage-specific, or taxonomically restricted, genes (TRGs), especially those that are species- and strain-specific, are of special interest because they are expected to play a role in defining exclusive ecological adaptations to particular niches. Despite this, they are relatively poorly studied and little understood, in large part because many are still orphans or have homologues only in very closely related isolates. This lack of homology confounds attempts to establish the likelihood that a hypothetical gene is expressed and, if so, to determine the putative function of the protein.

Methodology/principal findings: We have developed "QIPP" ("Quality Index for Predicted Proteins"), an index that scores the "quality" of a protein based on non-homology-based criteria. QIPP can be used to assign a value between zero and one to any protein based on comparing its features to other proteins in a given genome. We have used QIPP to rank the predicted proteins in the proteomes of Bacteria and Archaea. This ranking reveals that there is a large amount of variation in QIPP scores, and identifies many high-scoring orphans as potentially "authentic" (expressed) orphans. There are significant differences in the distributions of QIPP scores between orphan and non-orphan genes for many genomes and a trend for less well-conserved genes to have lower QIPP scores.
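
The abstract does not give the QIPP formula, so the following is only an illustrative sketch of the general idea: score each protein between zero and one by comparing its features with those of the other proteins in the same genome. It assumes a simple scheme (within-genome percentile rank per criterion, then an unweighted mean). The function names, the example feature names and the direction assigned to each criterion are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def percentile_rank(values, higher_is_better=True):
    """Fraction of genome values that are no better than each value (in (0, 1])."""
    # Negate when low raw values should score well, so "larger is better" holds.
    signed = [v if higher_is_better else -v for v in values]
    ordered = sorted(signed)
    n = len(ordered)
    return [bisect_right(ordered, s) / n for s in signed]


def composite_score(proteins, criteria):
    """Unweighted mean of per-criterion percentile ranks (an assumed aggregation).

    proteins: list of dicts mapping criterion name -> raw feature value
    criteria: dict mapping criterion name -> True if higher raw values are better
    """
    per_criterion = [
        percentile_rank([p[name] for p in proteins], higher_is_better)
        for name, higher_is_better in criteria.items()
    ]
    # zip(*...) regroups the per-criterion lists into one tuple per protein.
    return [sum(ranks) / len(ranks) for ranks in zip(*per_criterion)]


# Toy genome of three predicted proteins (made-up numbers, for illustration only).
toy_proteins = [
    {"length": 100, "low_complexity": 50.0},
    {"length": 300, "low_complexity": 0.0},
    {"length": 200, "low_complexity": 10.0},
]
# Assumed directions: longer is better, less low-complexity sequence is better.
toy_criteria = {"length": True, "low_complexity": False}
toy_scores = composite_score(toy_proteins, toy_criteria)
```

Because every rank is computed within a single genome, scores from this sketch are relative to that genome's own proteins, which matches the abstract's description of comparing a protein's features "to other proteins in a given genome".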

Conclusions: The implication of this work is that QIPP scores can be used to further annotate predicted proteins with information that is independent of homology. Such information can be used to prioritize candidates for further analysis. Data generated for this study can be found in the OrphanMine at http://www.genomics.ceh.ac.uk/orphan_mine.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Distributions of orphans and non-orphans in E. coli K12.
The predicted proteins in E. coli K12 that were found to be unique (light gray) when compared to 122 bacterial proteomes (shown in Table S1) were designated as orphans (n = 174). All remaining proteins (dark gray) were non-orphans (n = 4137). Distributions of values for both groups were calculated as a percentage for (a) length, (b) percent low complexity, (c) G+C difference from the mean, (d) Cost and (e) Neighbourhood Distribution.
Figure 2. QIPP and Criterion Distributions of orphans in 122 bacterial genomes.
The orphans (n = 43513) obtained from 122 bacterial genomes were scored and the distribution plotted according to (a) QIPP and the individual criteria that constitute QIPP: (b) length, (c) percent low complexity, (d) G+C difference from the mean, (e) cost and (f) Neighbourhood Distribution.
Figure 3. Genomes which are more taxonomically isolated have larger numbers of high-scoring orphan predicted proteins.
Chi-squared tests were used to determine which genomes had significantly more predicted proteins in the top 50% of the list of ranked orphan predicted proteins than would be expected by chance (−1 = significantly fewer orphans than expected in the top 50% of the ranking, 0 = no significant difference and 1 = significantly more orphans than expected in the top 50% of the ranking).
Figure 4. Calculated QIPP scores for 5 bacterial genomes split into taxonomic classes.
Every predicted protein in (a) E. coli K12, (b) H. pylori 26695, (c) N. meningitidis MC58, (d) P. marinus CCMP1375 and (e) V. vulnificus CMCP6 was assigned to the taxonomic level to which it is restricted and scored according to QIPP.
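
The −1/0/1 coding described for Figure 3 can be illustrated with a one-degree-of-freedom chi-squared goodness-of-fit test. This is a minimal sketch under stated assumptions, not the authors' procedure: it assumes that, by chance, half of a genome's orphans would fall in the top 50% of the ranking, and it uses a 0.05 significance level. The function name and parameters are hypothetical.

```python
# Chi-squared critical value for 1 degree of freedom at alpha = 0.05.
CHI2_CRITICAL_DF1_ALPHA05 = 3.841


def classify_genome(n_top_half, n_orphans, expected_fraction=0.5):
    """Return 1, -1 or 0 for more, fewer or no different than expected.

    n_top_half: number of the genome's orphans ranked in the top 50%
    n_orphans: total number of orphans from that genome
    expected_fraction: assumed chance proportion in the top 50% (an assumption)
    """
    if n_orphans <= 0 or not 0 <= n_top_half <= n_orphans:
        raise ValueError("counts must satisfy 0 <= n_top_half <= n_orphans, n_orphans > 0")
    if not 0.0 < expected_fraction < 1.0:
        raise ValueError("expected_fraction must lie strictly between 0 and 1")

    expected_top = n_orphans * expected_fraction
    expected_bottom = n_orphans - expected_top
    observed_bottom = n_orphans - n_top_half

    # Sum of (observed - expected)^2 / expected over the two categories.
    statistic = (
        (n_top_half - expected_top) ** 2 / expected_top
        + (observed_bottom - expected_bottom) ** 2 / expected_bottom
    )
    if statistic <= CHI2_CRITICAL_DF1_ALPHA05:
        return 0
    return 1 if n_top_half > expected_top else -1
```

For example, a genome with 80 of its 100 orphans in the top half gives a statistic of 36 and is coded 1, whereas 52 of 100 gives 0.16 and is coded 0. With small orphan counts an exact binomial test would be a safer choice than the chi-squared approximation.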

