Review
Curr Opin Struct Biol. 2011 Jun;21(3):398-403.
doi: 10.1016/j.sbi.2011.03.010. Epub 2011 Apr 14.

Metagenomics and the protein universe

Adam Godzik. Curr Opin Struct Biol. 2011 Jun.

Abstract

Metagenomics sequencing projects have dramatically increased our knowledge of the protein universe and provided over one-half of currently known protein sequences; they have also introduced a much broader phylogenetic diversity into the protein databases. The full analysis of metagenomic datasets is only beginning, but it has already led to the discovery of thousands of new protein families, likely representing novel functions specific to given environments. At the same time, a deeper analysis of such novel families, including experimental structure determination of some representatives, suggests that most of them represent distant homologs of already characterized protein families, and thus most of the protein diversity present in the new environments is due to functional divergence of the known protein families rather than the emergence of new ones.


Figures

Box 1
Growth of the TrEMBL protein database since its inception in 1996 and, for comparison, the timing and size of the largest metagenomics datasets. The largest single deposit to protein sequence databases, the Global Ocean Survey dataset, quadrupled the then-known protein universe. An earlier Sargasso Sea set doubled it, while the more recent set from human gut microbiome sequencing increased it by 25%.
Box 2
Many proteins from the new environments are highly divergent members of known protein families, displaying high structural (and possibly functional) similarity despite amino acid sequences that have diverged, sometimes beyond recognition. A. An uncharacterized protein identified in the Global Ocean Survey (PDB code: 2pgc, gray) is highly similar (3 Å RMSD over 193 amino acids) to a putative monooxygenase from Lactobacillus acidophilus (PDB code: 2f44); sequence identity over the structural alignment is 11%, i.e. close to the random level. Both structures were solved by the JCSG center as part of the coverage of the protein structural space project. B. An uncharacterized protein from Sulfurospirillum deleyianum (PDB code: 3nkg, cyan), part of the GEBA project, shows strong structural similarity to a carbohydrate binding module from Saccharophagus degradans (PDB code: 2cdo, magenta). The S. deleyianum protein was solved by the MCSG center as part of the structural survey of novel organisms.

