Towards completion of the Earth's proteome
- PMID: 18059312
- PMCID: PMC2267224
- DOI: 10.1038/sj.embor.7401117
Abstract
New protein sequences are deposited in databases at an accelerating pace; however, many of these are homologous to known proteins and could be considered redundant. When all historical releases of the protein database are analysed with the original sequence-clustering procedure described here, the fraction of newly sequenced proteins that are redundant is found to be increasing. We interpret this as an indication that the sequencing of the Earth's proteome (the complete set of proteins on Earth) is approaching completion. We estimate the size of the Earth's proteome at approximately 5 million sequences, most of which will be identified during the next 5 years. As the Earth's proteome nears completion, cluster analysis of the protein database will become essential for identifying under-explored taxa to which future sequencing efforts should be directed, and for focusing research on protein families that lack experimental characterization.
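The redundancy measure described in the abstract can be illustrated with a minimal sketch. It is not the authors' clustering procedure: it assumes each sequence has already been assigned a cluster label by some external tool, and it counts a newly deposited sequence as redundant if its cluster appeared in any earlier release. The function name `redundant_fraction_per_release` and the toy data are hypothetical.

```python
from typing import Dict, List


def redundant_fraction_per_release(releases: List[Dict[str, str]]) -> List[float]:
    """Return, for each release, the fraction of its new sequences that are redundant.

    Each release maps a new sequence ID to a cluster ID (assumed precomputed).
    A sequence is treated as redundant if its cluster was seen in an EARLIER
    release; sequences sharing a brand-new cluster within one release are all
    counted as novel (a simplifying assumption of this sketch).
    """
    seen_clusters = set()
    fractions = []
    for new_sequences in releases:
        if not new_sequences:
            # No new sequences in this release: report 0.0 by convention.
            fractions.append(0.0)
            continue
        redundant = sum(
            1 for cluster in new_sequences.values() if cluster in seen_clusters
        )
        fractions.append(redundant / len(new_sequences))
        # Only after scoring the release do its clusters become "known".
        seen_clusters.update(new_sequences.values())
    return fractions


# Toy data: three successive database releases (hypothetical IDs and clusters).
toy_releases = [
    {"s1": "A", "s2": "B"},
    {"s3": "A", "s4": "C"},
    {"s5": "A", "s6": "B", "s7": "C", "s8": "D"},
]
fractions = redundant_fraction_per_release(toy_releases)
print(fractions)  # [0.0, 0.5, 0.75]
```

In this toy series the redundant fraction rises from release to release, which is the kind of trend the abstract interprets as a sign that the set of protein families is approaching saturation.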