Pfam 10 years on: 10,000 families and still growing
- PMID: 18344544
- DOI: 10.1093/bib/bbn010
Abstract
Classifications of proteins into groups of related sequences are in some respects like a periodic table for biology, allowing us to understand the underlying molecular biology of any organism. Pfam is a large collection of protein domains and families. Its scientific goal is to provide a complete and accurate classification of protein families and domains. The next release of the database will contain over 10,000 entries, which leads us to reflect on how far we are from completing this work. Currently Pfam matches 72% of known protein sequences, but for proteins with known structure Pfam matches 95%, which we believe represents the likely upper bound. Based on our analysis a further 28,000 families would be required to achieve this level of coverage for the current sequence database. We also show that as more sequences are added to the sequence databases the fraction of sequences that Pfam matches is reduced, suggesting that continued addition of new families is essential to maintain its relevance.