SIMAP--a comprehensive database of pre-calculated protein sequence similarities, domains, annotations and clusters
- PMID: 19906725
- PMCID: PMC2808863
- DOI: 10.1093/nar/gkp949
Abstract
Predicting protein function and reconstructing evolutionary origins by large-scale sequence comparison remains the most powerful approach in sequence analysis. Because the number of known protein sequences grows exponentially and the similarity matrix therefore grows quadratically, computing the Similarity Matrix of Proteins (SIMAP) has become a computationally intensive task. The SIMAP database provides a comprehensive and up-to-date pre-calculation of the protein sequence similarity matrix, sequence-based features and sequence clusters. As of September 2009, SIMAP covers 48 million proteins and more than 23 million non-redundant sequences. Novel features of SIMAP include the expansion of the sequence space by including databases such as ENSEMBL, as well as the integration of metagenomes based on their consistent processing and annotation. Furthermore, protein function predictions by Blast2GO are pre-calculated for all sequences in SIMAP, and the data access and query functions have been improved. SIMAP helps biologists query the up-to-date sequence space systematically and facilitates large-scale downstream projects in computational biology. SIMAP is freely accessible to individuals through the web portal (http://mips.gsf.de/simap/) and programmatically through DAS (http://webclu.bio.wzw.tum.de/das/) and a Web Service (http://mips.gsf.de/webservices/services/SimapService2.0?wsdl).
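The quadratic growth mentioned in the abstract can be illustrated with a short calculation. This is only a sketch: it assumes a symmetric all-against-all matrix in which each unordered pair of distinct sequences is counted once, which the abstract does not specify, and it uses the stated lower bound of 23 million non-redundant sequences. The function name `pairwise_comparisons` is hypothetical and not part of SIMAP.

```python
def pairwise_comparisons(n: int) -> int:
    """Count unordered pairs of distinct sequences among n sequences.

    Assumption: the similarity matrix is symmetric and self-comparisons
    are excluded, so only the upper triangle is counted.
    """
    return n * (n - 1) // 2


# Lower bound taken from the abstract ("more than 23 million non-redundant sequences").
n_sequences = 23_000_000

pairs = pairwise_comparisons(n_sequences)
pairs_doubled = pairwise_comparisons(2 * n_sequences)

print(f"pairs for {n_sequences:,} sequences: {pairs:,}")
# Doubling the number of sequences roughly quadruples the number of pairs.
print(f"growth factor when the sequence count doubles: {pairs_doubled / pairs:.4f}")
```

Under these assumptions, doubling the sequence count multiplies the number of pairs by about four, which is why a pre-calculated matrix becomes more valuable as the sequence space grows.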