The rapid generation of mutation data matrices from protein sequences
- PMID: 1633570
- DOI: 10.1093/bioinformatics/8.3.275
Abstract
An efficient means for generating mutation data matrices from large numbers of protein sequences is presented here. By means of an approximate peptide-based sequence comparison algorithm, the set of sequences is clustered at the 85% identity level. The most closely related pairs of sequences are aligned, and the observed amino acid exchanges are tallied in a matrix. The raw mutation frequency matrix is processed in a similar way to that described by Dayhoff et al. (1978), so the resulting matrices may easily be used in current sequence analysis applications in place of the standard mutation data matrices, which have not been updated for 13 years. The method is fast enough to process the entire SWISS-PROT databank in 20 h on a Sun SPARCstation 1, and to generate a matrix from a specific family or class of proteins in minutes. Differences observed between our 250 PAM mutation data matrix and the matrix calculated by Dayhoff et al. are briefly discussed.
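The tallying and Dayhoff-style processing steps summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not describe the peptide-based clustering or the alignment procedure, so the sketch assumes the sequence pairs are already aligned and simply skips pairs below an identity threshold. All function names (`percent_identity`, `tally_exchanges`, `pam1_matrix`, `log_odds`) and the `10 * log10` scaling are illustrative assumptions that follow the commonly described Dayhoff procedure.

```python
from collections import Counter
from math import log10

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def percent_identity(a, b):
    """Percent identity over the ungapped columns of two aligned strings."""
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not cols:
        return 0.0
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


def tally_exchanges(aligned_pairs, min_identity=85.0):
    """Count symmetric residue exchanges and residue occurrences.

    Assumption: pairs are pre-aligned (equal length, '-' for gaps); pairs
    below min_identity are skipped as a stand-in for the clustering step.
    """
    exchanges, occurrences = Counter(), Counter()
    for a, b in aligned_pairs:
        if len(a) != len(b):
            raise ValueError("aligned sequences must have equal length")
        if percent_identity(a, b) < min_identity:
            continue
        for x, y in zip(a, b):
            if x not in AMINO_ACIDS or y not in AMINO_ACIDS:
                continue  # ignore gaps and ambiguity codes
            occurrences[x] += 1
            occurrences[y] += 1
            if x != y:
                exchanges[(x, y)] += 1
                exchanges[(y, x)] += 1
    return exchanges, occurrences


def pam1_matrix(exchanges, occurrences):
    """Return (M, freqs), with M[a][b] = P(a is replaced by b) at 1 PAM.

    Only residues that were actually observed are included.
    """
    residues = [a for a in AMINO_ACIDS if occurrences.get(a, 0) > 0]
    total = sum(occurrences.get(a, 0) for a in residues)
    changed = sum(exchanges.get((a, b), 0) for a in residues for b in residues)
    if changed == 0:
        raise ValueError("no exchanges observed")
    freqs = {a: occurrences[a] / total for a in residues}
    # Scale so that the expected fraction of changed residues is 1%.
    lam = 0.01 * total / changed
    M = {}
    for a in residues:
        row = {b: lam * exchanges.get((a, b), 0) / occurrences[a]
               for b in residues if b != a}
        row[a] = 1.0 - sum(row.values())
        M[a] = row
    return M, freqs


def _matmul(X, Y):
    keys = list(X)
    return {a: {b: sum(X[a][k] * Y[k][b] for k in keys) for b in keys}
            for a in keys}


def log_odds(M, freqs, n_pam=250):
    """10*log10 odds after n_pam steps; None where the probability is zero."""
    keys = list(M)
    result = {a: {b: float(a == b) for b in keys} for a in keys}
    base, n = M, n_pam
    while n:  # exponentiation by repeated squaring
        if n & 1:
            result = _matmul(result, base)
        base = _matmul(base, base)
        n >>= 1
    return {a: {b: (10 * log10(result[a][b] / freqs[b])
                    if result[a][b] > 0 else None) for b in keys}
            for a in keys}
```

Calling `tally_exchanges` on a list of aligned pairs, then `pam1_matrix`, then `log_odds(M, freqs, 250)` gives a 250 PAM style log-odds table for the observed residues. On small data sets many entries will be `None` because the corresponding exchanges are never observed, which is one reason large sequence collections are needed for a usable matrix.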