Scoring profile-to-profile sequence alignments
- PMID: 15152092
- PMCID: PMC2279992
- DOI: 10.1110/ps.03601504
Abstract
Sequence alignment profiles have been shown to be very powerful in creating accurate sequence alignments. Profiles are often used to search a sequence database with a local alignment algorithm. More accurate and longer alignments have been obtained with profile-to-profile comparison. There are several steps that must be performed in creating profile-profile alignments, and each involves choices in parameters and algorithms. These steps include (1) what sequences to include in a multiple alignment used to build each profile, (2) how to weight similar sequences in the multiple alignment and how to determine amino acid frequencies from the weighted alignment, (3) how to score a column from one profile aligned to a column of the other profile, (4) how to score gaps in the profile-profile alignment, and (5) how to include structural information. Large-scale benchmarks consisting of pairs of homologous proteins with structurally determined sequence alignments are necessary for evaluating the efficacy of each scoring scheme. With such a benchmark, we have investigated the properties of profile-profile alignments and found that (1) with optimized gap penalties, most column-column scoring functions behave similarly to one another in alignment accuracy; (2) some functions, however, have much higher search sensitivity and specificity; (3) position-specific weighting schemes in determining amino acid counts in columns of multiple sequence alignments are better than sequence-specific schemes; (4) removing positions in the profile with gaps in the query sequence results in better alignments; and (5) adding predicted and known secondary structure information improves alignments.
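The steps listed in the abstract can be illustrated with a minimal Python sketch of one possible profile-profile pipeline. It covers weighting and frequency estimation (step 2), column-column scoring (step 3), gap scoring (step 4), and the removal of positions where the query has a gap (finding 4). It is not the authors' implementation, and every choice in it is an assumption made for illustration. It uses Henikoff and Henikoff position-based sequence weights, which assign one weight per sequence and so are a sequence-specific scheme, simpler than the position-specific schemes the abstract reports as better. It also assumes uniform background frequencies, a fixed pseudocount, a symmetric log-odds column score, and a linear gap penalty in a Smith-Waterman recursion. All function names and parameter values (`build_profile`, `pseudocount=0.5`, `gap=1.0`) are hypothetical.

```python
import math

# Assumption: 20 standard amino acids with uniform background frequencies.
# Real profile methods estimate background frequencies from a sequence database.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def drop_query_gap_columns(msa):
    """Keep only columns where the query (first row) has a residue."""
    keep = [k for k, ch in enumerate(msa[0]) if ch != "-"]
    return ["".join(row[k] for k in keep) for row in msa]


def henikoff_weights(msa):
    """Henikoff position-based weights: one weight per sequence, summing to 1.

    This is a sequence-specific scheme (hypothetical choice); gaps do not count.
    """
    weights = [0.0] * len(msa)
    for col in zip(*msa):
        counts = {}
        for ch in col:
            if ch in BACKGROUND:
                counts[ch] = counts.get(ch, 0) + 1
        distinct = len(counts)
        for s, ch in enumerate(col):
            if ch in counts:
                weights[s] += 1.0 / (distinct * counts[ch])
    total = sum(weights) or 1.0  # guard against an all-gap alignment
    return [w / total for w in weights]


def build_profile(msa, pseudocount=0.5):
    """Weighted per-column amino acid frequencies, smoothed toward background."""
    msa = drop_query_gap_columns(msa)
    weights = henikoff_weights(msa)
    columns = []
    for col in zip(*msa):
        # Pseudocounts keep every frequency positive, so logs below are safe.
        freqs = {aa: pseudocount * BACKGROUND[aa] for aa in AMINO_ACIDS}
        for w, ch in zip(weights, col):
            if ch in freqs:
                freqs[ch] += w
        total = sum(freqs.values())
        columns.append({aa: f / total for aa, f in freqs.items()})
    return columns


def column_score(f1, f2):
    """Symmetric log-odds score for two profile columns (one choice of many)."""
    return sum(
        f1[aa] * math.log(f2[aa] / BACKGROUND[aa])
        + f2[aa] * math.log(f1[aa] / BACKGROUND[aa])
        for aa in AMINO_ACIDS
    )


def local_profile_alignment_score(p1, p2, gap=1.0):
    """Best Smith-Waterman score over profile columns with a linear gap penalty."""
    best = 0.0
    prev = [0.0] * (len(p2) + 1)
    for i in range(1, len(p1) + 1):
        cur = [0.0] * (len(p2) + 1)
        for j in range(1, len(p2) + 1):
            cur[j] = max(
                0.0,
                prev[j - 1] + column_score(p1[i - 1], p2[j - 1]),
                prev[j] - gap,  # gap in the second profile
                cur[j - 1] - gap,  # gap in the first profile
            )
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    family_a = ["ACDEFGHIK", "ACDEYGHIK", "AC-EFGHLK"]  # toy alignment, query first
    family_b = ["WPWPWPWPW", "WPWPWPWPW"]  # toy unrelated alignment
    pa, pb = build_profile(family_a), build_profile(family_b)
    print("A vs A:", local_profile_alignment_score(pa, pa))
    print("A vs B:", local_profile_alignment_score(pa, pb))
```

Because the score of a column against itself reduces to twice the Kullback-Leibler divergence of that column from the background, identical columns always score positively. Columns concentrated on different residues score negatively, which is what a local alignment needs. Swapping in a different `column_score` or an affine gap model only changes the corresponding function, mirroring how the abstract treats each step as an independent choice.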