Protein Sci. 2004 Jun;13(6):1612-26.
doi: 10.1110/ps.03601504.

Scoring profile-to-profile sequence alignments

Guoli Wang et al. Protein Sci. 2004 Jun.

Abstract

Sequence alignment profiles have been shown to be very powerful in creating accurate sequence alignments. Profiles are often used to search a sequence database with a local alignment algorithm. More accurate and longer alignments have been obtained with profile-to-profile comparison. There are several steps that must be performed in creating profile-profile alignments, and each involves choices in parameters and algorithms. These steps include (1) what sequences to include in a multiple alignment used to build each profile, (2) how to weight similar sequences in the multiple alignment and how to determine amino acid frequencies from the weighted alignment, (3) how to score a column from one profile aligned to a column of the other profile, (4) how to score gaps in the profile-profile alignment, and (5) how to include structural information. Large-scale benchmarks consisting of pairs of homologous proteins with structurally determined sequence alignments are necessary for evaluating the efficacy of each scoring scheme. With such a benchmark, we have investigated the properties of profile-profile alignments and found that (1) with optimized gap penalties, most column-column scoring functions behave similarly to one another in alignment accuracy; (2) some functions, however, have much higher search sensitivity and specificity; (3) position-specific weighting schemes in determining amino acid counts in columns of multiple sequence alignments are better than sequence-specific schemes; (4) removing positions in the profile with gaps in the query sequence results in better alignments; and (5) adding predicted and known secondary structure information improves alignments.
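Steps (2) and (3) above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the uniform pseudocount, and the dot-product score are assumptions chosen for brevity, and the abstract does not say which weighting schemes or column-column scoring functions were compared. The sequence weights are simply passed in, so any weighting scheme could supply them.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(column, weights, pseudocount=1.0):
    """Weighted amino acid frequencies for one multiple-alignment column.

    column: one character per sequence ('-' marks a gap).
    weights: one non-negative weight per sequence (hypothetical scheme).
    pseudocount: uniform prior count per amino acid (an assumption).
    """
    counts = {aa: pseudocount for aa in AMINO_ACIDS}
    for residue, weight in zip(column, weights):
        if residue in counts:  # gaps and unknown symbols are skipped
            counts[residue] += weight
    total = sum(counts.values())
    return {aa: count / total for aa, count in counts.items()}


def dot_product_score(freq_a, freq_b):
    """Score two profile columns by the dot product of their frequencies."""
    return sum(freq_a[aa] * freq_b[aa] for aa in AMINO_ACIDS)


# Toy columns taken from three hypothetical alignments.
col_a = column_frequencies("AAAV-", [1.0] * 5)
col_b = column_frequencies("AAGA", [1.0] * 4)
col_c = column_frequencies("WWWW", [1.0] * 4)

similar = dot_product_score(col_a, col_b)
dissimilar = dot_product_score(col_a, col_c)
```

In this toy case the two alanine-rich columns score higher against each other than an alanine-rich column does against a tryptophan-only column, which is the behavior any column-column scoring function is expected to show.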

Figures

Figure 1.
Scheme for profile–profile alignments.
Figure 2.
Comparison of seven scoring functions for profile–profile alignment. Choices at specific stages of the alignment process are listed at the top of the figure and described in Materials and Methods. (Upper left) QModeler scores; (upper right) QDeveloper scores; (lower left) QCombined scores; (lower right) search capability as measured by the number of true positives vs. false positives. The legend given in the lower left figure applies to all four plots.
Figure 3.
Comparison of Full Opt Gap Param vs. Fitted Gap Param for three scoring functions. Choices at specific stages of the alignment process are listed at the top of the figure and described in Materials and Methods. (Upper left) QModeler scores; (upper right) QDeveloper scores; (lower left) QCombined scores; (lower right) search capability as measured by the number of true positives vs. false positives. The legend given in the lower left figure applies to all four plots.
Figure 4.
Comparison of three weighting schemes and two sequence-choice schemes for the Log Odds Multin scoring function. Choices at specific stages of the alignment process are listed at the top of the figure and described in Materials and Methods. (Upper left) QModeler scores; (upper right) QDeveloper scores; (lower left) QCombined scores; (lower right) search capability as measured by the number of true positives vs. false positives. The legend given in the lower left figure applies to all four plots.
Figure 5.
Effect of adding secondary structure substitution matrix to three scoring schemes. Choices at specific stages of the alignment process are listed at the top of the figure and described in Materials and Methods. (Upper left) QModeler scores; (upper right) QDeveloper scores; (lower left) QCombined scores; (lower right) search capability as measured by the number of true positives vs. false positives. The legend given in the lower left figure applies to all four plots.
Figure 6.
Effect of combining protocol choices for all seven scoring functions. For the first three panels, “Combined” means taking the best scoring result of the seven scoring functions for each alignment pair. For the last panel, the scores of the seven functions were summed and used to sort the hits to form the true/false positive curve. Choices at specific stages of the alignment process are listed at the top of the figure and described in Materials and Methods. (Upper left) QModeler scores; (upper right) QDeveloper scores; (lower left) QCombined scores; (lower right) search capability as measured by the number of true positives vs. false positives. The legend given in the lower left figure applies to all four plots.
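The two ways of combining results described in the Figure 6 caption, and the true/false positive curve used in the lower right panels, can be sketched roughly as follows. This is an illustration under assumptions, not the authors' code: the function names are hypothetical, "best scoring result" is read here as the maximum per-pair value, and the numbers are made-up toy values for two functions rather than seven.

```python
def best_of(values_per_function):
    """For each alignment pair, keep the highest value among the functions."""
    return [max(pair_values) for pair_values in zip(*values_per_function)]


def summed(scores_per_function):
    """For each hit, add up the scores given by all the functions."""
    return [sum(pair_scores) for pair_scores in zip(*scores_per_function)]


def tp_fp_curve(scores, is_true):
    """Sort hits by descending score and accumulate (false, true) positive counts."""
    ranked = sorted(zip(scores, is_true), key=lambda hit: hit[0], reverse=True)
    true_pos = false_pos = 0
    curve = []
    for _, label in ranked:
        if label:
            true_pos += 1
        else:
            false_pos += 1
        curve.append((false_pos, true_pos))
    return curve


# Toy data: two scoring functions, three hits each (hypothetical values).
toy_scores = [[9, 2, 5], [8, 4, 3]]
toy_labels = [True, False, True]  # whether each hit is a true homolog

best_values = best_of(toy_scores)
total_scores = summed(toy_scores)
curve = tp_fp_curve(total_scores, toy_labels)
```

Plotting the second element of each point against the first gives a curve of true positives vs. false positives like those in the search-capability panels.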
