Nucleic Acids Res. 2005 Feb 7;33(3):816-24.
doi: 10.1093/nar/gki233. Print 2005.

Homology-extended sequence alignment

V A Simossis et al. Nucleic Acids Res.

Abstract

We present a profile-profile multiple alignment strategy that uses database searching to collect homologues for each sequence in a given set, in order to enrich their available evolutionary information for the alignment. For each of the alignment sequences, the putative homologous sequences that score above a pre-defined threshold are incorporated into a position-specific pre-alignment profile. The enriched position-specific profile is used for standard progressive alignment, thereby more accurately describing the characteristic features of the given sequence set. We show that owing to the incorporation of the pre-alignment information into a standard progressive multiple alignment routine, the alignment quality between distant sequences increases significantly and outperforms state-of-the-art methods, such as T-COFFEE and MUSCLE. We also show that although entirely sequence-based, our novel strategy is better at aligning distant sequences when compared with a recent contact-based alignment method. Therefore, our pre-alignment profile strategy should be advantageous for applications that rely on high alignment accuracy such as local structure prediction, comparative modelling and threading.
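The pre-alignment profile idea can be illustrated with a minimal sketch. It is not the PRALINE implementation: it assumes that each database hit has already been laid out against the query as a gapped string of the same length, that "scoring above the threshold" means an E-value at or below a cutoff, and that a profile is simply a table of per-position residue frequencies. The function name `build_pre_profile` and the toy sequences are hypothetical.

```python
from collections import Counter

def build_pre_profile(query, hits, evalue_threshold=1e-6):
    """Return one residue-frequency dict per query position.

    hits: list of (aligned_seq, evalue) pairs; aligned_seq has the same
    length as query, with '-' where the hit does not cover a position.
    This layout is an assumption made for the sketch.
    """
    # The query always contributes to its own profile, so every
    # position has at least one residue and the total is never zero.
    rows = [query] + [seq for seq, ev in hits if ev <= evalue_threshold]
    profile = []
    for i in range(len(query)):
        counts = Counter(row[i] for row in rows if row[i] != "-")
        total = sum(counts.values())
        profile.append({aa: n / total for aa, n in counts.items()})
    return profile

# Toy data: the third hit fails the E-value cutoff and is ignored.
query = "MKV"
hits = [("MRV", 1e-20), ("-KV", 1e-8), ("AAA", 0.5)]
profile = build_pre_profile(query, hits)
```

In this toy case the middle position records both K and R, which is the kind of extra evolutionary information a single sequence cannot supply to a progressive alignment.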


Figures

Figure 1
The schematic representation of the PRALINE_PSI strategy. Each sequence is submitted as a PSI-BLAST query to a database of choice. The resulting local alignments are filtered for redundancy and, if no hits are found or all hits are redundant, the search is re-run using a new E-value threshold 10 times less stringent. The final local alignments for each sequence are converted to a pre-profile and given to the PRALINE alignment algorithm.
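The search loop in this caption can be sketched roughly as follows. The sketch makes several assumptions that the caption does not state: the relaxation is repeated until hits appear or an upper limit is reached, "redundant" means identical to the query or to an earlier hit, and the database search is passed in as a callable. The names `filter_redundant`, `incremental_search` and `toy_search` are hypothetical, and no real PSI-BLAST call is made.

```python
def filter_redundant(query, hits):
    """Drop hits identical to the query or to an earlier hit (assumed rule)."""
    seen = {query}
    kept = []
    for seq in hits:
        if seq not in seen:
            seen.add(seq)
            kept.append(seq)
    return kept

def incremental_search(query, search, start_exponent=-6, max_exponent=0):
    """Relax the E-value threshold tenfold until non-redundant hits appear.

    search(query, evalue) is any callable returning a list of hit
    sequences; max_exponent is an assumed stopping point.
    """
    for exponent in range(start_exponent, max_exponent + 1):
        evalue = 10.0 ** exponent  # each step is ten times less stringent
        hits = filter_redundant(query, search(query, evalue))
        if hits:
            return evalue, hits
    return None, []  # no usable hits: only the single sequence remains

# Toy stand-in for a database search; each entry carries a fixed E-value.
def toy_search(query, evalue):
    database = [("MKV", 1e-30), ("MRV", 5e-5)]
    return [seq for seq, ev in database if ev <= evalue]

threshold, hits = incremental_search("MKV", toy_search)
```

With the toy database, the first two thresholds return only a copy of the query, so the loop relaxes the threshold twice before a non-redundant hit is accepted.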
Figure 2
Comparison of alignment methods on the 624 HOMSTRAD pairwise alignments (Q score). The difference (Δ) between the average scores of each tested alignment method and that of the PRALINE_BASIC method is taken at 5% intervals. The PRALINE_PREPRO values for the pairwise alignments are identical to those of PRALINE_BASIC and are therefore not included. The PRALINE_PSI scores are for the incremental strategy starting with an E-value of 10^-6.
Figure 3
Sequence alignments of the protein methyltransferase (HOMSTRAD family ‘SopU_methylase_N’). The numbers in parentheses represent the Q scores of each alignment. The bottom alignment (HOMSTRAD) is the reference alignment derived from structure super-positioning and shows the secondary structures (DSSP-derived). Both the contact-based and the single sequence-based methods show a shift in the matched secondary structure elements, which is entirely prevented by the use of the extended evolutionary information. Correctly aligned residue pairs are denoted by a ‘∧’ sign.
Figure 4
The effects of using E-value thresholds of increasing stringency in PRALINE_PSI on the 624 HOMSTRAD pairwise alignments. (A) The difference (Δ) between the average Q scores of PRALINE_PSI and the basic PRALINE method, for all cases (0–100% sequence identity) and, separately, for cases between 0 and 30%, 30 and 60%, and 60 and 100% sequence identity. (B) The distributions of improved, equal and worsened cases compared with the basic PRALINE method for each E-value threshold. The ‘inc’ column is the PRALINE_PSI incremental strategy starting from a threshold of 10^-6, and the ‘max’ column is the theoretical upper limit of PRALINE_PSI for the tested threshold range.
Figure 5
Comparison of alignment methods on the 399 HOMSTRAD multiple alignments (CS score). The difference (Δ) between the average scores of each tested alignment method and that of the PRALINE_BASIC method is taken at 5% intervals. The PRALINE_PSI scores are for the incremental strategy starting with an E-value of 10^-6.
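The Δ curves described in the Figure 2 and Figure 5 captions could be computed along the following lines. This is a sketch under assumptions: each test case is taken to be a tuple of sequence identity (in percent), the score of the tested method and the score of the baseline method, and a case at exactly 100% identity is folded into the 95–100% bin. The function name `delta_by_identity` and all numbers are made up for illustration and are not results from the paper.

```python
from collections import defaultdict

def delta_by_identity(cases, width=5):
    """Mean method score minus mean baseline score per identity bin.

    cases: iterable of (identity_percent, method_score, baseline_score).
    Returns a dict keyed by the lower edge of each bin.
    """
    bins = defaultdict(list)
    for identity, method, baseline in cases:
        # Fold 100% identity into the last bin (assumed convention).
        start = min(int(identity // width) * width, 100 - width)
        bins[start].append((method, baseline))
    result = {}
    for start in sorted(bins):
        pairs = bins[start]
        mean_method = sum(m for m, _ in pairs) / len(pairs)
        mean_baseline = sum(b for _, b in pairs) / len(pairs)
        result[start] = mean_method - mean_baseline
    return result

# Invented scores, for illustration only.
cases = [
    (12.0, 0.50, 0.40),
    (14.9, 0.70, 0.40),
    (33.0, 0.90, 0.90),
    (100.0, 1.00, 1.00),
]
deltas = delta_by_identity(cases)
```

A positive value in a bin means the tested method scored higher on average than the baseline for alignments in that identity range.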

