Homology-extended sequence alignment

V A Simossis¹, J Kleinjung, J Heringa

PMID: 15699183
PMCID: PMC549400
DOI: 10.1093/nar/gki233

V A Simossis et al. Nucleic Acids Res. 2005 Feb 7;33(3):816-24.

Affiliation

¹ Bioinformatics Section, Faculty of Sciences, Vrije Universiteit, De Boelelaan 1081A, 1081 HV Amsterdam, The Netherlands.

Abstract

We present a profile-profile multiple alignment strategy that uses database searching to collect homologues for each sequence in a given set, to enrich the evolutionary information available for the alignment. For each sequence to be aligned, the putative homologues that score above a pre-defined threshold are incorporated into a position-specific pre-alignment profile. These enriched position-specific profiles are then used in standard progressive alignment, describing the characteristic features of the given sequence set more accurately. We show that, owing to the incorporation of the pre-alignment information into a standard progressive multiple alignment routine, the alignment quality between distant sequences increases significantly, and the strategy outperforms state-of-the-art methods such as T-COFFEE and MUSCLE. We also show that, although entirely sequence-based, our novel strategy aligns distant sequences better than a recent contact-based alignment method. Our pre-alignment profile strategy should therefore be advantageous for applications that rely on high alignment accuracy, such as local structure prediction, comparative modelling and threading.
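To make the pre-profile idea concrete, here is a minimal Python sketch that turns a query and its homologue hits into per-position residue frequencies. It assumes (hypothetically) that each hit has already been projected onto query coordinates, with '-' wherever the local alignment does not cover a query position, and it uses plain unweighted counts. The function name `build_pre_profile` is illustrative; PRALINE's actual profile construction and sequence weighting are not described on this page.

```python
from collections import Counter
from typing import Dict, List


def build_pre_profile(query: str, homologues: List[str]) -> List[Dict[str, float]]:
    """Per-position residue frequencies for `query` plus its homologue hits.

    Assumption: every homologue has been projected onto query coordinates,
    i.e. it has the query's length and uses '-' where the local alignment
    does not cover a query position (insertions relative to the query are
    assumed to have been discarded already).
    """
    for hit in homologues:
        if len(hit) != len(query):
            raise ValueError("homologue is not projected onto query coordinates")

    profile: List[Dict[str, float]] = []
    for i, query_residue in enumerate(query):
        counts = Counter(query_residue)  # the query always contributes to its own column
        for hit in homologues:
            if hit[i] != "-":
                counts[hit[i]] += 1
        total = sum(counts.values())
        # Unweighted relative frequencies; sequence weighting, which many
        # profile methods apply, is deliberately omitted in this sketch.
        profile.append({aa: n / total for aa, n in sorted(counts.items())})
    return profile


# Usage: a three-residue query and two projected hits.
for column in build_pre_profile("MKV", ["MRV", "-KI"]):
    print(column)
```

Gapped hit positions simply do not vote, so a column covered only by the query keeps the query residue at frequency 1.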

Figures

**Figure 1**
The schematic representation of the PRALINE_PSI strategy. Each sequence is submitted as a PSI-BLAST query to a database of choice. The resulting local alignments are filtered for redundancy, and if no hits are found or all hits are redundant, the search is re-run with an E-value threshold ten times less stringent. The final local alignments for each sequence are converted into a pre-profile and passed to the PRALINE alignment algorithm.
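The incremental search loop in Figure 1 can be sketched as follows. The database search is abstracted as a caller-supplied function `search(query, evalue)` (hypothetical; in practice a PSI-BLAST run), and the redundancy filter shown, which only drops exact duplicates and copies of the query, is a placeholder assumption rather than the filter PRALINE_PSI actually uses. The starting threshold of 10⁻⁶ follows the figure captions; the cap `max_rounds` is an assumption added so the loop always terminates.

```python
from typing import Callable, List, Tuple


def filter_redundant(query: str, hits: List[str]) -> List[str]:
    """Placeholder redundancy filter (assumption): drop copies of the query
    and exact duplicate hits, keeping the first occurrence of each."""
    seen = {query}
    kept: List[str] = []
    for hit in hits:
        if hit not in seen:
            seen.add(hit)
            kept.append(hit)
    return kept


def incremental_search(
    query: str,
    search: Callable[[str, float], List[str]],
    start_evalue: float = 1e-6,
    max_rounds: int = 10,
) -> Tuple[List[str], float]:
    """Relax the E-value threshold tenfold until non-redundant hits appear.

    Returns the filtered hits and the threshold that produced them, or an
    empty list and the last threshold tried if every round came back empty.
    """
    threshold = start_evalue
    hits: List[str] = []
    for round_no in range(max_rounds):
        threshold = start_evalue * 10.0 ** round_no  # ten times less stringent each round
        hits = filter_redundant(query, search(query, threshold))
        if hits:
            break
    return hits, threshold


# Usage with a toy in-memory "database" of (sequence, E-value) pairs.
toy_db = [("MKV", 1e-30), ("MRV", 5e-4)]
hits, used = incremental_search("MKV", lambda q, e: [s for s, ev in toy_db if ev <= e])
print(hits, used)
```

Here the only hit below 10⁻⁶ is the query itself, so the search keeps relaxing until the homologue at 5×10⁻⁴ is admitted at roughly 10⁻³.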

**Figure 2**
Comparison of alignment methods on the 624 HOMSTRAD pairwise alignments (Q score). The difference (Δ) between the average scores of each tested alignment method and that of the PRALINE_BASIC method is taken at 5% intervals. The PRALINE_PREPRO values for the pairwise alignments are identical to those of PRALINE_BASIC and, therefore, they are not included. The PRALINE_PSI scores are for the incremental strategy starting with an E-value of 10⁻⁶.
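The per-interval differences plotted here can be computed from per-alignment scores with a short helper. This sketch assumes the 5% intervals are over pairwise sequence identity (the ranges used in Figure 4 suggest this, but the caption does not state it) and that each case's score for the tested method and for PRALINE_BASIC is already available. Because each interval contains the same cases for both methods, the difference of the two averages equals the average per-case difference. The name `delta_by_identity` is illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def delta_by_identity(
    cases: List[Tuple[float, float, float]], bin_width: float = 5.0
) -> Dict[float, float]:
    """Average score difference per sequence-identity interval.

    `cases` holds (percent_identity, method_score, baseline_score) per
    reference alignment. Keys of the result are the lower edges of the
    non-empty intervals; 100% identity is folded into the last interval.
    """
    last_bin = int(100 // bin_width) - 1
    diffs: Dict[float, List[float]] = defaultdict(list)
    for identity, score, baseline in cases:
        b = min(int(identity // bin_width), last_bin)
        # Same cases for both methods, so mean(score) - mean(baseline)
        # equals the mean of the per-case differences.
        diffs[b * bin_width].append(score - baseline)
    return {edge: sum(d) / len(d) for edge, d in sorted(diffs.items())}


print(delta_by_identity([(12.0, 0.6, 0.5), (14.0, 0.8, 0.6), (100.0, 1.0, 1.0)]))
```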

**Figure 3**
Sequence alignments of the protein methyltransferase (HOMSTRAD family ‘SopU_methylase_N’). The numbers in parentheses are the Q scores of each alignment. The bottom alignment (HOMSTRAD) is the reference alignment derived from structural superposition and shows the DSSP-derived secondary structures. Both the contact-based and the single sequence-based methods show a shift in the matched secondary structure elements, which is entirely prevented by the use of the extended evolutionary information. Correctly aligned residue pairs are denoted by a ‘∧’ sign.
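The Q scores and the ‘∧’ markers in this figure both rest on comparing residue pairs with the reference. A minimal sketch, assuming the common benchmark definition of Q as the fraction of residue pairs aligned in the reference that the test alignment also aligns (the paper's own scoring code may differ, for example by restricting the count to structurally conserved regions). Alignments are represented as pairs of gapped strings; all names are illustrative.

```python
from typing import Set, Tuple


def aligned_pairs(row_a: str, row_b: str) -> Set[Tuple[int, int]]:
    """(i, j) residue-index pairs that share a column in a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs: Set[Tuple[int, int]] = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def q_score(test: Tuple[str, str], ref: Tuple[str, str]) -> float:
    """Fraction of the reference's aligned residue pairs that `test` reproduces."""
    if [r.replace("-", "") for r in test] != [r.replace("-", "") for r in ref]:
        raise ValueError("test and reference must align the same sequences")
    ref_pairs = aligned_pairs(*ref)
    if not ref_pairs:
        raise ValueError("reference aligns no residue pairs")
    return len(aligned_pairs(*test) & ref_pairs) / len(ref_pairs)


def correct_markers(test: Tuple[str, str], ref: Tuple[str, str]) -> str:
    """One character per test column: '∧' if its residue pair is in the reference."""
    ref_pairs = aligned_pairs(*ref)
    marks = []
    i = j = 0
    for a, b in zip(*test):
        correct = a != "-" and b != "-" and (i, j) in ref_pairs
        marks.append("∧" if correct else " ")
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return "".join(marks)


reference = ("ACDE", "AC-E")
candidate = ("ACDE", "A-CE")
print(candidate[0])
print(candidate[1])
print(correct_markers(candidate, reference), q_score(candidate, reference))
```

In the toy example the candidate places the gap one column too early, so only two of the three reference pairs survive.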

**Figure 4**
The effects of using E-value thresholds of increasing stringency in PRALINE_PSI on the 624 HOMSTRAD pairwise alignments. (A) The difference (Δ) between the average Q scores of PRALINE_PSI and the basic PRALINE method, for all cases (0–100% sequence identity) and separately, cases between 0 and 30%, 30 and 60% and 60 and 100% sequence identity. (B) The distributions of improved, equal and worsened cases compared with the basic PRALINE method for each E-value threshold. The ‘inc’ column is the PRALINE_PSI incremental strategy starting from a threshold of 10⁻⁶, and the ‘max’ column is PRALINE_PSI's theoretical upper limit for the tested threshold range.
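Panel B's counts and the ‘max’ column can be derived from per-case scores. This sketch assumes ‘max’ means picking, for each alignment, the best score obtained over all tested thresholds (an oracle choice that no single fixed threshold achieves), which is one reading of "theoretical upper limit"; the threshold labels and the tolerance used to call two scores equal are illustrative.

```python
from typing import Dict, List


def compare(scores: List[float], baseline: List[float], tol: float = 1e-9) -> Dict[str, float]:
    """Average difference plus improved/equal/worsened counts versus the baseline."""
    if len(scores) != len(baseline) or not baseline:
        raise ValueError("need equal-length, non-empty score lists")
    improved = sum(s > b + tol for s, b in zip(scores, baseline))
    worsened = sum(s < b - tol for s, b in zip(scores, baseline))
    return {
        "delta": (sum(scores) - sum(baseline)) / len(baseline),
        "improved": improved,
        "equal": len(baseline) - improved - worsened,
        "worsened": worsened,
    }


def threshold_summary(
    baseline: List[float], by_threshold: Dict[str, List[float]]
) -> Dict[str, Dict[str, float]]:
    """Per-threshold comparison, plus an assumed per-case 'max' (oracle) column."""
    summary = {label: compare(s, baseline) for label, s in by_threshold.items()}
    # Assumed meaning of 'max': for each case, the best score over all thresholds.
    best = [max(case) for case in zip(*by_threshold.values())]
    summary["max"] = compare(best, baseline)
    return summary


result = threshold_summary([0.5, 0.7, 0.9], {"1e-6": [0.6, 0.7, 0.8], "1e-3": [0.4, 0.9, 0.9]})
for label, stats in result.items():
    print(label, stats)
```

Note that each threshold alone can leave the average unchanged while the per-case maximum still improves it, which is why the ‘max’ column bounds what an ideal per-case threshold choice could gain.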

**Figure 5**
Comparison of alignment methods on the 399 HOMSTRAD multiple alignments (CS score). The difference (Δ) between the average scores of each tested alignment method and that of the PRALINE_BASIC method is taken at 5% intervals. The PRALINE_PSI scores are for the incremental strategy starting with an E-value of 10⁻⁶.
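For multiple alignments the figure reports the column (CS) score instead of Q. A minimal sketch, assuming the usual definition of CS as the fraction of reference columns reproduced exactly (the same residue from every sequence) in the test alignment; alignments are lists of equal-length gapped strings, and the function names are illustrative.

```python
from typing import List, Optional, Tuple

Column = Tuple[Optional[int], ...]


def alignment_columns(rows: List[str]) -> List[Column]:
    """Each column as per-sequence residue indices, with None marking a gap."""
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("need one or more rows of equal length")
    next_index = [0] * len(rows)
    columns: List[Column] = []
    for c in range(len(rows[0])):
        column: List[Optional[int]] = []
        for s, row in enumerate(rows):
            if row[c] == "-":
                column.append(None)
            else:
                column.append(next_index[s])
                next_index[s] += 1
        columns.append(tuple(column))
    return columns


def column_score(test: List[str], ref: List[str]) -> float:
    """Fraction of reference columns that appear unchanged in the test alignment."""
    ref_columns = alignment_columns(ref)
    if not ref_columns:
        raise ValueError("empty reference alignment")
    test_columns = set(alignment_columns(test))
    return sum(col in test_columns for col in ref_columns) / len(ref_columns)


reference = ["AC-D", "A-BD", "ACBD"]
candidate = ["ACD-", "A-BD", "ACBD"]
print(column_score(candidate, reference))
```

Because a column only counts when every sequence matches, CS is stricter than a sum-of-pairs measure: the single misplaced gap in the first sequence spoils two of the four reference columns.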
