Review. Bioinformatics. 2009 Oct 1;25(19):2455-65.
doi: 10.1093/bioinformatics/btp452. Epub 2009 Jul 30.

Upcoming challenges for multiple sequence alignment methods in the high-throughput era

Carsten Kemena et al. Bioinformatics. 2009.

Abstract

This review focuses on recent trends in multiple sequence alignment tools. It describes the latest algorithmic improvements including the extension of consistency-based methods to the problem of template-based multiple sequence alignments. Some results are presented suggesting that template-based methods are significantly more accurate than simpler alternative methods. The validation of existing methods is also discussed at length with the detailed description of recent results and some suggestions for future validation strategies. The last part of the review addresses future challenges for multiple sequence alignment methods in the genomic era, most notably the need to cope with very large sequences, the need to integrate large amounts of experimental data, the need to accurately align non-coding and non-transcribed sequences and finally, the need to integrate many alternative methods and approaches.

Figures

Fig. 1.
Generic overview of the derivation of a consistency-based scoring scheme. The sequences are first compared pairwise using any suitable method. The second box shows the projection of the pairwise comparisons. These projections may equally come from multiple sequence alignments, pairwise comparisons or any method able to generate such projections, including posterior decoding of an HMM. They may also come from a template-based comparison such as the one described in Figure 2. The residue pairs thus identified are incorporated in the primary library and associated with weights that are used during the extension. The figure shows the T-Coffee extension protocol. When using probabilistic consistency, the probabilities are treated as weights and the triplet extension is made by multiplying the weights rather than taking the minimum. See Supplementary Material for color version of the figure.
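The triplet extension described in this caption can be illustrated with a minimal sketch. It assumes a primary library stored as a dictionary of weighted residue pairs, where each residue is a hypothetical `(sequence_name, position)` tuple; the function names `pair_key` and `extend_library` are illustrative and do not come from T-Coffee itself. For every intermediate residue z aligned to both x and y, the pair (x, y) gains the minimum of the two weights, or their product when `combine=operator.mul` is passed to mimic the probabilistic variant. Any normalisation a real implementation may apply is left out.

```python
from collections import defaultdict
from itertools import combinations
import operator


def pair_key(a, b):
    """Canonical, order-independent key for a residue pair."""
    return tuple(sorted((a, b)))


def extend_library(primary, combine=min):
    """Return an extended library from a primary library.

    primary: dict mapping (residue_a, residue_b) -> weight, where a residue
             is a (sequence_name, position) tuple (an assumption of this sketch).
    combine: how the two weights of a triplet are merged; `min` follows the
             T-Coffee-style extension, `operator.mul` the probabilistic one.
    """
    # Index every residue by the residues it is paired with.
    neighbors = defaultdict(dict)
    for (a, b), weight in primary.items():
        neighbors[a][b] = weight
        neighbors[b][a] = weight

    # Start from the primary weights.
    extended = defaultdict(float)
    for (a, b), weight in primary.items():
        extended[pair_key(a, b)] += weight

    # Each residue z acts once as the intermediate of every pair of its neighbours.
    for z, paired in neighbors.items():
        for x, y in combinations(sorted(paired), 2):
            if x[0] == y[0]:
                continue  # residues of the same sequence are never aligned together
            extended[pair_key(x, y)] += combine(paired[x], paired[y])
    return dict(extended)


# Toy library with one residue per sequence.
primary = {
    (("A", 1), ("B", 1)): 88,
    (("A", 1), ("C", 1)): 77,
    (("B", 1), ("C", 1)): 100,
}
extended = extend_library(primary)
print(extended[(("A", 1), ("B", 1))])  # 88 + min(77, 100) = 165.0
```

Passing `combine=operator.mul` with weights in the range 0 to 1 gives the multiplicative form mentioned in the caption.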
Fig. 2.
Typical colored output of M-Coffee. This output was obtained on the RV11033 BaliBase dataset, made of 11 distantly related bacterial NADH dehydrogenases. The alignment was obtained by combining Muscle, T-Coffee, Kalign and Mafft with M-Coffee. Correctly aligned residues (correctly aligned with 50% of their column, as judged from the reference) are in upper case; incorrectly aligned ones are in lower case. In this colored output, the color of each residue indicates how well the four initial MSAs agree on the alignment of that specific residue. Dark red indicates residues aligned in a similar fashion in all the individual MSAs; blue indicates very low agreement. Dark yellow, orange and red residues can be considered to be reliably aligned. See Supplementary Material for color version of the figure.
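One plausible way to turn "agreement of the initial MSAs" into a per-residue number is sketched below. This is an assumption made for illustration, not the scoring actually used by M-Coffee: each residue pair of the combined alignment is scored by the fraction of input MSAs that contain the same pair, and a residue receives the average over its pairs. The functions `aligned_pairs` and `residue_agreement` and the dictionary-of-gapped-strings representation are hypothetical.

```python
from collections import defaultdict


def aligned_pairs(msa):
    """Set of residue pairs sharing a column.

    msa: dict mapping sequence name -> gapped string ('-' for gaps),
         all strings having the same length.
    A residue is identified as (sequence_name, ungapped_index).
    """
    names = sorted(msa)
    seen = {name: 0 for name in names}  # residues consumed so far per sequence
    length = len(msa[names[0]])
    pairs = set()
    for col in range(length):
        present = []
        for name in names:
            if msa[name][col] != "-":
                present.append((name, seen[name]))
                seen[name] += 1
        for i in range(len(present)):
            for j in range(i + 1, len(present)):
                pairs.add((present[i], present[j]))
    return pairs


def residue_agreement(final_msa, input_msas):
    """Average fraction of input MSAs supporting each residue's aligned pairs."""
    input_pairs = [aligned_pairs(m) for m in input_msas]
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pair in aligned_pairs(final_msa):
        support = sum(pair in p for p in input_pairs) / len(input_pairs)
        for residue in pair:
            totals[residue] += support
            counts[residue] += 1
    return {residue: totals[residue] / counts[residue] for residue in totals}


final = {"s1": "AB", "s2": "AB"}
agreeing = {"s1": "AB", "s2": "AB"}
shifted = {"s1": "AB-", "s2": "-AB"}
print(residue_agreement(final, [agreeing, shifted])[("s1", 0)])  # 0.5
```

Scores near 1 would correspond to the red end of the color scale and scores near 0 to the blue end, under the assumptions above.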
Fig. 3.
Overview of template-based protocols. Templates are identified and mapped onto the target sequences. The figure shows three possible types of templates: homology extension, structure and functional annotation. The templates are then compared with a suitable method (profile aligner, structural aligner, etc.) and the resulting alignment (or comparison) is mapped onto the final alignment of the original target sequences. The residue pairs thus identified are then incorporated in the primary library. See Supplementary Material for color version of the figure.
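The mapping step in this caption can be sketched as a simple projection. The example assumes that each target sequence comes with a dictionary mapping template positions to target positions, and that the template comparison yields a list of aligned template position pairs; the function `project_template_pairs` and these data layouts are hypothetical, chosen only to show how pairs could be carried back to the targets before entering the primary library.

```python
def project_template_pairs(template_pairs, map_a, map_b):
    """Project aligned template positions onto the two target sequences.

    template_pairs: iterable of (i, j), position i of template A aligned
                    with position j of template B.
    map_a, map_b:   dicts mapping template position -> target position.
    Template positions with no counterpart in the target are skipped.
    """
    return [
        (map_a[i], map_b[j])
        for i, j in template_pairs
        if i in map_a and j in map_b
    ]


# Template A position 2 has no equivalent in target A, so that pair is dropped.
template_pairs = [(0, 0), (1, 2), (2, 3)]
map_a = {0: 5, 1: 6}
map_b = {0: 1, 2: 2, 3: 3}
print(project_template_pairs(template_pairs, map_a, map_b))  # [(5, 1), (6, 2)]
```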
