Upcoming challenges for multiple sequence alignment methods in the high-throughput era

Carsten Kemena¹, Cedric Notredame

PMID: 19648142
PMCID: PMC2752613
DOI: 10.1093/bioinformatics/btp452

Review

Carsten Kemena et al. Bioinformatics. 2009 Oct 1;25(19):2455-65. doi: 10.1093/bioinformatics/btp452. Epub 2009 Jul 30.

Affiliation

¹ Centre for Genomic Regulation, Pompeu Fabra University, Carrer del Doctor Aiguader 88, 08003 Barcelona, Spain.

Abstract

This review focuses on recent trends in multiple sequence alignment tools. It describes the latest algorithmic improvements, including the extension of consistency-based methods to the problem of template-based multiple sequence alignments. Results are presented suggesting that template-based methods are significantly more accurate than simpler alternative methods. The validation of existing methods is also discussed at length, with a detailed description of recent results and some suggestions for future validation strategies. The last part of the review addresses future challenges for multiple sequence alignment methods in the genomic era, most notably the need to cope with very large sequences, the need to integrate large amounts of experimental data, the need to accurately align non-coding and non-transcribed sequences and, finally, the need to integrate many alternative methods and approaches.

Figures

**Fig. 1.**
Generic overview of the derivation of a consistency-based scoring scheme. The sequences are first compared two by two using any suitable method. The second box shows the projection of the pair-wise comparisons. These projections may equally come from multiple sequence alignments, pair-wise comparisons or any method able to generate such projections, including posterior decoding of an HMM. They may also come from a template-based comparison such as the one described in Figure 3. The pairs thus identified are incorporated in the primary library and associated with weights that are used during the extension. The figure shows the T-Coffee extension protocol. When using probabilistic consistency, the probabilities are treated as weights and the triplet extension is made by multiplying the weights rather than taking their minimum. See Supplementary Material for color version of the figure.
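
The triplet extension described in this caption can be illustrated with a minimal Python sketch. It is a simplified illustration, not the T-Coffee implementation: the names `pair_key` and `extend_library`, the representation of a residue as a `(sequence, position)` tuple, and the example weights are all assumptions made for this example. The extended weight of a pair is taken as its primary weight plus one contribution per intermediate residue, combined with `min` (T-Coffee style) or with a product (probabilistic consistency style); any normalization used by real tools is omitted.

```python
from collections import defaultdict
from operator import mul


def pair_key(a, b):
    """Order-independent key for two residues, each a (sequence, position) tuple."""
    return tuple(sorted((a, b)))


def extend_library(primary, combine=min):
    """Extend a primary library {pair_key(a, b): weight} by triplets.

    Assumes every residue pair appears once in `primary` and links
    residues from two different sequences.
    """
    # For every residue, record the residues it is paired with and the weight.
    partners = defaultdict(dict)
    for (a, b), weight in primary.items():
        partners[a][b] = weight
        partners[b][a] = weight

    # Start from the direct (primary) weights.
    extended = defaultdict(float)
    for (a, b), weight in primary.items():
        extended[pair_key(a, b)] += weight

    # Add one contribution per triplet a - c - b, with c the intermediate residue.
    for c, linked in partners.items():
        residues = sorted(linked)
        for i, a in enumerate(residues):
            for b in residues[i + 1:]:
                if a[0] == b[0]:
                    continue  # both residues belong to the same sequence
                extended[pair_key(a, b)] += combine(linked[a], linked[b])
    return dict(extended)


# Hypothetical primary library over three sequences A, B and C.
primary = {
    pair_key(("A", 0), ("B", 0)): 77,
    pair_key(("A", 0), ("C", 0)): 88,
    pair_key(("B", 0), ("C", 0)): 100,
    pair_key(("A", 1), ("C", 1)): 60,
    pair_key(("B", 1), ("C", 1)): 90,
}
min_ext = extend_library(primary)  # minimum-based extension

# Hypothetical pair probabilities, extended by multiplying the weights.
prob = {
    pair_key(("A", 0), ("B", 0)): 0.5,
    pair_key(("A", 0), ("C", 0)): 0.8,
    pair_key(("B", 0), ("C", 0)): 0.9,
}
prob_ext = extend_library(prob, combine=mul)
```

In this toy library the pair A1/B1 has no primary weight, yet it receives an extended weight through the intermediate residue C1, which is the point of the extension: support from a third sequence can create or reinforce a pair.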

**Fig. 2.**
Typical colored output of M-Coffee. This output was obtained on the RV11033 BaliBase dataset, made of 11 distantly related bacterial NADH dehydrogenases. The alignment was obtained by combining Muscle, T-Coffee, Kalign and Mafft with M-Coffee. Correctly aligned residues (correctly aligned with 50% of their column, as judged from the reference) are in upper case; incorrectly aligned ones are in lower case. In this colored output, the color of each residue indicates how well the four initial MSAs agree on the alignment of that specific residue. Dark red indicates residues aligned in a similar fashion in all the individual MSAs; blue indicates very low agreement. Dark yellow, orange and red residues can be considered reliably aligned. See Supplementary Material for color version of the figure.
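
One simple way to picture the per-residue agreement that drives this coloring is sketched below in Python. This is an assumed, simplified measure and not the score M-Coffee actually computes: for each residue, it reports the fraction of (aligned partner, input MSA) combinations in which the input MSA reproduces the pairing found in the final alignment. The function names and the toy alignments are hypothetical, and every MSA is assumed to list the same sequences in the same order, with `-` as the gap character.

```python
def aligned_pairs(msa):
    """Return the set of residue pairs that share a column in `msa`.

    `msa` is a list of equal-length gapped strings. A residue is identified
    by (sequence index, ungapped position in that sequence).
    """
    counters = [0] * len(msa)
    pairs = set()
    for col in range(len(msa[0])):
        present = []
        for s, row in enumerate(msa):
            if row[col] != "-":
                present.append((s, counters[s]))
                counters[s] += 1
        # Every two residues found in the same column form an aligned pair.
        for i, a in enumerate(present):
            for b in present[i + 1:]:
                pairs.add((a, b))
    return pairs


def residue_agreement(final_msa, input_msas):
    """Per-residue agreement in [0, 1] between `final_msa` and `input_msas`.

    Residues with no aligned partner in the final alignment are not reported.
    """
    supports = [aligned_pairs(m) for m in input_msas]
    totals = {}  # residue -> [supporting votes, possible votes]
    for a, b in aligned_pairs(final_msa):
        votes = sum((a, b) in s for s in supports)
        for residue in (a, b):
            tally = totals.setdefault(residue, [0, 0])
            tally[0] += votes
            tally[1] += len(input_msas)
    return {r: got / possible for r, (got, possible) in totals.items()}


# Two hypothetical input alignments of the same two sequences, and a final one.
input_1 = ["ACG", "A-G"]
input_2 = ["ACG", "AG-"]
final = ["ACG", "A-G"]
agreement = residue_agreement(final, [input_1, input_2])
```

Here the first residues of both sequences are paired identically in both inputs and score 1.0, while the pairing of the final `G` residues is supported by only one of the two inputs and scores 0.5. A color scale such as the one in the figure could then be obtained by binning these values.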

**Fig. 3.**
Overview of template-based protocols. Templates are identified and mapped onto the target sequences. The figure shows three possible types of templates: homology extension, structure and functional annotation. The templates are then compared with a suitable method (profile aligner, structural aligner, etc.) and the resulting alignment (or comparison) is mapped onto the final alignment of the original target sequences. The residue pairs thus identified are then incorporated in the primary library. See Supplementary Material for color version of the figure.
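
The projection step of this caption, in which an alignment of two templates is mapped back onto the target sequences, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `project_template_alignment`, the dictionary-based mappings, the uniform weight and the example positions are hypothetical and do not come from any specific tool.

```python
def project_template_alignment(map_a, map_b, template_pairs, weight=1.0):
    """Project aligned template positions onto two target sequences.

    map_a, map_b:    {target position: template position} for targets A and B.
    template_pairs:  iterable of (template A position, template B position)
                     taken from the template-to-template comparison.
    Returns a primary-library fragment {(position in A, position in B): weight}.
    """
    # Invert the mappings so template positions lead back to target residues.
    back_a = {t: s for s, t in map_a.items()}
    back_b = {t: s for s, t in map_b.items()}
    library = {}
    for ta, tb in template_pairs:
        # Keep only pairs whose two template positions map to target residues.
        if ta in back_a and tb in back_b:
            library[(back_a[ta], back_b[tb])] = weight
    return library


# Hypothetical mappings of two targets onto their templates.
map_a = {0: 10, 1: 11, 2: 12}
map_b = {0: 5, 1: 7}
# Hypothetical aligned positions from comparing the two templates.
template_pairs = [(10, 5), (11, 6), (12, 7)]
library = project_template_alignment(map_a, map_b, template_pairs)
```

The template pair `(11, 6)` is dropped because template position 6 has no counterpart in target B, so only two target residue pairs reach the library. The same function applies whatever the template type (homology-extended profile, structure or functional annotation), since only the mappings and the template comparison change.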
