BMC Bioinformatics. 2007 May 24;8 Suppl 5(Suppl 5):S9.
doi: 10.1186/1471-2105-8-S5-S9.

Automatic extraction of reliable regions from multiple sequence alignments

Timo Lassmann et al. BMC Bioinformatics.

Abstract

Background: High-quality multiple alignments are crucial in the transfer of annotation from one genome to another. Multiple alignment methods strive to achieve ever-increasing levels of average accuracy on benchmark sets, while the accuracy of individual alignments is often overlooked.

Results: We have previously developed a method to automatically assess the accuracy and overall difficulty of multiple alignments. This was achieved by a per-residue comparison between alternate alignments of the same sequences. Here we present a key extension to this method, an algorithm to extract similarly aligned regions from several alignments and merge them into a new consensus alignment.

Conclusion: We demonstrate that the fraction of correctly aligned residues within the resulting alignments is increased by 25-100 percent compared to the original input alignments, as only the most reliably aligned parts are considered.
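
The per-residue comparison described above can be illustrated with a minimal Python sketch. It is not the Mumsa implementation: the function names `poars` and `supported_poars`, the input format (lists of equal-length gapped strings with `-` as the gap character), and the residue identifiers are assumptions made for illustration. The sketch collects the pairs of aligned residues (POARs) of each input alignment and keeps those that occur in at least `f` of them, the threshold mentioned in the Figure 1 caption. Merging the surviving pairs into a consensus alignment is not shown.

```python
from collections import Counter
from itertools import combinations


def poars(alignment):
    """Return the set of pairs of aligned residues (POARs) in one alignment.

    `alignment` is a list of equal-length gapped strings, one row per
    sequence (an assumed input format). A residue is identified by
    (sequence index, position in the ungapped sequence).
    """
    next_pos = [0] * len(alignment)
    pairs = set()
    for column in zip(*alignment):
        residues = []
        for seq_idx, char in enumerate(column):
            if char != "-":
                residues.append((seq_idx, next_pos[seq_idx]))
                next_pos[seq_idx] += 1
        # Every two residues sharing a column form one POAR.
        pairs.update(combinations(residues, 2))
    return pairs


def supported_poars(alignments, f):
    """Map each POAR found in at least `f` input alignments to its count."""
    counts = Counter()
    for alignment in alignments:
        counts.update(poars(alignment))
    return {pair: n for pair, n in counts.items() if n >= f}


# Two toy alignments of the same two sequences (ACG and ACTG).
aln_a = ["AC-G", "ACTG"]
aln_b = ["ACG-", "ACTG"]
reliable = supported_poars([aln_a, aln_b], f=2)
```

In this toy case only the two leading columns are aligned identically in both inputs, so `reliable` holds the pairs for the A and C residues, while the two competing placements of the final G each occur once and are dropped.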

Figures

Figure 1
A Mumsa alignment visualized by Kalignvu. A relaxed Mumsa alignment derived from ClustalW, Poa, Kalign, Probcons and Dialign alignments of the Balibase 3.0 test case BB20007. The parameter f was set to two, requiring that residues in the output alignment appear in at least two input alignments. Each residue is colored according to the average occurrence of the POARs it is involved in. Regions that appear in red are identically aligned in all five input alignments, while green and blue regions are aligned identically in progressively fewer of them. It is clear that all alignment programs find conserved motifs in the sequences but disagree on how the residues in between should be aligned.
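
The coloring rule in this caption can be sketched in Python as a per-residue score. This is an illustration under stated assumptions, not Kalignvu's or Mumsa's code: the function name `residue_scores` and the gapped-string input format are hypothetical, and the score is taken to be the plain mean, over all POARs (pairs of aligned residues) a residue takes part in, of the number of input alignments containing that POAR. The tools may define the average differently.

```python
from collections import Counter, defaultdict
from itertools import combinations


def residue_scores(alignments):
    """Average POAR occurrence per residue across several alignments.

    Each alignment is a list of equal-length gapped strings ('-' = gap),
    an assumed input format. Residues are keyed by
    (sequence index, position in the ungapped sequence).
    """
    counts = Counter()
    for alignment in alignments:
        next_pos = [0] * len(alignment)
        for column in zip(*alignment):
            residues = []
            for seq_idx, char in enumerate(column):
                if char != "-":
                    residues.append((seq_idx, next_pos[seq_idx]))
                    next_pos[seq_idx] += 1
            # Count each pair of residues sharing this column once.
            counts.update(combinations(residues, 2))

    occurrences = defaultdict(list)
    for (res_a, res_b), n in counts.items():
        occurrences[res_a].append(n)
        occurrences[res_b].append(n)
    # Residues that are never aligned to anything get no score.
    return {res: sum(ns) / len(ns) for res, ns in occurrences.items()}


# Two toy alignments of the same two sequences (ACG and ACTG).
scores = residue_scores([["AC-G", "ACTG"], ["ACG-", "ACTG"]])
```

With two inputs, a score of 2.0 corresponds to the "red" case (aligned identically everywhere), and lower scores mark residues whose placement differs between the inputs.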
Figure 2
Running time of Mumsa in comparison to the alignment methods used as input. The running time in CPU seconds for Mumsa using three settings, compared with the cumulative running time of the alignment programs used to generate the input alignments. The running times of Mumsa were multiplied by 100 to be visible in the plot. The sequence files were generated by ROSE [16] using an average sequence length of 500 residues and an average evolutionary distance of 250. It is clear that the running time of Mumsa is at least two orders of magnitude lower than that required by the alignment programs.

References

    1. Lecompte O, Thompson JD, Plewniak F, Thierry J, Poch O. Multiple alignment of complete sequences (MACS) in the post-genomic era. Gene. 2001;270:17–30. doi: 10.1016/S0378-1119(01)00461-9.
    2. Do CB, Mahabhashyam MS, Brudno M, Batzoglou S. ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res. 2005;15:330–340. doi: 10.1101/gr.2821705. http://www.genome.org/cgi/content/abstract/15/2/330
    3. Katoh K, Kuma Ki, Toh H, Miyata T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucl Acids Res. 2005;33:511–518. doi: 10.1093/nar/gki198. http://nar.oxfordjournals.org/cgi/content/abstract/33/2/511
    4. Lassmann T, Sonnhammer E. Kalign – an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics. 2005;6:298. doi: 10.1186/1471-2105-6-298. http://www.biomedcentral.com/1471-2105/6/298
    5. Wallace IM, O'Sullivan O, Higgins DG, Notredame C. M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucl Acids Res. 2006;34:1692–1699. doi: 10.1093/nar/gkl091. http://nar.oxfordjournals.org/cgi/content/abstract/34/6/1692