BMC Genomics. 2017 Jan 14;18(1):86.
doi: 10.1186/s12864-016-3477-5.

MARS: improving multiple circular sequence alignment using refined sequences

Lorraine A K Ayad et al. BMC Genomics. 2017.

Abstract

Background: A fundamental assumption of all widely-used multiple sequence alignment techniques is that the left- and right-most positions of the input sequences are relevant to the alignment. However, the position where a sequence starts or ends can be entirely arbitrary, for reasons such as arbitrariness in the linearisation (sequencing) of a circular molecular structure, or inconsistencies introduced into sequence databases by different linearisation standards. These scenarios are relevant, for instance, in the multiple sequence alignment of mitochondrial DNA, viroid, viral or other genomes that have a circular molecular structure. A solution for these inconsistencies would be to identify a suitable rotation (cyclic shift) for each sequence; these refined sequences may in turn lead to improved multiple sequence alignments using the preferred multiple sequence alignment program.
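To make the notion of a rotation (cyclic shift) concrete, the following is a minimal sketch, not the MARS method itself. It assumes two equal-length sequences and uses a brute-force search over all rotations with Hamming distance as the score; the function names `rotate`, `hamming` and `best_rotation` are hypothetical and chosen only for illustration.

```python
def rotate(seq: str, k: int) -> str:
    """Return seq cyclically shifted left by k positions."""
    if not seq:
        return seq
    k %= len(seq)
    return seq[k:] + seq[:k]


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_rotation(reference: str, seq: str) -> tuple[int, int]:
    """Try every rotation of seq (assumed non-empty) and return the
    (shift, distance) pair with the smallest Hamming distance to reference.

    This is an illustrative brute-force search, quadratic in the
    sequence length; it is an assumption of this sketch, not the
    heuristic described in the paper.
    """
    candidates = ((k, hamming(reference, rotate(seq, k))) for k in range(len(seq)))
    return min(candidates, key=lambda pair: pair[1])


# Example: the same circular molecule linearised at two different positions.
reference = "ACGTTGCA"
shifted = rotate(reference, 3)            # "TTGCAACG"
shift, dist = best_rotation(reference, shifted)
refined = rotate(shifted, shift)          # rotated back to match the reference
```

In this toy case the refined sequence is identical to the reference, so a conventional aligner would see no artificial offset between the two. Real sequences differ in length and content, which is why a practical tool needs a scoring scheme that tolerates insertions, deletions and substitutions.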

Results: We present MARS, a new heuristic method for improving Multiple circular sequence Alignment using Refined Sequences. MARS was implemented in the C++ programming language as a program to compute the rotations (cyclic shifts) required to best align a set of input sequences. Experimental results, using real and synthetic data, show that MARS improves the alignments, with respect to standard genetic measures and the inferred maximum-likelihood-based phylogenies, and outperforms state-of-the-art methods in terms of both accuracy and efficiency. Our results show, among other things, that the average pairwise distance in the multiple sequence alignment of a dataset of widely-studied mitochondrial DNA sequences is reduced by around 5% when MARS is applied before a multiple sequence alignment is performed.
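The abstract does not define how the average pairwise distance is computed. As a rough illustration only, the sketch below assumes the common p-distance (the proportion of mismatching columns among columns where neither sequence has a gap) averaged over all pairs of aligned rows; the function names and the gap symbol `-` are assumptions of this example, not details taken from the paper.

```python
from itertools import combinations


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of mismatches over columns where neither row has a gap."""
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    return sum(x != y for x, y in compared) / len(compared)


def average_pairwise_distance(rows: list[str]) -> float:
    """Mean p-distance over all unordered pairs of aligned rows."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy alignment with three rows of equal length.
alignment = ["ACGT", "ACGA", "AC-T"]
score = average_pairwise_distance(alignment)
```

Under this assumed measure, a lower score after rotating the inputs would indicate that the aligned rows agree in more columns.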

Conclusions: Analysing multiple sequences simultaneously is fundamental in biological research, and multiple sequence alignment is a popular method for this task. Conventional alignment techniques cannot be used effectively when the position where sequences start is arbitrary. We present here a method, which can be used in conjunction with any multiple sequence alignment program, to address this problem effectively and efficiently.

Keywords: Circular sequences; Multiple circular sequence alignment; Progressive alignment; q-grams.
