Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2011 Aug 25:12:353.
doi: 10.1186/1471-2105-12-353.

Vertical decomposition with Genetic Algorithm for Multiple Sequence Alignment

Affiliations

Vertical decomposition with Genetic Algorithm for Multiple Sequence Alignment

Farhana Naznin et al. BMC Bioinformatics. .

Abstract

Background: Many Bioinformatics studies begin with a multiple sequence alignment as the foundation for their research. This is because multiple sequence alignment can be a useful technique for studying molecular evolution and analyzing sequence structure relationships.

Results: In this paper, we have proposed a Vertical Decomposition with Genetic Algorithm (VDGA) for Multiple Sequence Alignment (MSA). In VDGA, we divide the sequences vertically into two or more subsequences, and then solve them individually using a guide tree approach. Finally, we combine all the subsequences to generate a new multiple sequence alignment. This technique is applied on the solutions of the initial generation and of each child generation within VDGA. We have used two mechanisms to generate an initial population in this research: the first mechanism is to generate guide trees with randomly selected sequences and the second is shuffling the sequences inside such trees. Two different genetic operators have been implemented with VDGA. To test the performance of our algorithm, we have compared it with existing well-known methods, namely PRRP, CLUSTALX, DIALIGN, HMMT, SB_PIMA, ML_PIMA, MULTALIGN, and PILEUP8, and also other methods, based on Genetic Algorithms (GA), such as SAGA, MSA-GA and RBT-GA, by solving a number of benchmark datasets from BAliBase 2.0.

Conclusions: The experimental results showed that the VDGA with three vertical divisions was the most successful variant for most of the test cases in comparison to other divisions considered with VDGA. The experimental results also confirmed that VDGA outperformed the other methods considered in this research.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Flowchart of VDGA.
Figure 2
Figure 2
Flowchart of Initial Generation.
Figure 3
Figure 3
(a) Guide tree; (b) 1 represents randomly selected sequence numbers from (a), and 0 represents unselected sequence numbers; (c) subtree made by the selected sequences; (d) subtree made by the remaining sequences of (a); and (e) new Guide tree made from (c) and (d).
Figure 4
Figure 4
Shuffling mechanism, sequence numbers 8 and 3 interchange their positions.
Figure 5
Figure 5
Single point crossover.
Figure 6
Figure 6
Multiple point crossovers.
Figure 7
Figure 7
Mutation.
Figure 8
Figure 8
Vertical Division Process.
Figure 9
Figure 9
Graphical presentation of generating New Generation.
Figure 10
Figure 10
Graphical presentations of the performance of the VDGA_Decomp_3 method w.r.to the Best and Average WSPM score per generation.
Figure 11
Figure 11
Graphical presentations of the experimental results on MSA-GA selected datasets.
Figure 12
Figure 12
Overall Performance of all methods in MSA-GA selected datasets.
Figure 13
Figure 13
Graphical presentations of the experimental results on reference 2 datasets.
Figure 14
Figure 14
Graphical presentations of the experimental results on reference 3 datasets.
Figure 15
Figure 15
Overall Performance of all methods in reference 2 datasets.
Figure 16
Figure 16
Overall performance of all methods in reference 3 datasets.

Similar articles

Cited by

References

    1. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22:4673–4680. doi: 10.1093/nar/22.22.4673. - DOI - PMC - PubMed
    1. Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970;48:443–453. doi: 10.1016/0022-2836(70)90057-4. - DOI - PubMed
    1. Stoye J, Perrey SW, Dress AWM. Improving the divide-and-conquer approach to sum-of-pairs multiple sequence alignment. App Maths Letters. 1997;10:67–73.
    1. Bonizzoni P, Vedova GD. The complexity of multiple sequence alignment with SP-score that is a metric. Theor Comp Sci. 2001;259:63–79. doi: 10.1016/S0304-3975(99)00324-2. - DOI
    1. Feng DF, Dolittle RF. Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J Mol Evol. 1987;25:351–350. doi: 10.1007/BF02603120. - DOI - PubMed

LinkOut - more resources