CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment

Xi Chen¹, Chen Wang¹, Shanjiang Tang¹, Ce Yu², Quan Zou¹

Affiliations

¹ School of Computer Science and Technology, Tianjin University, Yaguan Road, Tianjin, China.
² School of Computer Science and Technology, Tianjin University, Yaguan Road, Tianjin, China. yuce@tju.edu.cn.

PMID: 28646874
PMCID: PMC5483318
DOI: 10.1186/s12859-017-1725-6

Xi Chen et al. BMC Bioinformatics. 2017 Jun 24;18(1):315. doi: 10.1186/s12859-017-1725-6.

Abstract

Background: Multiple sequence alignment (MSA) is a classic and powerful technique for sequence analysis in bioinformatics. With the rapid growth of biological datasets, MSA parallelization becomes necessary to keep its running time at an acceptable level. Although there is a lot of work on MSA problems, existing approaches are either insufficient or contain implicit assumptions that limit their generality. First, the properties of users' sequences, including the sizes of datasets and the lengths of sequences, can take arbitrary values and are generally unknown before submission, which previous work has unfortunately ignored. Second, the center star strategy is suited to aligning similar sequences, but its first stage, center sequence selection, is highly time-consuming and requires further optimization. Moreover, given a heterogeneous CPU/GPU platform, prior studies consider MSA parallelization on GPU devices only, leaving the CPUs idle during the computation. Co-run computation, however, can maximize the utilization of the computing resources by enabling workload computation on both CPU and GPU simultaneously.
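The co-run idea described above can be illustrated with a minimal sketch in which two workers pull tasks from one shared queue, so neither sits idle while work remains. This is an assumption-laden illustration, not CMSA's scheduler: the names `co_run` and `mismatch_count` are hypothetical, both "devices" are ordinary Python threads, and a real system would dispatch one of them to a GPU kernel.

```python
import queue
import threading


def mismatch_count(pair):
    """Stand-in workload: count differing positions of two equal-length strings."""
    a, b = pair
    return sum(1 for x, y in zip(a, b) if x != y)


def co_run(tasks, workers):
    """Process every task exactly once, letting each worker pull work as it frees up.

    `workers` maps a device label (hypothetical, e.g. "cpu", "gpu") to a
    function that processes a single task.
    """
    pending = queue.Queue()
    for index, task in enumerate(tasks):
        pending.put((index, task))

    results = [None] * len(tasks)
    handled_by = {name: 0 for name in workers}

    def drain(name, process):
        # Keep taking tasks until the shared queue is empty.
        while True:
            try:
                index, task = pending.get_nowait()
            except queue.Empty:
                return
            results[index] = process(task)
            handled_by[name] += 1  # each thread updates only its own key

    threads = [
        threading.Thread(target=drain, args=(name, process))
        for name, process in workers.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, handled_by
```

Pulling from a shared queue balances load dynamically: a faster device simply takes more tasks. A static split based on measured device throughput is an alternative design; the abstract does not say which one CMSA uses.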

Results: This paper presents CMSA, a robust and efficient MSA system for large-scale datasets on the heterogeneous CPU/GPU platform. It performs and optimizes multiple sequence alignment automatically for users' submitted sequences without any assumptions. CMSA adopts the co-run computation model so that both CPU and GPU devices are fully utilized. Moreover, CMSA proposes an improved center star strategy that reduces the time complexity of its center sequence selection process from O(mn²) to O(mn). The experimental results show that CMSA achieves up to an 11× speedup and outperforms the state-of-the-art software.
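The abstract does not spell out the improved center sequence selection, so the following is only one plausible sketch of how a bitmap can avoid all-pairs comparison. It assumes each sequence is cut into non-overlapping segments of length `k`, each segment is encoded as a base-4 integer, and a per-sequence bitmap records which segments occur; the sequence whose segments are most widely shared is taken as the center. The function names, the scoring rule, and the default `k=4` are all assumptions, and segment positions are ignored.

```python
from collections import Counter

# Two bits per base; U is treated like T so RNA and DNA share one encoding.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3, "U": 3}


def encode(segment):
    """Base-4 integer code of a segment, or None if it has an unknown base."""
    code = 0
    for base in segment:
        if base not in BASE_CODE:
            return None
        code = code * 4 + BASE_CODE[base]
    return code


def sequence_bitmap(seq, k):
    """Integer bitmap with bit c set when a length-k segment with code c occurs."""
    bitmap = 0
    for start in range(0, len(seq) - k + 1, k):  # non-overlapping segments
        code = encode(seq[start:start + k])
        if code is not None:
            bitmap |= 1 << code
    return bitmap


def set_bits(bitmap):
    """Yield the positions of the set bits, lowest first."""
    while bitmap:
        low = bitmap & -bitmap
        yield low.bit_length() - 1
        bitmap ^= low


def select_center(seqs, k=4):
    """Index of the sequence whose segments are shared by the most sequences."""
    bitmaps = [sequence_bitmap(seq, k) for seq in seqs]
    shared = Counter()
    for bitmap in bitmaps:
        shared.update(set_bits(bitmap))  # number of sequences containing each segment
    scores = [sum(shared[code] for code in set_bits(bitmap)) for bitmap in bitmaps]
    return max(range(len(seqs)), key=scores.__getitem__)
```

Each sequence is scanned once to build its bitmap and once more to score it, so no pair of sequences is ever compared directly. Whether this matches the O(mn) bound stated above depends on details the abstract does not give.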

Conclusion: CMSA focuses on multiple similar RNA/DNA sequence alignment and proposes a novel bitmap-based algorithm to improve the center star strategy. We can conclude that harvesting the high performance of modern GPUs is a promising approach to accelerating multiple sequence alignment. In addition, adopting the co-run computation model can significantly improve overall system utilization. The source code is available at https://github.com/wangvsa/CMSA.

Keywords: Center star alignment; GPU; Heterogeneous; Multiple sequence alignment (MSA).

Figures

**Fig. 1**
The heterogeneous CPU/GPU architecture. To achieve the best performance, the co-run model of CPU and GPU is adopted.

**Fig. 2**
The overall flow of CMSA. Multiple sequence alignment is handled on the heterogeneous CPU/GPU platform.

**Fig. 3**
Experiments on datasets with different numbers of sequences. D1, D2, and D3 represent the three kinds of datasets described in Table 4. (a) Running time and (b) speedup.
