Scalable Consistency in T-Coffee Through Apache Spark and Cassandra Database

Jordi Lladós¹, Fernando Cores¹, Fernando Guirado¹

PMID: 30004242
DOI: 10.1089/cmb.2018.0084

Jordi Lladós et al. J Comput Biol. 2018 Aug;25(8):894-906. doi: 10.1089/cmb.2018.0084. Epub 2018 Jul 13.

Affiliation

¹ INSPIRES Research Center, Universitat de Lleida, Lleida, Spain.

Abstract

Next-generation sequencing, also known as high-throughput sequencing, has increased the volume of genetic data processed by sequencers. In bioinformatics, highly rated multiple sequence alignment (MSA) tools such as MAFFT, ProbCons, and T-Coffee (TC) apply probabilistic consistency as a step before the progressive alignment stage to improve final accuracy. However, these methods are severely limited by the memory required to store the consistency information. Big data processing and persistence techniques are used to manage and store the huge amounts of information that are generated, yet despite their significant advantages, few biological applications have adopted them. This article presents a novel approach named big data tree-based consistency objective function for alignment evaluation (BDT-Coffee). BDT-Coffee integrates into TC, through a Cassandra database, consistency information previously generated with the MapReduce processing paradigm. This enables large data sets to be processed, with the aim of improving the performance and scalability of the original algorithm.

Keywords: Cassandra; Hadoop; MSA; Spark; T-Coffee; large-scale alignments.
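
To make the idea of building consistency information with MapReduce more concrete, the following is a minimal sketch in plain Python. It is not the BDT-Coffee code. It assumes the classic T-Coffee library extension, in which the weight of a residue pair (x, y) is its direct weight plus, for every intermediate residue z linked to both, the minimum of the two link weights. The function names `canonical` and `extend_library`, the residue encoding `(sequence_id, position)`, and the sample weights are all hypothetical; in-memory dictionaries stand in for the distributed grouping step and for the Cassandra table that would hold the result.

```python
from collections import defaultdict
from itertools import combinations

# A residue is encoded as (sequence_id, position); this encoding is an
# assumption made for the sketch, not taken from the article.


def canonical(a, b):
    """Return the residue pair in a fixed order so (a, b) and (b, a) share a key."""
    return (a, b) if a <= b else (b, a)


def extend_library(primary):
    """Extend a primary library {(x, y): weight} with triplet consistency.

    Assumes each unordered residue pair appears at most once in `primary`.
    """
    # "Map" step: emit every weighted pair under both of its residues,
    # which is equivalent to grouping the links by intermediate residue.
    neighbours = defaultdict(list)
    for (x, y), w in primary.items():
        neighbours[x].append((y, w))
        neighbours[y].append((x, w))

    # Start from the direct weights of the primary library.
    extended = defaultdict(float)
    for (x, y), w in primary.items():
        extended[canonical(x, y)] += w

    # "Reduce" step: every intermediate residue z contributes
    # min(W(x, z), W(z, y)) to each pair (x, y) of its neighbours.
    for z, linked in neighbours.items():
        for (x, wx), (y, wy) in combinations(linked, 2):
            if x[0] == y[0]:
                continue  # residues of the same sequence are never aligned
            extended[canonical(x, y)] += min(wx, wy)

    # In a distributed setting this dictionary would be written to a
    # key-value store instead of being returned in memory.
    return dict(extended)


# Hypothetical primary library for three sequences A, B and C.
library = {
    (("A", 0), ("B", 0)): 88,
    (("A", 0), ("C", 0)): 77,
    (("B", 0), ("C", 0)): 100,
}
extended_library = extend_library(library)
```

With these sample weights, the pair A0-B0 rises from 88 to 165, because the path through C0 adds min(77, 100) = 77. Grouping by the intermediate residue keeps each unit of work independent, which is what makes this step a natural fit for a MapReduce-style framework. The memory cost of the extended dictionary is the limitation that motivates moving it to an external store.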
