[Preprint]. 2023 Jan 4:2023.01.04.522777.
doi: 10.1101/2023.01.04.522777.

ElasticBLAST: Accelerating Sequence Search via Cloud Computing

Christiam Camacho et al. bioRxiv.

Abstract

Background: Biomedical researchers use alignments produced by BLAST (Basic Local Alignment Search Tool) to categorize their query sequences. Producing such alignments is an essential bioinformatics task that is well suited for the cloud. The cloud can perform many calculations quickly as well as store and access large volumes of data. Bioinformaticians can also use it to collaborate with other researchers, sharing their results, datasets and even their pipelines on a common platform.

Results: We present ElasticBLAST, a cloud-native application for performing BLAST alignments in the cloud. ElasticBLAST can handle anywhere from a few to many thousands of queries and, if desired, run the searches on thousands of virtual CPUs, deleting the cloud resources when it is done. It uses cloud-native tools for orchestration and can request discounted instances, lowering cloud costs for users. It is supported on Amazon Web Services and Google Cloud Platform. It can search BLAST databases that are provided by the user or by the National Center for Biotechnology Information.

Conclusion: We show, with two examples, that ElasticBLAST is a useful application that can efficiently perform BLAST searches for the user in the cloud. At the same time, it hides much of the complexity of working in the cloud, lowering the barrier to moving work to the cloud.

Conflict of interest statement

Competing Interests: The authors declare they have no competing interests.

Figures

Figure 1: High level ElasticBLAST schematic
Figure 2: Dataflow in ElasticBLAST
Figure 3: Architecture and workflow overview on AWS
Figure 4: Architecture and workflow overview on GCP
Figure 5: A configuration file used in the second example (below). This configuration file is for GCP. The use-preemptible keyword in the cluster section specifies the use of discounted instances. Information relevant to the search is in the blast section. Results are placed in the user’s bucket specified by the results keyword in the blast section.
Figure 6: Percent of RNA-Seq reads assigned to each taxonomy species for eight Physalis peruviana samples. a) Taxonomy tree created from the alignment to GTax Eudicotyledons taxonomy group. Percent of reads at species level with respect to total reads in all samples. b) percent of reads not identified in the first alignment that match other GTax taxonomy groups. Percent of reads in the Pie chart are related to the total contaminant reads.
Figure 7: Cluster size (top) and CPU utilization of the cluster (bottom) for an ElasticBLAST run with four instances. This is a screenshot of the GCP monitoring view for the cluster. The cluster has only one instance from 2:15–2:40 (top graph), allowing for the installation of software and databases. The bottom graph shows that the cluster has about 50% CPU utilization after 4:20, and the top graph shows the cluster size shrinking about 10 minutes later. The CPU utilization at a given time is based on the size of the cluster at that point in time.
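
The configuration file layout described in the Figure 5 caption can be sketched with Python's standard configparser module. This is a minimal illustration, not the authors' actual file: only the section names (cluster, blast) and the keywords use-preemptible and results come from the caption. The bucket URL, the yes/no value format, and the helper names build_config and render are assumptions made for the example, and a real configuration would need additional settings not shown here.

```python
import configparser
import io


def build_config(results_bucket: str, use_preemptible: bool = True) -> configparser.ConfigParser:
    """Build an INI-style config with the two keywords named in the Figure 5 caption.

    The value formats here are assumptions, not taken from the paper.
    """
    cfg = configparser.ConfigParser()
    # cluster section: use-preemptible requests discounted instances (per the caption).
    cfg["cluster"] = {"use-preemptible": "yes" if use_preemptible else "no"}
    # blast section: results names the user's bucket where output is placed (per the caption).
    cfg["blast"] = {"results": results_bucket}
    return cfg


def render(cfg: configparser.ConfigParser) -> str:
    """Serialize the config to INI text."""
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical bucket URL, for illustration only.
    print(render(build_config("gs://example-bucket/results")))
```

Keeping the cluster settings and the search settings in separate sections mirrors the split the caption describes: one section controls how the cloud resources are provisioned, the other describes the search and where its results go.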
