Review
Heliyon. 2023 Feb 9;9(2):e13368.
doi: 10.1016/j.heliyon.2023.e13368. eCollection 2023 Feb.

Framing Apache Spark in life sciences

Andrea Manconi et al. Heliyon.

Abstract

Advances in high-throughput and digital technologies have made big data approaches necessary for handling complex tasks in the life sciences. However, the shift to big data has confronted researchers with technical and infrastructural challenges in storing, sharing, and analysing these data. Such tasks require distributed computing systems and algorithms that ensure efficient processing. Cutting-edge distributed programming frameworks make it possible to implement flexible algorithms that adapt the computation to the data on on-premise HPC clusters or cloud architectures. In this context, Apache Spark is a powerful HPC engine for large-scale data processing on clusters. Thanks in part to specialised libraries for working with structured and relational data, it also supports machine learning, graph-based computation, and stream processing. This review article aims to help life sciences researchers understand the features of Apache Spark and assess whether it can be used successfully in their research activities.

Keywords: 00-01; 99-00; Apache Spark; Big data; HPC; Parallel computing.
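
As an illustration of the data-parallel model that the abstract refers to, the following is a minimal sketch in plain Python, not Spark code and not taken from the review. It splits a toy set of sequencing reads into partitions, counts k-mers in each partition independently (the "map" step), and merges the partial counts (the "reduce" step). The function names, the toy reads, and the use of local threads to stand in for cluster workers are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(records, num_partitions):
    """Split records into num_partitions slices (round-robin)."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_kmers(reads, k=3):
    """Map step: count every k-mer found in one partition of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def merge(a, b):
    """Reduce step: combine the counts of two partitions."""
    return a + b


def partitioned_kmer_count(reads, k=3, num_partitions=4):
    """Count k-mers per partition in parallel, then merge the results."""
    parts = partition(reads, num_partitions)
    # Local threads stand in for cluster workers (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(lambda p: count_kmers(p, k), parts))
    return reduce(merge, partials, Counter())


# Hypothetical toy reads, for illustration only.
reads = ["ACGTAC", "GTACGT", "ACGACG"]
total = partitioned_kmer_count(reads, k=3, num_partitions=2)
```

Because the per-partition counts are merged with an associative and commutative operation, the result does not depend on how the reads are partitioned. The same property lets a cluster framework distribute such a computation across many nodes.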

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1. Snapshot of a Spark application.
Figure 2. Spark Ecosystem.
