Review

Heliyon. 2023 Feb 9;9(2):e13368.

doi: 10.1016/j.heliyon.2023.e13368. eCollection 2023 Feb.

Framing Apache Spark in life sciences

Andrea Manconi¹, Matteo Gnocchi¹, Luciano Milanesi¹, Osvaldo Marullo², Giuliano Armano²

Affiliations

¹ Institute of Biomedical Technologies - National Research Council of Italy, Segrate (Mi), Italy.
² Department of Mathematics and Computer science - University of Cagliari, Cagliari, Italy.

PMID: 36852030
PMCID: PMC9958288
DOI: 10.1016/j.heliyon.2023.e13368

Andrea Manconi et al. Heliyon. 2023.

Abstract

Advances in high-throughput and digital technologies have required the adoption of big data for handling complex tasks in life sciences. However, the shift to big data has confronted researchers with technical and infrastructural challenges in storing, sharing, and analysing these data. Tasks of this kind require distributed computing systems and algorithms able to ensure efficient processing. Cutting-edge distributed programming frameworks make it possible to implement flexible algorithms that adapt the computation to the data on on-premise HPC clusters or cloud architectures. In this context, Apache Spark is a powerful HPC engine for large-scale data processing on clusters. Thanks in part to specialised libraries for working with structured and relational data, it supports machine learning, graph-based computation, and stream processing. This review article aims to help life sciences researchers understand the features of Apache Spark and assess whether it can be successfully used in their research activities.

Keywords: 00-01; 99-00; Apache Spark; Big data; HPC; Parallel computing.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Figure 1**
Snapshot of a Spark application.

Cited by

Mechanisms and technologies in cancer epigenetics.
Sherif ZA, Ogunwobi OO, Ressom HW. Front Oncol. 2025 Jan 7;14:1513654. doi: 10.3389/fonc.2024.1513654. eCollection 2024. PMID: 39839798. Review.
