BMC Bioinformatics. 2020 Nov 30;21(1):549.

doi: 10.1186/s12859-020-03879-7.

Isabl Platform, a digital biobank for processing multimodal patient data

Juan S Medina-Martínez¹,², Juan E Arango-Ossa¹, Max F Levine¹, Yangyu Zhou¹, Gunes Gundem¹, Andrew L Kung¹, Elli Papaemmanuil³

Affiliations

¹ Memorial Sloan Kettering Cancer Center, New York, NY, USA.
² Isabl Inc., New York, NY, USA.
³ Memorial Sloan Kettering Cancer Center, New York, NY, USA. papaemme@mskcc.org.

PMID: 33256603
PMCID: PMC7708092
DOI: 10.1186/s12859-020-03879-7

Juan S Medina-Martínez et al. BMC Bioinformatics. 2020.

Abstract

Background: The widespread adoption of high-throughput technologies has democratized data generation. However, data processing in accordance with best practices remains challenging, and the data capital often becomes siloed. This presents an opportunity to consolidate data assets into digital biobanks: ecosystems of readily accessible, structured, and annotated datasets that can be dynamically queried and analysed.

Results: We present Isabl, a customizable plug-and-play platform for the processing of multimodal patient-centric data. Isabl's architecture consists of a relational database (Isabl DB), a command line client (Isabl CLI), a RESTful API (Isabl API) and a frontend web application (Isabl Web). Isabl supports automated deployment of user-validated pipelines across the entire data capital. A full audit trail is maintained to secure data provenance and governance and to ensure reproducibility of findings.

Conclusions: As a digital biobank, Isabl supports continuous data utilization and automated meta-analyses at scale, and serves as a catalyst for research innovation, new discoveries, and clinical translation.

Keywords: Analysis information management system; Data processing; Genomics; Image processing; Multimodal data; Next generation sequencing; Software engineering.

Conflict of interest statement

JSM, EP, and AK are founders of Isabl, a whole genome analytics company.

Figures

**Fig. 1**
Schematic representation of Isabl's microservice architecture. Isabl DB provides a patient-centric relational model for the integration of multimodal data types (e.g. genomic, imaging) and their corresponding relationships (individual, sample, aliquot, experiment, analyses). Isabl Web facilitates visualization of results and metadata in a single-page application. Isabl API powers the linkage to other institutional information systems and is agnostic to data storage technologies and computing environments, ensuring metadata remains accessible even when the data is no longer available (FAIR A2). Isabl CLI is a command line client used to process and manage digital assets across computing paradigms (e.g. cloud, cluster). Arrow connectors indicate database relationships between Isabl schemas, dashed lines indicate metadata transfer through the internet, and the solid line indicates a data link between the data lake and the web server (e.g. sshfs, s3fs, https).

**Fig. 2**
Isabl's relational model maps workflows for data provenance (e.g. Individuals, Samples, Experiments), processing (e.g. Applications, Analyses), and governance (e.g. Projects, Users). a An individual-centric model facilitates the tracking of analyses conducted on experimental data obtained from related samples. Analyses are the results of analytical workflows, or applications. Experiments are analyzed together and grouped in projects. Schemas to track metadata for diseases, experimental techniques, data generation platforms, and analysis cohorts are also provided. Lines with one circle represent foreign keys, whilst lines with two circles represent many-to-many relationships. b A brief description of these schemas with examples.
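To make the relationships in this caption concrete, here is a minimal sketch of an individual-centric model in Python. It is not Isabl's actual schema: the class names, fields, and the `analyses_for_individual` helper are illustrative assumptions, aliquots and projects are omitted for brevity, and foreign keys are modeled as plain object references.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Individual:
    identifier: str


@dataclass(frozen=True)
class Sample:
    identifier: str
    individual: Individual  # foreign key: each sample belongs to one individual


@dataclass(frozen=True)
class Experiment:
    identifier: str
    sample: Sample          # foreign key: each experiment comes from one sample
    technique: str          # e.g. "WGS"


@dataclass
class Analysis:
    application: str        # name of the workflow that produced the result
    # Many-to-many: one analysis may link several experiments, and one
    # experiment may take part in several analyses.
    targets: List[Experiment] = field(default_factory=list)
    references: List[Experiment] = field(default_factory=list)


def analyses_for_individual(individual: Individual,
                            analyses: List[Analysis]) -> List[Analysis]:
    """Return every analysis that touches an experiment of this individual."""
    return [
        analysis
        for analysis in analyses
        if any(exp.sample.individual == individual
               for exp in analysis.targets + analysis.references)
    ]


# Hypothetical records: one patient with a tumor and a matched normal sample.
patient = Individual("P1")
other = Individual("P2")
tumor = Experiment("E1", Sample("S1", patient), "WGS")
normal = Experiment("E2", Sample("S2", patient), "WGS")
unrelated = Experiment("E3", Sample("S3", other), "WGS")

all_analyses = [
    Analysis("paired-caller", targets=[tumor], references=[normal]),
    Analysis("qc", targets=[unrelated]),
]
patient_analyses = analyses_for_individual(patient, all_analyses)
```

Walking from an individual to its analyses through samples and experiments is the kind of query an individual-centric model is meant to make straightforward.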

**Fig. 3**
Isabl Web is a Single Page Application (SPA) organized in interactive panels (https://demo.isabl.io). a Example of sample-level metadata, including sample ID, corresponding individual ID, experiment ID, species, gender, center, data generating platform, experimental technique, disease state at the time of sampling, institutional database integrations (e.g. REDCap) and the genome assembly version of the corresponding data. Metadata fields are flexible and customizable. b Tree view representation of an individual's assets (samples, aliquots, experiments). Users can dynamically explore metadata by clicking the different nodes (e.g. from samples, to experiments, to all available analyses under any node). c The Analysis Panel indicates execution status, version, run time, storage usage, and linked experiments, and offers quick access to a selected set of results (e.g. BAM files with https://github.com/igvteam/igv.js, images, log files, tables).

**Fig. 4**
Isabl applications enable systematic processing of experimental data. a Guided by metadata, Isabl applications construct, validate, and deploy computing commands across experiments. Applications differ from Workflow Management Systems in that they do not *execute* the analytical logic but *construct* and *submit* a command. b Isabl applications can be *assembly aware*: they can be versioned not only as a function of their name, but also as a function of the genome assembly they are configured for. This is important because NGS results are comparable only when produced with the same genome version. Each unique combination of *targets* and *references*, such as a tumor-normal pair, results in an *analysis*. The figure panel illustrates applications with different experimental designs, such as paired analyses, multi-target analyses, and single-target analyses. Importantly, applications are agnostic to the underlying tool or pipeline being executed.
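The construct-and-submit idea in this caption can be sketched as follows. This is an illustration under stated assumptions, not the Isabl CLI interface: `PairedApplication`, `construct_commands`, and the `hypothetical_caller` tool name are all invented for the example. The sketch validates a targets/references tuple, builds a command string without running it, and identifies the application by name, version, and genome assembly.

```python
import shlex
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PairedApplication:
    """Hypothetical application for tumor-normal pairs (illustrative only)."""

    name: str
    version: str
    assembly: str  # genome assembly this application is configured for

    @property
    def key(self) -> Tuple[str, str, str]:
        # Assembly-aware identity: the same name and version configured for
        # another assembly counts as a different application.
        return (self.name, self.version, self.assembly)

    def validate(self, targets: List[str], references: List[str]) -> None:
        # A paired design needs exactly one target and one reference.
        if len(targets) != 1 or len(references) != 1:
            raise ValueError("paired analyses need one target and one reference")

    def build_command(self, targets: List[str], references: List[str]) -> str:
        # Construct the command only; executing it is left to a scheduler
        # or workflow manager.
        self.validate(targets, references)
        return shlex.join([
            "hypothetical_caller",  # placeholder tool name, not a real program
            "--assembly", self.assembly,
            "--tumor", targets[0],
            "--normal", references[0],
        ])


def construct_commands(app: PairedApplication,
                       pairs: List[Tuple[List[str], List[str]]]) -> List[str]:
    """Build one command per (targets, references) tuple."""
    return [app.build_command(targets, references) for targets, references in pairs]


app_37 = PairedApplication("caller", "1.0.0", "GRCh37")
app_38 = PairedApplication("caller", "1.0.0", "GRCh38")
commands = construct_commands(app_37, [(["T1"], ["N1"]), (["T2"], ["N2"])])
```

Because the application only returns command strings, the underlying tool can be swapped without changing how tuples are validated or tracked.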

**Fig. 5**
Isabl fosters autonomy, automation, audit trail, and scalable deployment of data processing tools in a system-wide approach. a The panel shows an exponential increase in data generation (colored lines indicate categories for registered applications, projects, individuals, experiments, and analyses output). b Isabl facilitated the registration and processing of more than 35K patients from the MSK-IMPACT cohort using a novel tool. Metadata was ingested with Isabl API in less than an hour, whilst more than 35K analyses were submitted with a single command and processed in three days.

**Fig. 6**
Isabl supports the implementation of production-ready workflows. The *no-click genome* has completed reports at a rate of 4.5 ± 2 days per report (mean ± standard deviation; n = 20; mean depth of coverage 80 ± 20) using a 3000-core multi-user High Performance Computing cluster. Processing duration is primarily driven by the longest-running application in each parallel block as well as compute availability (i.e. cluster congestion).
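The statement about processing duration can be illustrated with a small calculation. The sketch below assumes that blocks run one after another, that applications inside a block run in parallel, and that there is no queueing delay, so it covers only the first driver named in the caption and ignores cluster congestion. The function name and the block durations are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def estimate_wall_time(blocks: List[Dict[str, float]]) -> float:
    """Sum, over sequential blocks, the longest application in each block."""
    return sum(max(block.values()) for block in blocks if block)


# Hypothetical durations in hours, for illustration only.
blocks = [
    {"alignment": 30.0},
    {"snv_calling": 12.0, "sv_calling": 20.0, "cnv_calling": 8.0},
    {"report": 1.0},
]
total_hours = estimate_wall_time(blocks)  # 30 + 20 + 1 = 51 hours
```

Under these assumptions, shortening any application other than the slowest one in its block does not change the total.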

Grants and funding

P30 CA008748/CA/NCI NIH HHS/United States
