BMC Bioinformatics. 2020 Nov 30;21(1):549.
doi: 10.1186/s12859-020-03879-7.

Isabl Platform, a digital biobank for processing multimodal patient data

Juan S Medina-Martínez et al. BMC Bioinformatics. 2020.

Abstract

Background: The widespread adoption of high-throughput technologies has democratized data generation. However, processing data in accordance with best practices remains challenging, and the data capital often becomes siloed. This presents an opportunity to consolidate data assets into digital biobanks: ecosystems of readily accessible, structured, and annotated datasets that can be dynamically queried and analysed.

Results: We present Isabl, a customizable plug-and-play platform for the processing of multimodal patient-centric data. Isabl's architecture consists of a relational database (Isabl DB), a command line client (Isabl CLI), a RESTful API (Isabl API) and a frontend web application (Isabl Web). Isabl supports automated deployment of user-validated pipelines across the entire data capital. A full audit trail is maintained to secure data provenance and governance and to ensure reproducibility of findings.

Conclusions: As a digital biobank, Isabl supports continuous data utilization and automated meta-analyses at scale, and serves as a catalyst for research innovation, new discoveries, and clinical translation.

Keywords: Analysis information management system; Data processing; Genomics; Image processing; Multimodal data; Next generation sequencing; Software engineering.

Conflict of interest statement

JSM, EP, and AK are founders of Isabl, a whole-genome analytics company.

Figures

Fig. 1
Schematic representation of Isabl's microservice architecture. Isabl DB provides a patient-centric relational model for the integration of multimodal data types (e.g. genomic, imaging) and their corresponding relationships (individual, sample, aliquot, experiment, analyses). Isabl Web facilitates visualization of results and metadata in a single-page application. Isabl API powers the linkage to other institutional information systems and is agnostic to data storage technologies and computing environments, ensuring that metadata remain accessible even when the data are no longer available (FAIR A2). Isabl CLI is a command line client used to process and manage digital assets across computing paradigms (e.g. cloud, cluster). Arrow connectors indicate database relationships between Isabl schemas, dashed lines indicate metadata transfer through the internet, and the solid line indicates a data link between the data lake and the web server (e.g. sshfs, s3fs, https).
Fig. 2
Isabl's relational model maps workflows for data provenance (e.g. Individuals, Samples, Experiments), processing (e.g. Applications, Analyses), and governance (e.g. Projects, Users). (a) An individual-centric model facilitates the tracking of analyses conducted on experimental data obtained from related samples. Analyses are the results of analytical workflows, or applications. Experiments are analyzed together and grouped in projects. Schemas to track metadata for diseases, experimental techniques, data generation platforms, and analysis cohorts are also provided. Lines with one circle represent foreign keys, whilst lines with two circles represent many-to-many relationships. (b) A brief description of these schemas with examples.
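The relationships described in this caption can be sketched with plain Python dataclasses. This is a minimal illustration only, not Isabl DB's actual schema: the class names follow the caption, but every field name, and the helper `analyses_for_individual`, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Individual:
    identifier: str
    species: str


@dataclass
class Sample:
    identifier: str
    individual: Individual  # foreign key: each sample belongs to one individual


@dataclass
class Experiment:
    identifier: str
    sample: Sample  # foreign key: each experiment derives from one sample
    technique: str


@dataclass
class Application:
    name: str
    version: str


@dataclass
class Analysis:
    application: Application  # foreign key: the workflow that produced the result
    targets: List[Experiment] = field(default_factory=list)  # many-to-many


@dataclass
class Project:
    title: str
    experiments: List[Experiment] = field(default_factory=list)  # many-to-many


def analyses_for_individual(individual: Individual, analyses: List[Analysis]) -> List[Analysis]:
    """Hypothetical helper: analyses with at least one target from this individual."""
    return [
        analysis
        for analysis in analyses
        if any(exp.sample.individual is individual for exp in analysis.targets)
    ]
```

Because every experiment resolves to a sample and every sample to an individual, any analysis can be traced back to the patient it concerns, which is the sense in which the model is individual-centric.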
Fig. 3
Isabl Web is a Single Page Application (SPA) organized in interactive panels (https://demo.isabl.io). (a) Example of sample-level metadata, including sample ID, corresponding individual ID, experiment ID, species, gender, center, data-generating platform, experimental technique, disease state at the time of sampling, institutional database integrations (e.g. RedCap) and version of the corresponding genome assembly. Metadata fields are flexible and customizable. (b) Tree view representation of an individual's assets (samples, aliquots, experiments). Users can dynamically explore metadata by clicking the different nodes (e.g. from samples, to experiments, to all available analyses under any node). (c) The Analysis Panel indicates execution status, version, run time, storage usage, and linked experiments, and offers quick access to a selected set of results (e.g. BAM files with https://github.com/igvteam/igv.js, images, log files, tables).
Fig. 4
Isabl applications enable systematic processing of experimental data. (a) Guided by metadata, Isabl applications construct, validate, and deploy computing commands across experiments. Applications differ from Workflow Management Systems in that they do not execute the analytical logic but construct and submit a command. (b) Isabl applications can be assembly aware: they can be versioned not only as a function of their name, but also as a function of the genome assembly they are configured for. This is important because NGS results are comparable when produced with the same genome version. Each unique combination of targets and references, such as a tumor-normal pair, results in an analysis. The figure panel illustrates applications with different experimental designs, such as paired analyses, multi-target, and single-target. Importantly, applications are agnostic to the underlying tool or pipeline being executed.
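The behaviour described in this caption (construct and validate a command per target-reference pair, and version the application by genome assembly) can be sketched as follows. This is a hedged illustration and not the Isabl CLI API: `PairedApplication`, `build_commands`, the field names, and the tool name `hypothetical_caller` with its `--tumor`/`--normal` flags are all invented for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Experiment:
    identifier: str
    assembly: str  # genome assembly the data were aligned to
    bam_path: str


@dataclass(frozen=True)
class PairedApplication:
    name: str
    version: str
    assembly: str  # assembly this application instance is configured for
    tool: str      # hypothetical executable; the application is agnostic to it

    @property
    def key(self) -> Tuple[str, str, str]:
        # Assembly-aware identity: same name and version on another
        # assembly counts as a different application.
        return (self.name, self.version, self.assembly)

    def validate(self, target: Experiment, reference: Experiment) -> None:
        # Reject pairs whose data do not match the configured assembly.
        for experiment in (target, reference):
            if experiment.assembly != self.assembly:
                raise ValueError(
                    f"{experiment.identifier} uses {experiment.assembly}, "
                    f"expected {self.assembly}"
                )

    def build_command(self, target: Experiment, reference: Experiment) -> List[str]:
        # Only construct the command; running it is left to a scheduler.
        self.validate(target, reference)
        return [self.tool, "--tumor", target.bam_path, "--normal", reference.bam_path]


def build_commands(
    app: PairedApplication, pairs: List[Tuple[Experiment, Experiment]]
) -> List[List[str]]:
    """One command, and hence one analysis, per tumor-normal pair."""
    return [app.build_command(target, reference) for target, reference in pairs]
```

Keeping command construction separate from execution is what lets the same application wrap any underlying tool or pipeline.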
Fig. 5
Isabl fosters autonomy, automation, audit trail, and scalable deployment of data processing tools in a system-wide approach. (a) The panel showcases an exponential increase in data generation (colored lines indicate categories for registered applications, projects, individuals, experiments, and analyses output). (b) Isabl facilitated the registration and processing of +35K patients from the MSK-IMPACT cohort using a novel tool. Metadata was ingested with Isabl API in less than an hour, whilst +35K analyses were submitted with a single command and processed in three days.
Fig. 6
Isabl supports the implementation of production-ready workflows. The no-click genome has completed reports at a rate of 4.5 ± 2 days per report (mean ± standard deviation; n = 20; mean depth of coverage 80 ± 20) using a 3000-core multi-user High Performance Computing cluster. Processing duration is primarily driven by the longest-running application in each parallel block, as well as by compute availability (e.g. cluster congestion).
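The last sentence of this caption implies a simple duration model: blocks run one after another, applications within a block run in parallel, and each block takes as long as its slowest application plus any time spent waiting for compute. The sketch below is an assumption-laden illustration of that reasoning. The function name, the per-block queue wait, and all application names and hour values are invented and do not come from the paper.

```python
from typing import Dict, List


def estimate_duration(
    blocks: List[Dict[str, float]], queue_wait_hours: float = 0.0
) -> float:
    """Estimate total hours for sequential blocks of parallel applications.

    Each block maps application name -> run time in hours. A block finishes
    when its longest-running application does; queue_wait_hours is a crude,
    hypothetical stand-in for cluster congestion before each block.
    """
    total = 0.0
    for block in blocks:
        if not block:
            continue  # an empty block contributes nothing
        total += queue_wait_hours + max(block.values())
    return total


# Hypothetical workflow: alignment, then three callers in parallel, then a report.
example_blocks = [
    {"alignment": 10.0},
    {"snv_calling": 6.0, "sv_calling": 12.0, "cnv_calling": 3.0},
    {"report": 1.0},
]
```

Under this model, speeding up `snv_calling` or `cnv_calling` would not shorten the run, because `sv_calling` determines the length of the second block.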
