2025 Jan 6:14:giaf038.
doi: 10.1093/gigascience/giaf038.

Overture: an open-source genomics data platform

Mitchell Shiell et al. Gigascience.

Abstract

Background: Next-generation sequencing has created many new technological challenges in organizing and distributing genomics datasets, which can now routinely reach petabyte scales. Coupled with data-hungry artificial intelligence and machine learning applications, findable, accessible, interoperable, and reusable (FAIR) genomics datasets have never been more valuable. While major archives like the Genomic Data Commons, Sequence Read Archive, and European Genome-Phenome Archive have improved researchers' ability to share and reuse data, and general-purpose repositories such as Zenodo and Figshare provide valuable platforms for research data publication, the diversity of genomics research precludes any one-size-fits-all approach. In many cases, bespoke solutions are required, and although funding agencies and journals increasingly mandate reusable data practices, researchers still lack the technical support needed to meet the multifaceted challenges of data reuse.

Findings: Overture bridges this gap by providing open-source software for building and deploying customizable genomics data platforms. Its architecture consists of modular microservices, each generalized and given narrow responsibilities, that combine to create complete data management systems. These systems enable researchers to organize, share, and explore their genomics data at any scale. Through Overture, researchers can connect their data to both humans and machines, fostering reproducibility and enabling new insights through controlled data sharing and reuse.

Conclusions: By making these tools freely available, we can accelerate the development of reliable genomic data management across the research community quickly, flexibly, and at multiple scales. Overture is an open-source project licensed under AGPLv3.0 with all source code publicly available from https://github.com/overture-stack and documentation on development, deployment, and usage available from www.overture.bio.

Keywords: data management; genomics; open-science; open-source; research software.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1:
Platform components overview: on the front-end, Stage [24] provides the basic user interface (UI), including navigation menus, as well as data exploration, login, and profile pages. Arranger’s [25] library of search UI components then integrates with Stage to offer a configurable search facet panel, data table, and filter summary panel. Login and profile pages integrate with Keycloak [26] or Ego [27], which provide authentication and authorization for users and applications. Behind the scenes, Song [28] and Score [29] facilitate data management, retrieval, and submission. Score transfers large genomic files to and from S3-compatible object storage, while Song stores and handles the files’ metadata. These databases are indexed by Maestro [30] into unified Elasticsearch [31] file-centric and analysis-centric indices. Arranger then uses these indices to produce a GraphQL [32] search API that connects with its front-end library components on the data exploration page. Together, these services enable the secure and scalable reuse of genomics data.
Figure 2:
Data retrieval workflow: users first filter data via Arranger’s search components in the Stage UI’s data explorer. Once they have selected a subset, they download a “file manifest” from the download dropdown. To access Song and Score data, users log in through Stage’s auth integration and obtain their API key from the profile page. This API key is provided when installing the Score client. Finally, files are downloaded to the user’s device using the Score client’s download command, specifying the file manifest and desired output directory.
Figure 3:
Data submission workflow: Overture’s submission process enhances data integrity with data tracking and data model adherence. It involves organizing metadata files, converting them to JSON, and uploading via the Song Client for validation. Successful submissions receive an auto-generated analysis ID. File data are then uploaded using Song and Score clients, generating a file manifest linked to the metadata. All data start unpublished and are managed through Song’s publication controls for coordinated data releases.
