Review
Annu Rev Biomed Data Sci. 2023 Aug 10:6:443-464.
doi: 10.1146/annurev-biodatasci-122120-104825.

The All of Us Data and Research Center: Creating a Secure, Scalable, and Sustainable Ecosystem for Biomedical Research

Kelsey R Mayo et al. Annu Rev Biomed Data Sci. 2023.

Abstract

The All of Us Research Program's Data and Research Center (DRC) was established to help acquire, curate, and provide access to one of the world's largest and most diverse datasets for precision medicine research. Already, over 500,000 participants are enrolled in All of Us, 80% of whom are underrepresented in biomedical research, and data are being analyzed by a community of over 2,300 researchers. The DRC created this thriving data ecosystem by collaborating with engaged participants, innovative program partners, and empowered researchers. In this review, we first describe how the DRC is organized to meet the needs of this broad group of stakeholders. We then outline guiding principles, common challenges, and innovative approaches used to build the All of Us data ecosystem. Finally, we share lessons learned to help others navigate important decisions and trade-offs in building a modern biomedical data platform.

Keywords: data ecosystem; data integration; data privacy; diversity; electronic health records; precision medicine.

Figures

Figure 1.
Overview of the All of Us data ecosystem. (a) Participants enroll, consent, answer questionnaires, and provide additional data through All of Us Participant Portals, managed by the Participant Technologies Systems Center and The Participant Center. (b) Data are stored in a central, secure Raw Data Repository (RDR) at the Data and Research Center (DRC). Program staff at Healthcare Provider Organization (HPO) partners (c) use the DRC’s HealthPro application to complete baseline assessments, including program physical measurements and biospecimen collection, and (d) contribute data from local clinical systems via the Electronic Health Record (EHR) data pipeline. (e) Genomic analysis of participant biospecimens occurs via a genomic data pipeline in collaboration with the All of Us Biobank and Genomics Partners. (f) Summary participant and additional operational data from the RDR are aggregated into a Program Data Repository to power program-staff-facing analytics and dashboards. (g) Return of genetic results to participants is facilitated by the All of Us Genetic Counseling Resource. In parallel, participant data from the RDR are (h) routed through a curation pipeline to create a tiered (i) Curated Data Repository (CDR). Public CDR data are made available via (j) the All of Us Research Hub’s public Data Browser, while participant-level data in the Registered and Controlled Tiers are made available via the Research Hub’s secure analysis Trusted Research Environment (TRE), (k) the Researcher Workbench.
Figure 2.
All of Us data are organized into tables according to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), when possible. Self-reported demographic data from the Basics survey populate the person table. Other data obtained from surveys are found in the observation table. Program physical measurements as well as Electronic Health Record (EHR) measurements populate the measurement table. EHR data concerning visits, procedures, drugs, and conditions are arranged into their respective tables. All tables relate to the person table, and the tables containing procedure, drug, condition, and measurement data also relate to the visit occurrence table.
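The table relationships in this caption can be sketched as a small lookup. This is an illustrative sketch only, not the program's schema definition: the table names follow standard OMOP CDM naming, while the `REFERENCES` mapping and the `tables_linked_to` helper are hypothetical names introduced here.

```python
# Hypothetical sketch of the relationships described in the Figure 2 caption.
# Table names follow OMOP CDM conventions; the mapping and helper are
# illustrative assumptions, not part of the All of Us platform.

PERSON = "person"
VISIT = "visit_occurrence"

# Each table maps to the tables it relates to, as stated in the caption.
REFERENCES = {
    "observation": [PERSON],                  # survey data beyond the Basics survey
    "measurement": [PERSON, VISIT],           # physical and EHR measurements
    "visit_occurrence": [PERSON],
    "procedure_occurrence": [PERSON, VISIT],
    "drug_exposure": [PERSON, VISIT],
    "condition_occurrence": [PERSON, VISIT],
}


def tables_linked_to(target):
    """Return, in alphabetical order, the tables that relate to `target`."""
    return sorted(table for table, refs in REFERENCES.items() if target in refs)
```

Calling `tables_linked_to("visit_occurrence")` lists the four clinical event tables the caption names, while `tables_linked_to("person")` returns every table in the mapping.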
Figure 3.
All of Us participant data are collected into the Raw Data Repository (RDR), then harmonized, organized, and further processed into a Curated Data Repository (CDR). This repository is structured into three tiers of data with corresponding tiers of access requirements: Public (no login required), Registered (login required), and Controlled (login and additional approval required).
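One way to picture the three tiers is as an ordered set of access levels. The sketch below is a simplified illustration under one stated assumption: that access is cumulative, so a researcher cleared for a higher tier can also see lower tiers. The caption does not say this explicitly. The names `Tier`, `highest_tier`, and `can_access` are hypothetical.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Data tiers from the caption, ordered from least to most restricted."""
    PUBLIC = 0      # no login required
    REGISTERED = 1  # login required
    CONTROLLED = 2  # login and additional approval required


def highest_tier(logged_in: bool, additionally_approved: bool) -> Tier:
    """Return the most restricted tier a user may open."""
    if not logged_in:
        return Tier.PUBLIC
    if additionally_approved:
        return Tier.CONTROLLED
    return Tier.REGISTERED


def can_access(data_tier: Tier, logged_in: bool, additionally_approved: bool) -> bool:
    """Assumes cumulative access: a higher clearance includes lower tiers."""
    return data_tier <= highest_tier(logged_in, additionally_approved)
```

Under this assumption, additional approval without a login still yields only Public access, which mirrors the caption's "login and additional approval" wording for the Controlled Tier.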
Figure 4.
Researchers wishing to analyze participant-level data must follow a six-step access process. They first (1) explore data and policies on the Research Hub’s public website and (2) check that their institution has signed a Data Use and Registration Agreement; the indicated period of institutional contracting is required only if a researcher’s institution does not have an agreement in place. Researchers may then (3) create a Researcher Workbench account, (4) verify their identity using Login.gov, (5) complete required responsible and ethical research training, and (6) sign an individual data user code of conduct. Upon approval, researchers are granted a “data passport” enabling them to create a workspace to access and analyze All of Us participant data within the Researcher Workbench.
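The access process in this caption can be read as an ordered checklist. The sketch below is a hedged illustration that assumes the steps are completed strictly in the listed order, which the caption suggests but does not state as a rule. `ACCESS_STEPS` and `next_step` are hypothetical names, not part of the Researcher Workbench.

```python
from typing import Optional, Set

# Step descriptions paraphrase the Figure 4 caption; numbering starts at 1.
ACCESS_STEPS = (
    "explore data and policies on the Research Hub public website",
    "check for an institutional Data Use and Registration Agreement",
    "create a Researcher Workbench account",
    "verify identity using Login.gov",
    "complete responsible and ethical research training",
    "sign the individual data user code of conduct",
)


def next_step(completed: Set[int]) -> Optional[str]:
    """Return the first unfinished step, or None once all six are done."""
    for number, description in enumerate(ACCESS_STEPS, start=1):
        if number not in completed:
            return f"({number}) {description}"
    return None
```

A researcher who has finished steps 1 through 3 would get step 4 back from `next_step({1, 2, 3})`. A return value of `None` corresponds to the point at which the caption says the "data passport" is granted.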
Figure 5.
The 20 most common areas of researcher investigation according to medical terms in the projects’ self-reported descriptive summaries.
