Nature. 2024 Mar;627(8003):340-346.
doi: 10.1038/s41586-023-06957-x. Epub 2024 Feb 19.

Genomic data in the All of Us Research Program

All of Us Research Program Genomics Investigators. Nature. 2024 Mar.

Abstract

Comprehensively mapping the genetic basis of human disease across diverse individuals is a long-standing goal for the field of human genetics [1-4]. The All of Us Research Program is a longitudinal cohort study aiming to enrol a diverse group of at least one million individuals across the USA to accelerate biomedical research and improve human health [5,6]. Here we describe the programme's genomics data release of 245,388 clinical-grade genome sequences. This resource is unique in its diversity as 77% of participants are from communities that are historically under-represented in biomedical research and 46% are individuals from under-represented racial and ethnic minorities. All of Us identified more than 1 billion genetic variants, including more than 275 million previously unreported genetic variants, more than 3.9 million of which had coding consequences. Leveraging linkage between genomic data and the longitudinal electronic health record, we evaluated 3,724 genetic variants associated with 117 diseases and found high replication rates across both participants of European ancestry and participants of African ancestry. Summary-level data are publicly available, and individual-level data can be accessed by researchers through the All of Us Researcher Workbench using a unique data passport model with a median time from initial researcher registration to data access of 29 hours. We anticipate that this diverse dataset will advance the promise of genomic medicine for all.


Conflict of interest statement

D.M.M., G.A.M., E.V., K.W., J.H., H.D., C.L.K., M.M., S.D., Z.K., E. Boerwinkle and R.A.G. declare that Baylor Genetics is a Baylor College of Medicine affiliate that derives revenue from genetic testing. Eric Venner is affiliated with Codified Genomics, a provider of genetic interpretation. E.E.E. is a scientific advisory board member of Variant Bio, Inc. A.G.B. is a scientific advisory board member of TenSixteen Bio. The remaining authors declare no competing interests.

Figures

Fig. 1. Summary of All of Us data resources.
a, The All of Us Research Hub contains a publicly accessible Data Browser for exploration of summary phenotypic and genomic data. The Researcher Workbench is a secure cloud-based environment of participant-level data in a Controlled Tier that is widely accessible to researchers. b, All of Us participants have rich phenotype data from a combination of physical measurements, survey responses, EHRs, wearables and genomic data. Dots indicate the presence of the specific data type for the given number of participants. c, Overall summary of participants under-represented in biomedical research (UBR) with data available in the Controlled Tier. The All of Us logo in a is reproduced with permission of the National Institutes of Health’s All of Us Research Program.
Fig. 2. Genetic ancestry in All of Us.
a,b, Uniform manifold approximation and projection (UMAP) representations of All of Us WGS PCA data with self-described race (a) and ethnicity (b) labels. c, Proportion of genetic ancestry per individual in six distinct and coherent ancestry groups defined by Human Genome Diversity Project and 1000 Genomes samples.
Fig. 3. All of Us LDL-C GWAS.
Manhattan plot demonstrating robust replication of 20 well-established LDL-C genetic loci among 91,749 individuals with 1 or more LDL-C measurements. The red horizontal line denotes the genome-wide significance threshold of P = 5 × 10⁻⁸. Inset, effect estimate (β) comparison between NHLBI TOPMed LDL-C GWAS (x axis) and All of Us LDL-C GWAS (y axis) for the subset of 194 independent variants clumped (window 250 kb, r² 0.5) that reached genome-wide significance in NHLBI TOPMed.
Fig. 4. Phenome-wide associations of the Duffy blood group locus (rs2814778, ACKR1).
Results of genetic-ancestry-stratified phenome-wide association analysis among unrelated individuals highlighting ancestry-specific disease associations across the four most common genetic ancestries of participants. Bonferroni-adjusted phenome-wide significance threshold (<2.88 × 10⁻⁵) is plotted as a red horizontal line. AFR (n = 34,037, minor allele fraction (MAF) 0.82); AMR (n = 28,901, MAF 0.10); EAS (n = 32,55, MAF 0.003); EUR (n = 101,613, MAF 0.007).
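As a rough check on the threshold quoted for Fig. 4: a Bonferroni correction divides the family-wise error rate by the number of tests. The sketch below assumes a conventional α of 0.05, which the caption does not state, and back-calculates the number of phenotypes that the reported threshold would imply. The function names are illustrative and are not taken from the paper.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def implied_number_of_tests(alpha: float, threshold: float) -> int:
    """Number of tests implied by a reported Bonferroni threshold (rounded)."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return round(alpha / threshold)


# Assumption: alpha = 0.05. The caption reports only the threshold itself.
ALPHA = 0.05
REPORTED_THRESHOLD = 2.88e-5

n_phenotypes = implied_number_of_tests(ALPHA, REPORTED_THRESHOLD)
recomputed = bonferroni_threshold(ALPHA, n_phenotypes)
print(n_phenotypes, recomputed)
```

Under that assumption the reported threshold would correspond to roughly 1,736 tested phenotypes. The caption itself does not give the number of tests, so this is only a consistency check.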
Extended Data Fig. 1. Historic availability of EHR records in All of Us v7 Controlled Tier Curated Data Repository (N = 413,457).
For better visibility, the plot shows growth starting in 2010.
Extended Data Fig. 2. Overview of the Genomic Data Curation Pipeline for WGS samples.
The Data and Research Center (DRC) performs additional single sample quality control (QC) on the data as it arrives from the Genome Centers. The variants from samples that pass this QC are loaded into the Genomic Variant Store (GVS), where we jointly call the variants and apply additional QC. We apply a joint call set QC process, which is stored with the call set. The entire joint call set is rendered as a Hail Variant Dataset (VDS), which can be accessed from the analysis notebooks in the Researcher Workbench. Subsections of the genome are extracted from the VDS and rendered in different formats with all participants. Auxiliary data can also be accessed through the Researcher Workbench. This includes variant functional annotations, joint call set QC results, predicted ancestry, and relatedness. Auxiliary data are derived from GVS (arrow not shown) and the VDS. The Cohort Builder directly queries GVS when researchers request genomic data for subsets of samples. Aligned reads, as CRAM files, are available in the Researcher Workbench (not shown). The graphics of the dish, gene and computer and the All of Us logo are reproduced with permission of the National Institutes of Health’s All of Us Research Program.
Extended Data Fig. 3. Proportion of allelic frequencies (AF), stratified by computed ancestry with over 10,000 participants.
Bar counts are not cumulative (e.g., “pop AF < 0.01” does not include “pop AF < 0.001”).
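The non-cumulative binning described in this caption can be sketched as follows. Only the 0.001 and 0.01 cut-offs come from the caption. The third bin, the example frequencies and the function name are hypothetical and are included only to show that each variant is counted in exactly one bar.

```python
from bisect import bisect_right
from collections import Counter

# Upper edges of the non-cumulative bins. Only 0.001 and 0.01 appear in the
# caption; the final ">= 0.01" bin is an assumption made for illustration.
EDGES = [0.001, 0.01]
LABELS = [
    "pop AF < 0.001",
    "pop AF < 0.01",   # excludes variants already counted in "pop AF < 0.001"
    "pop AF >= 0.01",
]


def af_bin(allele_frequency: float) -> str:
    """Return the single bin label that a population allele frequency falls in."""
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    return LABELS[bisect_right(EDGES, allele_frequency)]


# Hypothetical allele frequencies, not data from the paper.
example_afs = [0.0002, 0.0005, 0.004, 0.2]
bar_counts = Counter(af_bin(af) for af in example_afs)
print(bar_counts)
```

Because each frequency maps to one label, the bar heights sum to the total number of variants, which is what distinguishes this layout from a cumulative histogram.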
Extended Data Fig. 4. Distribution of pathogenic and likely pathogenic ClinVar variants.
Stratified by ancestry and filtered to only those variants with allele count (AC) < 40 among the 245,388 short-read WGS samples.
Extended Data Fig. 5. Ancestry-specific HLA-DQB1 (rs9273363) locus associations in 231,442 unrelated individuals.
Phenome-wide (PheWAS) associations highlight ancestry-specific consequences across ancestries.
Extended Data Fig. 6. Ancestry-specific TCF7L2 (rs7903146) locus associations in 231,442 unrelated individuals.
Phenome-wide (PheWAS) associations highlight diabetic consequences across ancestries.

References

    1. The 1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393.
    2. Claussnitzer M, et al. A brief history of human disease genetics. Nature. 2020;577:179–189. doi: 10.1038/s41586-019-1879-7.
    3. Wojcik GL, et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature. 2019;570:514–518. doi: 10.1038/s41586-019-1310-4.
    4. Lewis ACF, et al. Getting genetic ancestry right for science and society. Science. 2022;376:250–252. doi: 10.1126/science.abm7530.
    5. All of Us Program Investigators. The “All of Us” Research Program. N. Engl. J. Med. 2019;381:668–676. doi: 10.1056/NEJMsr1809937.
