Patterns (N Y). 2022 Aug 12;3(8):100570.
doi: 10.1016/j.patter.2022.100570.

The All of Us Research Program: Data quality, utility, and diversity

Andrea H Ramirez et al. Patterns (N Y). 2022.

Abstract

The All of Us Research Program seeks to engage at least one million diverse participants to advance precision medicine and improve human health. We describe here the cloud-based Researcher Workbench that uses a data passport model to democratize access to analytical tools and participant information including survey, physical measurement, and electronic health record (EHR) data. We also present validation study findings for several common complex diseases to demonstrate use of this novel platform in 315,000 participants, 78% of whom are from groups historically underrepresented in biomedical research, including 49% self-reporting non-White races. Replication findings include medication usage pattern differences by race in depression and type 2 diabetes, validation of known cancer associations with smoking, and calculation of cardiovascular risk scores by reported race effects. The cloud-based Researcher Workbench represents an important advance in enabling secure access for a broad range of researchers to this large resource and analytical tools.

Keywords: All of Us Research Program; cloud-based analytics; cohort study; electronic health records; precision medicine.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Overview of data types included in the beta-release curated data repository. (A) Growth trajectory of participant data types after enrollment. Survey part 1 (green) includes “The Basics,” “Lifestyle,” and “Overall Health” surveys; survey part 2 (pink) includes “Personal Medical History,” “Health Care Access and Utilization,” and “Family Medical History.” Physical measurement accrual is shown in red, and the COVID-19 Participant Experience (COPE) survey is shown in purple. Note that the flattening is artificial due to the random date shift introduced to protect participant privacy. (B) Historical availability of participants’ electronic health record (EHR), survey, and device data.
Figure 2
UBR metrics. Depiction of the proportion of participants who are underrepresented in biomedical research (UBR) based on program definitions. A participant is included in the overall category if they meet at least one criterion among the race/ethnicity, income, age, sexual orientation, education, gender identity, and sex at birth designations. The sexual and gender minorities category aggregates any participant with a UBR response to questions on sexual orientation, gender identity, or sex at birth. GED, General Educational Development (i.e., high school diploma or equivalent).
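The "at least one criterion" rule for the overall category can be sketched as below. This is only an illustration under stated assumptions: the category keys and the boolean-flag representation are hypothetical, and the program's actual per-category UBR definitions are not reproduced here.

```python
# Hypothetical keys mirroring the designations named in the Figure 2 caption.
UBR_CATEGORIES = (
    "race_ethnicity", "income", "age", "sexual_orientation",
    "education", "gender_identity", "sex_at_birth",
)
# Subset aggregated into the sexual and gender minorities category.
SGM_CATEGORIES = ("sexual_orientation", "gender_identity", "sex_at_birth")


def is_ubr_overall(flags):
    """Return True if the participant meets at least one UBR criterion.

    `flags` maps a category key to True when the participant's response
    counts as UBR for that category (assumed representation).
    """
    return any(flags.get(name, False) for name in UBR_CATEGORIES)


def is_ubr_sgm(flags):
    """Return True if any sexual or gender minority criterion is met."""
    return any(flags.get(name, False) for name in SGM_CATEGORIES)


# Made-up participant: UBR by income only.
participant = {"income": True, "education": False}
print(is_ubr_overall(participant))  # True
print(is_ubr_sgm(participant))      # False
```

Because the overall category is a logical OR across designations, its proportion is at least as large as that of any single category.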
Figure 3
Medication sequencing for participants who have diabetes and depression grouped by race. (A) Anti-diabetic medication sequences for White participants. (B) Anti-diabetic medication sequences for non-White participants. (C) Antidepressant medication sequences for White participants. (D) Antidepressant medication sequences for non-White participants. (E) Percentage of White participants who were prescribed one medication that is the most common one from years 2000–2018. (F) Percentage of non-White participants who were prescribed one medication that is the most common one from years 2000–2018. The difference in counts of first anti-diabetic in (A) and (B) and the counts of first antidepressants in (C) and (D) for each medication between White and non-White participants was significant (p by chi-square was <0.05).
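The chi-square comparison mentioned in the caption can be illustrated with a Pearson chi-square test of independence on a table of first-medication counts by race group. The sketch below is not the authors' code: the counts are invented for illustration, the function names are hypothetical, and the closed-form p value shown applies only to the special case of 2 degrees of freedom. It also assumes every row and column total is nonzero.

```python
import math


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a contingency table.

    `table` is a list of rows; each row holds observed counts per column.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_two_dof(stat):
    """Upper-tail p value of a chi-square statistic with exactly 2 dof."""
    return math.exp(-stat / 2.0)


# Invented counts: rows are race groups, columns are three first medications.
counts = [
    [120, 60, 20],  # hypothetical group 1
    [90, 80, 30],   # hypothetical group 2
]
stat, dof = pearson_chi_square(counts)
if dof == 2:
    print(f"chi-square = {stat:.2f}, dof = {dof}, p = {p_value_two_dof(stat):.4f}")
```

For tables with other degrees of freedom, a statistics library such as SciPy (`scipy.stats.chi2_contingency`) would supply the p value.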
Figure 4
Cancer PheWAS ever-smoking EHR and survey comparison. (A) Manhattan plot for cancer PheWAS using EHR ever smoking as exposure. Results are the −log10 (p value) of the corresponding logistic regression adjusted for age at last relevant EHR code, sex at birth, race and ethnicity from surveys, EHR length, and number of unique billing codes per record. Up arrows indicate non-protective associations, and down arrows indicate protective ones. Phenotype labels are given to the top ten phenotypes based on magnitude of effect size for both protective and non-protective effects. (B) Manhattan plot for cancer PheWAS using survey ever smoking as exposure, with the same presentation as (A). (C) Comparison of survey smoking-regression estimates (colored in blue) to EHR smoking-regression estimates (colored in red) for cancer outcomes. (D) PheWAS EHR ever-smoking (dark blue) and survey ever-smoking (light blue) effect sizes and confidence intervals compared with published meta-analyses (orange). Estimates are presented on the natural log ratio scale (odds ratio [OR] or risk ratio [RR]). Estimates below the horizontal line represent protective effects, and estimates above the line represent non-protective effects. Each meta-analysis plot point shape represents whether the effect size from the literature was an OR, a hazard ratio (HR), or an RR, recognizing that RRs and ORs are not directly comparable except in the case of rare disease.
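As a small illustration of how one association becomes a point on these Manhattan plots, the sketch below converts a p value to −log10 (p value) and chooses the arrow direction from the sign of the natural log ratio. The function name and the example estimates are hypothetical and are not taken from the paper; the function assumes the p value is greater than 0.

```python
import math


def manhattan_point(log_ratio, p_value):
    """Return (plot height, arrow direction) for one phenotype.

    `log_ratio` is the effect estimate on the natural log scale, so
    values above 0 are non-protective and values below 0 are protective.
    """
    height = -math.log10(p_value)
    if log_ratio > 0:
        direction = "up"      # non-protective association
    elif log_ratio < 0:
        direction = "down"    # protective association
    else:
        direction = "none"    # no estimated effect
    return height, direction


# Invented example estimates: an odds ratio of 1.5 and one of 0.8.
print(manhattan_point(math.log(1.5), 0.001))
print(manhattan_point(math.log(0.8), 0.05))
```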
Figure 5
Baseline cardiovascular disease risk calculations. (A) Baseline 10-year ASCVD cardiovascular disease risk calculations (%). A histogram of the cardiovascular disease risk score for participants with necessary measurements grouped by race group into White, African American, and other. The difference among the cardiovascular risk scores across the three race groups was statistically significant (p value for the Kruskal-Wallis H test was 0). Mann-Whitney p value was <0.001 when comparing the risk scores for White versus African American participants, other versus African American, and White versus other. (B) Comparing the percentage of All of Us participants to the US population in each ASCVD risk category as published in ACC/AHA guidelines. The risk score for the US population was calculated by applying the pooled cohort equations (i.e., ASCVD score) to the National Health and Nutrition Examination Survey. (C) Comparing the percentage of All of Us participants in each race group with the US population in each ASCVD risk category as published in ACC/AHA guidelines. The risk score for the US population was calculated by applying the pooled cohort equations (i.e., ASCVD score) to the National Health and Nutrition Examination Survey.
