Nat Commun. 2025 May 3;16(1):4123.
doi: 10.1038/s41467-025-59351-8.

Genetic ancestry and population structure in the All of Us Research Program cohort

Shivam Sharma et al. Nat Commun. 2025.

Abstract

We analyzed participant genomic variant data to characterize population structure and genetic ancestry for the All of Us cohort (n = 297,549). There is substantial population structure in the cohort, with clusters of closely related participants interspersed among less related individuals. Participants show diverse genetic ancestry, with major contributions from European (66.4%), African (19.5%), Asian (7.6%), and American (6.3%) continental ancestry components. Participant genetic similarity clusters show group-specific ancestry, with distinct patterns of continental and subcontinental ancestry among groups. African and American ancestry are enriched in the southeast and southwest regions of the country, respectively, whereas European ancestry is more evenly distributed across the US. The diversity of All of Us participants' genetic ancestry is negatively correlated with age; younger participants show higher levels of genetic admixture compared to older participants. Our results underscore the ancestral genetic diversity of the All of Us cohort, a crucial prerequisite for genomic health equity.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Population structure.
Genomic PCA for All of Us participants. Left panels show PC1 versus PC2 comparisons, and right panels show PC1 versus PC3 comparisons, with the percent of variance explained by each PC shown. a Participants color-coded by the number of close neighbors as defined by Euclidean distance < 0.1 in PCs 1–5. b Kernel density estimation with peaks showing high-density clusters of participants in PC space. c High-density clusters of genetically similar participants are shown as groups 1–7.
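
The close-neighbor count in panel a can be illustrated with a minimal sketch. It assumes each participant is represented by coordinates on PCs 1–5 and that a "close neighbor" is any other participant at Euclidean distance below 0.1, as the caption states. The function name `count_close_neighbors` and the toy coordinates are hypothetical and are not taken from the study.

```python
import math

def count_close_neighbors(pcs, radius=0.1):
    """For each participant, count the others closer than `radius` in PC space."""
    counts = [0] * len(pcs)
    for i in range(len(pcs)):
        for j in range(i + 1, len(pcs)):
            # math.dist computes the Euclidean distance between two points.
            if math.dist(pcs[i], pcs[j]) < radius:
                counts[i] += 1
                counts[j] += 1
    return counts

# Hypothetical coordinates on PCs 1-5 (made-up values, not study data).
toy_pcs = [
    (0.00, 0.00, 0.00, 0.00, 0.00),
    (0.05, 0.00, 0.00, 0.00, 0.00),
    (0.00, 0.05, 0.00, 0.00, 0.00),
    (1.00, 1.00, 0.00, 0.00, 0.00),
]
neighbor_counts = count_close_neighbors(toy_pcs)
```

This pairwise loop is quadratic in the number of participants, so it only suits small examples. A cohort of this size would presumably need a spatial index or similar structure, which is not shown here.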
Fig. 2. Continental genetic ancestry.
a Genomic PCA with All of Us participants shown in gray and global reference population samples color-coded as shown in the key. Left panels show PC1 versus PC2 comparisons, and right panels show PC1 versus PC3 comparisons, with the percent of variance explained by each PC shown. b Genetic ancestry proportions for All of Us participants stratified by the genetic similarity groups shown in Fig. 1c. Average ancestry proportions are shown above each group, and numbers of participants are shown below each group. The remaining participants are individuals who did not fall into a dense PCA cluster.
Fig. 3. Subcontinental genetic ancestry.
Subcontinental genetic ancestry proportions for All of Us participants from (a) African, (b) East Asian, (c) South Asian, and (d) European continental ancestry groups. Subcontinental groups (regions) for each continental ancestry group are color-coded as shown.
Fig. 4. Genetic ancestry by geography.
Genetic ancestry proportions are shown for All of Us participants sampled from the fifty US states and Puerto Rico. a All participants and ancestry components. b Non-European genetic ancestry proportions for all individuals with <90% European ancestry. The results for states shaded in gray are suppressed owing to <20 participants with <90% European ancestry.
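
The filtering and suppression rule in panel b could be sketched as follows. The thresholds (<90% European ancestry, fewer than 20 qualifying participants) come from the caption. The assumptions are the data layout (a list of `(state, ancestry_dict)` pairs), the use of a simple per-state mean, and the function name `non_european_by_state`, none of which the caption specifies.

```python
from collections import defaultdict

def non_european_by_state(participants, max_european=0.90, min_count=20):
    """Mean non-European ancestry per state; None marks a suppressed state."""
    groups = defaultdict(list)
    for state, ancestry in participants:
        # Keep only participants below the European ancestry threshold.
        if ancestry.get("European", 0.0) < max_european:
            groups[state].append(ancestry)

    result = {}
    for state, rows in groups.items():
        if len(rows) < min_count:
            result[state] = None  # suppressed: too few qualifying participants
            continue
        components = {k for row in rows for k in row if k != "European"}
        result[state] = {
            k: sum(row.get(k, 0.0) for row in rows) / len(rows)
            for k in sorted(components)
        }
    return result

# Hypothetical toy input (made-up values, not study data).
toy = [("AA", {"European": 0.5, "African": 0.5})] * 20
toy += [("BB", {"European": 0.5, "African": 0.5})] * 19
toy += [("AA", {"European": 0.95, "African": 0.05})]  # excluded by the filter
summary = non_european_by_state(toy)
```

In this toy input, state "AA" has exactly 20 qualifying participants and is reported, while state "BB" has 19 and is suppressed. A state with no qualifying participants is simply absent from the result in this sketch.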
Fig. 5. Genetic admixture by age.
Genetic admixture entropy (y-axis) against participant age (x-axis). Ages are shown in single-year bins, where each bin had at least 1000 participants (24–89 years), with average and 95% CI values shown. Linear regression trend line (black) shown with 95% CI shaded (gray). The linear regression adjusted R² and its P value are shown for n = 66 bins.
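
The caption does not define admixture entropy. One plausible reading, assumed here rather than confirmed by the text, is the Shannon entropy of a participant's ancestry proportions: zero for a single ancestry component and largest when the components are equal. The function name `admixture_entropy` and the example proportions are hypothetical.

```python
import math

def admixture_entropy(proportions):
    """Shannon entropy (in nats) of ancestry proportions, normalized to sum to 1."""
    total = sum(proportions)
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    entropy = 0.0
    for p in proportions:
        q = p / total
        if q > 0:  # a zero proportion contributes nothing to the entropy
            entropy -= q * math.log(q)
    return entropy

# Hypothetical participants (made-up proportions, not study data).
single_ancestry = admixture_entropy([1.0, 0.0, 0.0, 0.0])
even_admixture = admixture_entropy([0.25, 0.25, 0.25, 0.25])
```

Under this assumption, a participant with one ancestry component scores 0 and a participant with four equal components scores ln 4, so higher values indicate more admixture. Averaging such values within each single-year age bin would give the points that the regression line in the figure is fitted to.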
