Science. 2024 Jul 19;385(6706):eadj1182.
doi: 10.1126/science.adj1182. Epub 2024 Jul 19.

Diversity and scale: Genetic architecture of 2068 traits in the VA Million Veteran Program

Anurag Verma #, Jennifer E Huffman #, Alex Rodriguez #, Mitchell Conery #, Molei Liu #, Yuk-Lam Ho, Youngdae Kim, David A Heise, Lindsay Guare, Vidul Ayakulangara Panickan, Helene Garcon, Franciel Linares, Lauren Costa, Ian Goethert, Ryan Tipton, Jacqueline Honerlaw, Laura Davies, Stacey Whitbourne, Jeremy Cohen, Daniel C Posner, Rahul Sangar, Michael Murray, Xuan Wang, Daniel R Dochtermann, Poornima Devineni, Yunling Shi, Tarak Nath Nandi, Themistocles L Assimes, Charles A Brunette, Robert J Carroll, Royce Clifford, Scott Duvall, Joel Gelernter, Adriana Hung, Sudha K Iyengar, Jacob Joseph, Rachel Kember, Henry Kranzler, Colleen M Kripke, Daniel Levey, Shiuh-Wen Luoh, Victoria C Merritt, Cassie Overstreet, Joseph D Deak, Struan F A Grant, Renato Polimanti, Panos Roussos, Gabrielle Shakt, Yan V Sun, Noah Tsao, Sanan Venkatesh, Georgios Voloudakis, Amy Justice, Edmon Begoli, Rachel Ramoni, Georgia Tourassi, Saiju Pyarajan, Philip Tsao, Christopher J O'Donnell, Sumitra Muralidhar, Jennifer Moser, Juan P Casas, Alexander G Bick, Wei Zhou, Tianxi Cai #, Benjamin F Voight #, Kelly Cho #, J Michael Gaziano #, Ravi K Madduri #, Scott Damrauer #, Katherine P Liao #
Anurag Verma et al. Science. 2024.

Abstract

One of the justifiable criticisms of human genetic studies is the underrepresentation of participants from diverse populations. Lack of inclusion must be addressed at scale to identify causal disease factors and understand the genetic causes of health disparities. We present genome-wide associations for 2068 traits from 635,969 participants in the Department of Veterans Affairs Million Veteran Program, a longitudinal study of diverse United States Veterans. Systematic analysis revealed 13,672 genomic risk loci; 1608 were only significant after including non-European populations. Fine-mapping identified causal variants at 6318 signals across 613 traits. One-third (n = 2069) were identified in participants from non-European populations. This reveals a broadly similar genetic architecture across populations, highlights genetic insights gained from underrepresented groups, and presents an extensive atlas of genetic associations.

Conflict of interest statement

Competing interests: C.J.O.’D. and J.P.C. are employed full time by the Novartis Institutes for BioMedical Research (their major contributions to this project occurred while employed at VA Boston Healthcare System). H.K. is a member of advisory boards for Dicerna Pharmaceuticals, Sophrosyne Pharmaceuticals, Enthion Pharmaceuticals, and Clearmind Medicine; H.K. is also a consultant to Sobrera Pharmaceuticals, the recipient of research funding and medication supplies for an investigator-initiated study from Alkermes, and a member of the American Society of Clinical Psychopharmacology’s Alcohol Clinical Trials Initiative, which was supported in the last three years by Alkermes, Dicerna, Ethypharm, Lundbeck, Mitsubishi, Otsuka, and Pear Therapeutics. H.K. is the holder of US patent 10,900,082 titled “Genotype-guided dosing of opioid agonists,” issued 26 January 2021. J.G. and R.P. were paid for their editorial work in the journal Complex Psychiatry. R.P. reports a research grant from Alkermes. S.D. reports grants from the following: Alnylam Pharmaceuticals, Inc; Astellas Pharma, Inc; AstraZeneca Pharmaceuticals LP; Biodesix; Celgene Corporation; Cerner Enviza; GlaxoSmithKline PLC; Janssen Pharmaceuticals, Inc.; Kantar Health; Myriad Genetic Laboratories, Inc.; Novartis International AG; and the Parexel International Corporation through the University of Utah or Western Institute for Veteran Research outside the submitted work. K.P.L. received a one-time consulting fee from UCB. S.M.D. received research support from RenalytixAI and Novo Nordisk, outside the scope of the current research, and is named as a coinventor on a Government-owned US Patent application related to the use of genetic risk prediction for venous thromboembolic disease and for the use of PDE3B inhibition for preventing cardiovascular disease, both filed by the US Department of Veterans Affairs in accordance with Federal regulatory requirements.
All other authors declare that they have no competing interests.

Figures

Fig. 1. Overview of the study population, genetic association results, and post-GWAS findings.
Top panel depicts the demographic characteristics of the study population; semicircles represent the mean values for age and body mass index. Bottom panel is organized into three sections: the left section summarizes the study data, the middle section provides key metrics of GWAS results such as the count of independent loci and lead SNPs, and the right section briefly outlines the post-GWAS findings.
Fig. 2. Prevalence and sample sizes of 2068 traits.
The left plot illustrates the number of cases (y-axis) across binary trait categories (x-axis), and the right plot presents the sample size (y-axis) across quantitative trait categories (x-axis). Population groups are represented by distinct colors and shapes. Larger shapes indicate conditions that are twice as prevalent when compared to EUR (see table S2).
Fig. 3. Multipopulation genetic associations with 1092 binary traits.
Combined multitrait Manhattan plots and bar plots summarizing 8170 locus-trait associations for binary traits (P-value < 4.6 × 10⁻¹¹). (A) Manhattan plot for binary traits displays associations across chromosomes (x-axis) and −log10(P) values (y-axis). Circles represent previously reported associations and triangles indicate previously unidentified trait associations. Triangle size corresponds to effect size, with upward triangles denoting risk associations and downward triangles signifying protective associations. On the top, gene names are highlighted to indicate previously reported variant-trait associations (in black) and new trait associations (in pink). (B) Stacked bar plot for binary traits showcases the number of associations with locus-trait pairs across different trait categories. The left panel presents the count of known associations (green), previously unidentified trait associations (blue), and previously unidentified SNPs (light blue). Trait categories are ordered by the number of lead SNPs in descending order. The right panel is a dodged bar plot highlighting associations with lead SNPs based on their MAF categories: common variants (lower opacity) and low-frequency variants (higher opacity). The distribution of known associations (green) and previously unidentified SNPs (light blue) is shown for each trait category.
Fig. 4. Multipopulation genetic associations with 178 quantitative traits.
Combined multitrait Manhattan plots and bar plots summarizing 17,879 locus-trait associations for quantitative traits (P-value < 4.6 × 10⁻¹¹). (A) Manhattan plot for quantitative traits displays associations across chromosomes (x-axis) and −log10(P) values (y-axis). Circles represent previously reported associations and triangles indicate previously unidentified trait associations. Triangle size corresponds to effect size, with upward triangles denoting risk associations and downward triangles signifying protective associations. On the top, gene names are highlighted to indicate previously reported variant-trait associations (in black) and new trait associations (in pink). (B) Stacked bar plot for quantitative traits showcases the number of associations with locus-trait pairs across different trait categories. The left panel presents the count of known associations (green), previously unidentified trait associations (blue), and previously unidentified SNPs (light blue). Trait categories are ordered by the number of lead SNPs in descending order. The right panel is a dodged bar plot highlighting associations with lead SNPs based on their MAF categories: common variants (lower opacity) and low-frequency variants (higher opacity). The distribution of known associations (green) and previously unidentified SNPs (light blue) is shown for each trait category.
Fig. 5. Multipopulation fine-mapped signals.
(A) Upset plot of cross-population signal sharing for the 57,601 fine-mapped signals. Red portions of bars represent signals that had one or more variants showing suggestive association (P-value < 1 × 10⁻³) in an unmapped population, and blue portions represent signals for which the unmapped populations were underpowered to detect suggestive associations for any of the variants in the merged approximate credible set. Signal counts are displayed above the bars for intersections in which fewer than 1000 signals were identified. (B) Scatter plot of the number of signals detected per phenotype versus the meta-analyzed sample size for the phenotype. Effective sample sizes were used for binary phenotypes and points are colored by the phenotype category. (C) The distribution of merged approximate credible set sizes for the fine-mapped signals. (D) Coding enrichment in precisely mapped signals. Bars are colored by the proportion of each bar represented by each grouped Variant Effect Predictor (VEP) annotation, and the black boxes illustrate the proportion of each bar attributable to coding variation. P-values reflect the results of Fisher exact tests for coding annotation enrichment. (E) Distribution of effect sizes versus minor allele frequencies for high-confidence (PIP > 0.95) associations fine-mapped in quantitative (top) and binary (bottom) phenotypes. Each point represents a unique high-confidence variant-phenotype-population mapping. Point colors reflect the population in which they are mapped and their shapes reflect whether they are a phenotype association previously reported in the GWAS catalog (square), a phenotype association previously unidentified for a signal already reported in the catalog (triangle), or a signal and association that were both previously unidentified (circle). Inset bar plots reflect the proportions of high-confidence associations in these three categories across the four tested populations.
Fig. 6. Putative causal gene and gene-level pleiotropy.
(A) Chromosome ideogram illustrating high-confidence cross-trait associations (PIP > 0.95) between genetic variants and independent traits. The ideogram highlights putative causal genes nominated using non-synonymous coding variation and Activity-by-Contact (ABC) promoters and enhancers to implicate genes for fine-mapped variants. (B) Histogram of the number of independent traits identified per gene. (C) GO-Figure plot showing clusters of biological process GO terms that are significantly predictive of the number of independent traits associated with each gene. (D) Scatter plot and Poisson regression of the number of independent traits per gene on the number of GO terms annotated per gene. (E) Scatter plot and Poisson regression of the number of independent traits per gene in the AFR and EUR groups (top) and the AMR and EUR groups (bottom). Genes with the greatest residuals from the regressions have been labeled.
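
The caption for Fig. 5B notes that effective sample sizes were used for binary phenotypes but does not give the formula. A convention widely used in GWAS meta-analysis is N_eff = 4 / (1/N_cases + 1/N_controls), summed across cohorts. The sketch below assumes that convention; it is not confirmed as the authors' method, and the function names and example counts are hypothetical.

```python
from typing import Iterable, Tuple


def effective_sample_size(n_cases: int, n_controls: int) -> float:
    """Effective sample size of one case-control cohort.

    Assumed convention (not stated in the caption):
    N_eff = 4 / (1/N_cases + 1/N_controls).
    A balanced design gives N_eff equal to the total sample size;
    an unbalanced design gives less.
    """
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("case and control counts must be positive")
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def meta_effective_sample_size(cohorts: Iterable[Tuple[int, int]]) -> float:
    """Sum per-cohort effective sample sizes over (cases, controls) pairs."""
    return sum(effective_sample_size(cases, controls) for cases, controls in cohorts)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not taken from the study).
    example_cohorts = [(1_000, 1_000), (500, 9_500)]
    for cases, controls in example_cohorts:
        print(cases, controls, round(effective_sample_size(cases, controls), 1))
    print("meta-analysis N_eff:", round(meta_effective_sample_size(example_cohorts), 1))
```

Under this convention, a rare condition with few cases contributes far less than its total sample size, which is why a plot of signals per phenotype against sample size is more comparable across binary traits when effective rather than raw counts are used.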

Update of

  • Diversity and Scale: Genetic Architecture of 2,068 Traits in the VA Million Veteran Program.
    Verma A, Huffman JE, Rodriguez A, Conery M, Liu M, Ho YL, Kim Y, Heise DA, Guare L, Panickan VA, Garcon H, Linares F, Costa L, Goethert I, Tipton R, Honerlaw J, Davies L, Whitbourne S, Cohen J, Posner DC, Sangar R, Murray M, Wang X, Dochtermann DR, Devineni P, Shi Y, Nandi TN, Assimes TL, Brunette CA, Carroll RJ, Clifford R, Duvall S, Gelernter J, Hung A, Iyengar SK, Joseph J, Kember R, Kranzler H, Levey D, Luoh SW, Merritt VC, Overstreet C, Deak JD, Grant SFA, Polimanti R, Roussos P, Sun YV, Venkatesh S, Voloudakis G, Justice A, Begoli E, Ramoni R, Tourassi G, Pyarajan S, Tsao PS, O'Donnell CJ, Muralidhar S, Moser J, Casas JP, Bick AG, Zhou W, Cai T, Voight BF, Cho K, Gaziano MJ, Madduri RK, Damrauer SM, Liao KP. medRxiv [Preprint]. 2023 Jun 29:2023.06.28.23291975. doi: 10.1101/2023.06.28.23291975. Update in: Science. 2024 Jul 19;385(6706):eadj1182. doi: 10.1126/science.adj1182. PMID: 37425708. Free PMC article.
