[Preprint]. 2025 Aug 5:2024.11.09.24316996.
doi: 10.1101/2024.11.09.24316996.

DiscoDivas: Leveraging genetic ancestry continuum information to interpolate PRS for admixed populations

Yunfeng Ruan et al. medRxiv.

Abstract

Admixed populations are underrepresented in both the discovery datasets and the individual-level fine-tuning datasets used to build polygenic risk scores (PRS), which limits PRS development and equitable clinical translation for these populations. Assuming that the most informative PRS model for a genetically homogeneous sample varies linearly in an ancestry continuum space, we introduce the Genetic Distance-assisted PRS Combination Pipeline for Diverse Genetic Ancestries (DiscoDivas). DiscoDivas interpolates a harmonized PRS for diverse, and especially admixed, genetic ancestries by combining genetic distance with multiple PRS models fine-tuned in existing samples, most of which are of a single ancestry. It treats genetic ancestry as a continuous variable and does not require switching between models when calculating PRS for different ancestries. We generated PRS with both DiscoDivas and the current conventional method, i.e., fine-tuning multiple GWAS PRS in samples of matched or similar genetic ancestry. DiscoDivas produced a harmonized PRS with accuracy comparable to or higher than that of the conventional approach, with the greatest advantage in admixed individuals.
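
The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea of interpolating a PRS from ancestry-specific models using genetic distance. It assumes, purely for illustration, that each fine-tuned model is represented by the centroid of its fine-tuning cohort in PC space and that weights are inversely proportional to Euclidean distance. The function name `interpolate_prs`, its parameters, and the weighting rule are hypothetical and are not taken from the DiscoDivas software.

```python
import numpy as np

def interpolate_prs(pcs, centroids, prs_by_model, eps=1e-8):
    """Combine ancestry-specific PRS with inverse-distance weights.

    Illustrative assumption only: the real pipeline may weight models differently.
    pcs:          (n_individuals, n_pcs) principal components of the target samples
    centroids:    (n_models, n_pcs) PC centroid of each fine-tuning cohort
    prs_by_model: (n_individuals, n_models) PRS of each individual under each model
    """
    pcs = np.asarray(pcs, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    prs_by_model = np.asarray(prs_by_model, dtype=float)

    # Euclidean distance from every individual to every cohort centroid
    dist = np.linalg.norm(pcs[:, None, :] - centroids[None, :, :], axis=2)

    # Closer cohorts get larger weights; eps avoids division by zero
    weights = 1.0 / (dist + eps)
    weights /= weights.sum(axis=1, keepdims=True)

    # Weighted sum of the per-model scores gives one score per individual
    combined = (weights * prs_by_model).sum(axis=1)
    return combined, weights
```

Under this assumed rule, an individual sitting on a cohort centroid receives essentially that cohort's PRS, while an individual midway between two cohorts receives an equal blend, so no hard switch between models is needed. In practice the per-model scores would need to be on a comparable scale before being combined.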

Keywords: PRS harmonization; admixed population; genetic ancestry continuum; genetic distance; polygenic risk score (PRS); principal component analysis (PCA).

Conflict of interest statement

Declaration of interests: P.N. reports research grants from Allelica, Amgen, Apple, Boston Scientific, Genentech / Roche, and Novartis; personal fees from Allelica, Apple, AstraZeneca, Blackstone Life Sciences, Bristol Myers Squibb, Creative Education Concepts, CRISPR Therapeutics, Eli Lilly & Co, Esperion Therapeutics, Foresite Capital, Foresite Labs, Genentech / Roche, GV, HeartFlow, Magnet Biomedicine, Merck, Novartis, Novo Nordisk, TenSixteen Bio, and Tourmaline Bio; equity in Bolt, Candela, Mercury, MyOme, Parameter Health, Preciseli, and TenSixteen Bio; and spousal employment at Vertex Pharmaceuticals, all unrelated to the present work. All other authors report no conflicts.

Figures

Figure 1. The workflow of comparing DiscoDivas with the existing method.
Left: Ideally, the existing method fine-tunes a PRS model that combines multiple GWAS using matched fine-tuning data, which are not currently available for many under-represented populations. Right: DiscoDivas first fine-tunes the PRS in the available ancestries, which are currently AFR, EAS, EUR, and SAS, and then interpolates PRS for diverse ancestry groups based on these fine-tuned PRS. In this plot, POP refers to any ancestry for which the PRS is to be calculated. The bottom two panels show the PC distribution of the fine-tuning datasets (green circles) and the testing data (orange circles). Although more PCs were used in the analysis, only PC1 and PC2 are shown here for clarity.
Figure 2. Relative R² increase of DiscoDivas over the conventional PRS fine-tuned in a matched sample when tested in admixed individuals.
The x-axis shows the simulated number of causal SNPs. The horizontal bar shows the mean relative R² increase, and its color indicates the p-value of the paired t-test comparing DiscoDivas PRS R² with conventional PRS R²: cyan for p-value < 0.0005, dark blue for p-value < 0.05, and grey for p-value > 0.05. In panels a, b, and c, the causal SNP effect sizes are constant across populations. The annotation text at the top of each panel shows the sample sizes of the discovery GWAS in the different populations and the distribution of causal SNP effect sizes.
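
The two quantities named in this caption can be illustrated with a short sketch. It assumes that the relative R² increase is defined as (R² of DiscoDivas minus R² of the conventional PRS) divided by the conventional R², which the caption does not state explicitly, and it computes only the paired t statistic and degrees of freedom, not the p-value. The function names and the numbers in the usage note are made up for illustration.

```python
import math
import statistics

def relative_r2_increase(r2_new, r2_ref):
    """Per-replicate relative increase, assumed to be (new - ref) / ref."""
    return [(new - ref) / ref for new, ref in zip(r2_new, r2_ref)]

def paired_t_statistic(a, b):
    """Paired t statistic and degrees of freedom for matched replicates."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of the differences
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1
```

For example, with hypothetical replicate values `[0.12, 0.11, 0.10]` for one method and `[0.10, 0.10, 0.10]` for the other, the mean relative increase is about 0.10 and the test has 2 degrees of freedom. The p-value would then come from a t distribution with those degrees of freedom.
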
Figure 3. Relative R² increase of DiscoDivas over the conventional PRS fine-tuned in a matched sample.
The x-axis shows the population in which the PRS was tested. We used OTH as the fine-tuning dataset for the tests in both OTH and AMR because no matched AMR training data were available. The horizontal bar shows the mean relative increase, and the line type of the bar indicates the p-value of the paired t-test comparing DiscoDivas PRS R² with conventional PRS R²: a solid bar for p-value < 0.05 and a dotted bar for p-value > 0.05.
Figure 4. PRS performance for coronary artery disease (CAD) and type 2 diabetes (DM2) tested in UKBB and MGBB.
The plot shows BETA, defined as ln(OR per SD), with error bars showing the 95% CI. The sub-panels show the population of the testing sample, and the colors show the method used to generate the PRS: either fine-tuning in a single sample or combining the PRS using DiscoDivas.
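
As a reading aid for the BETA scale, the sketch below converts a log odds ratio per SD and its standard error into an odds ratio with a 95% confidence interval. It assumes a normal approximation (BETA plus or minus 1.96 standard errors), which the caption does not specify; the function name and inputs are hypothetical.

```python
import math

def or_per_sd_with_ci(beta, se, z=1.96):
    """Convert BETA = ln(OR per SD) and its SE to an OR with a 95% CI.

    Assumes a normal approximation on the log-odds scale.
    """
    lower, upper = beta - z * se, beta + z * se
    return {
        "beta_ci": (lower, upper),
        "or": math.exp(beta),
        "or_ci": (math.exp(lower), math.exp(upper)),
    }
```

A BETA of ln(1.5), for instance, corresponds to 1.5-fold higher odds per one-SD increase in the PRS, and an interval on the BETA scale that excludes 0 corresponds to an OR interval that excludes 1.
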
