This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2025 Aug 27:2025.08.26.671106.

doi: 10.1101/2025.08.26.671106.

An Efficient Lasso Framework for Admixture-Aware Polygenic Scores

Franklin Ockerman¹, Brian Chen¹, Quan Sun^{2,3}, Elena V Kharitonova¹, Walter Chen¹, Laura Y Zhou⁴, Ruth J F Loos⁵, Charles Kooperberg⁶, Ulrike Peters⁶, Jeffrey Haessler⁶, Alexander Reiner⁶, Su Yon Jung^{7,8}, JoAnn E Manson^{9,10}, Rami Nassir¹¹, Kari E North¹², Steven Buyske¹³, Christopher A Haiman¹⁴, David V Conti¹⁵, Lynne R Wilkens¹⁶, Ethan M Lange¹⁷, Nancy J Cox¹⁸, Hongyuan Cao¹⁹, Laura M Raffield²⁰, Yun Li^{1,20,21}, Ran Tao^{22,23}

Affiliations

¹ Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
² Center for Computation and Genomic Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA.
³ Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA.
⁴ Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, USA.
⁵ The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
⁶ Fred Hutchinson Cancer Center, Division of Public Health Sciences, Seattle, WA, USA.
⁷ Department of Epidemiology, Fielding School of Public Health, University of California, Los Angeles, CA, USA.
⁸ Translational Sciences Section, School of Nursing, Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA.
⁹ Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA.
¹⁰ Division of Preventive Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
¹¹ Department of Pathology, School of Medicine, Umm Al-Qura University, Mecca, Saudi Arabia.
¹² Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
¹³ Department of Statistics, Rutgers University, Piscataway, NJ, USA.
¹⁴ Department of Population and Public Health Sciences, Keck School of Medicine of USC, Los Angeles, CA, USA.
¹⁵ Department of Preventive Medicine, USC Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
¹⁶ Department of Epidemiology, University of Hawaii Cancer Center, Honolulu, Hawaii.
¹⁷ Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
¹⁸ Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA.
¹⁹ Department of Statistics, Florida State University, Tallahassee, FL, USA.
²⁰ Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
²¹ Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
²² Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA.
²³ Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA.

PMID: 40909540
PMCID: PMC12407912
DOI: 10.1101/2025.08.26.671106

Franklin Ockerman et al. bioRxiv. 2025.

Abstract

Polygenic scores (PGS) have promising clinical applications for risk stratification, disease screening, and personalized medicine. However, most PGS are trained on predominantly European ancestry cohorts and have limited portability to external populations. While cross-population PGS methods have demonstrated greater generalizability than single-ancestry PGS, they fail to properly account for individuals with recent admixture between continental ancestry groups. GAUDI is a recently proposed PGS method that addresses this gap by leveraging local ancestry to estimate ancestry-specific effects, penalizing but allowing ancestry-differential effects. However, the modified fused LASSO approach used by GAUDI is computationally expensive and does not readily accommodate more than two-way admixture. To address these limitations, we introduce HAUDI, an efficient LASSO framework for admixed PGS construction. HAUDI re-parameterizes the GAUDI model as a standard LASSO problem, which allows extension to multi-way admixture settings and much faster computation than GAUDI. In extensive simulations, HAUDI compares favorably to GAUDI while dramatically reducing computation time. In real data applications, HAUDI uniformly outperforms GAUDI across 18 clinical phenotypes, including total triglycerides (TG), C-reactive protein (CRP), and mean corpuscular hemoglobin concentration (MCHC), and shows substantial benefits over an ancestry-agnostic PGS for white blood cell count (WBC) and chronic kidney disease (CKD).
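The idea of recasting ancestry-specific effects as a standard LASSO problem can be illustrated with a generic change of variables. The sketch below is an assumption for two-way admixture, not the parameterization confirmed by the abstract: each variant gets a shared effect plus an ancestry-differential offset, so an ordinary L1 penalty on the offsets would discourage, but still permit, effects that differ by ancestry. All names (`dosage_a`, `dosage_b`, `reparameterized_design`, and so on) are hypothetical.

```python
# Illustrative sketch only: a generic change of variables, assumed here for
# two-way admixture. The abstract does not spell out HAUDI's exact design.

def ancestry_specific_prediction(dosage_a, dosage_b, beta_a, beta_b):
    """Genetic value with separate effects for alleles carried on
    ancestry-A and ancestry-B haplotype segments."""
    return sum(
        a * ba + b * bb
        for a, b, ba, bb in zip(dosage_a, dosage_b, beta_a, beta_b)
    )

def reparameterized_design(dosage_a, dosage_b):
    """One design row: [total dosage per variant | ancestry-B dosage per variant]."""
    total = [a + b for a, b in zip(dosage_a, dosage_b)]
    return total + list(dosage_b)

def reparameterized_coefficients(beta_a, beta_b):
    """Shared effects (equal to beta_a) followed by offsets delta = beta_b - beta_a."""
    delta = [bb - ba for ba, bb in zip(beta_a, beta_b)]
    return list(beta_a) + delta

def linear_prediction(row, coefs):
    """Plain linear predictor, as used by any standard LASSO solver."""
    return sum(x * c for x, c in zip(row, coefs))

# Toy individual with three variants (made-up numbers).
dosage_a = [1, 0, 2]          # allele counts on ancestry-A segments
dosage_b = [1, 2, 0]          # allele counts on ancestry-B segments
beta_a = [0.5, -0.2, 0.1]     # ancestry-A effects
beta_b = [0.5, 0.3, 0.1]      # ancestry-B effects (differs only at variant 2)

direct = ancestry_specific_prediction(dosage_a, dosage_b, beta_a, beta_b)
row = reparameterized_design(dosage_a, dosage_b)
coefs = reparameterized_coefficients(beta_a, beta_b)
via_lasso_design = linear_prediction(row, coefs)
```

Both parameterizations give the same genetic value, and the offset is exactly zero wherever the two ancestries share an effect, which is the sparsity pattern an L1 penalty favors.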

Figures

**Fig 1. Schematic overview of HAUDI.**
In step 0, we define the phenotype model for an individual under the HAUDI framework. HAUDI requires individual-level phased genotype data (a). We assume an individual haplotype is a composition of segments inherited from two or more source populations, and in (b) we define an indicator for local ancestry at each variant and haplotype. An individual’s genetic contribution to their phenotype (c) is the sum of ancestry-specific effects across the genome. In step 1, we estimate local ancestry in a target dataset, using a reference panel of haplotypes from each source population. In step 2, we minimize HAUDI’s objective function and obtain a set of ancestry-specific effect estimates. We obtain a final PGS by applying these effects to the phenotype model defined in step 0.
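The phenotype model in step 0 can be sketched in a few lines: each allele on each haplotype contributes the effect that belongs to its local ancestry at that variant, and the score is the sum over variants and haplotypes. The data layout (lists of `(allele, ancestry)` pairs), the ancestry labels `"A"` and `"B"`, and the function name `admixture_aware_pgs` are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch of an ancestry-specific additive score, assuming a
# hypothetical layout: one list per haplotype, holding an
# (allele, local_ancestry) pair for every variant.

def admixture_aware_pgs(haplotypes, effects):
    """Sum, over haplotypes and variants, the effect matching the local
    ancestry of each carried allele.

    haplotypes: list of haplotypes; each is a list of (allele, ancestry) pairs
    effects:    dict mapping ancestry label -> list of per-variant effects
    """
    score = 0.0
    for haplotype in haplotypes:
        for variant_index, (allele, ancestry) in enumerate(haplotype):
            score += allele * effects[ancestry][variant_index]
    return score

# Made-up effects for three variants in two source populations.
effects = {
    "A": [0.2, -0.1, 0.0],
    "B": [0.2, 0.3, 0.05],
}

# One individual: two phased haplotypes with per-variant local ancestry.
individual = [
    [(1, "A"), (1, "A"), (0, "B")],
    [(1, "B"), (0, "B"), (1, "B")],
]

pgs = admixture_aware_pgs(individual, effects)
```

Because the effects are stored per ancestry label, the same function covers more than two source populations by adding entries to `effects`.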

**Fig 2. Comparison of PGS methods under various simulation settings.**
Box plots show R² values across 10 test sets. Simulation settings include the genetic correlation ($\rho_g$) between CEU- and YRI-specific effects, heritability, and the number of true causal variants. All models were fit on 1000 total variants, including causal variants.
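To make the genetic correlation setting concrete, the following sketch draws pairs of CEU- and YRI-specific effects whose correlation is a chosen value. The bivariate normal model with equal variances, the function names, and the default values are assumptions for illustration; the caption does not state how the simulated effects were generated.

```python
import math
import random

def simulate_effect_pairs(n_causal, rho_g, effect_sd=1.0, seed=0):
    """Draw (CEU effect, YRI effect) pairs with correlation rho_g.

    Assumed model: both effects are normal with the same standard deviation.
    The YRI effect mixes the CEU draw with an independent draw so that the
    two have correlation rho_g.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_causal):
        z_shared = rng.gauss(0.0, 1.0)
        z_independent = rng.gauss(0.0, 1.0)
        beta_ceu = effect_sd * z_shared
        beta_yri = effect_sd * (
            rho_g * z_shared + math.sqrt(1.0 - rho_g ** 2) * z_independent
        )
        pairs.append((beta_ceu, beta_yri))
    return pairs

def pearson_correlation(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Many causal variants, so the empirical correlation sits close to the target.
pairs = simulate_effect_pairs(n_causal=20000, rho_g=0.5)
ceu_effects = [p[0] for p in pairs]
yri_effects = [p[1] for p in pairs]
empirical_rho = pearson_correlation(ceu_effects, yri_effects)
```

With `rho_g=1.0` the two ancestries share identical effects, and with `rho_g=0.0` the effects are independent.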

**Fig 3. Comparisons between estimated and true CEU-specific effects.**
Each panel plots the estimated (x-axis) and true (y-axis) variant effects, stratified by genetic correlation and PGS method. Only the first repetition is plotted for illustration; however, the correlation between the true and estimated effects across all simulations (r) is annotated in each panel. This figure is restricted to CEU-specific effects. For LASSO, the x-axis shows estimated ancestry-agnostic effects.

**Fig 4. Comparison of test set R² on PAGE African American data.**
Penalized regression models (GAUDI, HAUDI, LASSO) were fit using the top 500 pruned variants by p-value in UKB GWAS data. SDPR_admix was fit using the top 50,000 pruned variants. Phenotypes are restricted to those with N > 5000 and mean R² > 0.01 in at least one model. Per-phenotype boxplots correspond to the R² values in each test set fold.

**Fig 5. Comparison of PGS run time on data from PAGE African American participants.**
We report run times across phenotypes and testing folds for the three admixed PGS models. To compare SDPR_admix and HAUDI (a), we restrict to models fit on 50,000 variants. To compare GAUDI and HAUDI (b), we restrict to models fit on 500 variants.

**Fig 6. Comparison of GAUDI and LASSO test set R² on PAGE Hispanic/Latino data.**
All models were fit using the top 500, 10,000, or 50,000 variants (by p-value in UKB GWAS data). Phenotypes are restricted to those with N > 5000 and mean R² > 0.01 in at least one model. Per-phenotype boxplots correspond to the R² values in each test set fold.

**Fig 7. Comparison of HAUDI ancestry-specific effect estimates (African-American participants).**
Effect estimates were obtained using the optimal set of tuning parameters (selected by cross-validation) in the first training fold. HDL (b) was chosen as a comparator for WBC (a) to demonstrate that HAUDI can also capture genetic architectures with similar effects across ancestries. Correlation between ancestry-specific effects (r) is given in each panel.

**Fig 8. Comparison of HAUDI ancestry-specific effect estimates (Hispanic/Latino samples) for White Blood Cell Count (WBC).**
Effect estimates were obtained using the optimal set of tuning parameters (selected by cross-validation) in the first training fold. HDL (b) was chosen as a comparator for WBC (a) to demonstrate that HAUDI can also capture genetic architectures with similar effects across ancestries. Correlation between ancestry-specific effects (r) is given in each panel.
