Whole genome sequence analysis of low-density lipoprotein cholesterol across 246 K individuals

Margaret Sunitha Selvaraj^{1,2,3}, Xihao Li^{4,5}, Zilin Li⁶, Eric Van Buren⁶, Sara Haidermota^{1,2}, Darina Postupaka^{1,2}, Whitney Hornsby^{1,2}, Joshua C Bis⁷, Jennifer A Brody⁷, Brian E Cade^{8,9}, Ren-Hua Chung¹⁰, Joanne E Curran¹¹, Scott M Damrauer^{12,13,14}, Lisa de Las Fuentes¹⁵, Paul S de Vries¹⁶, Ravindranath Duggirala¹⁷, Barry I Freedman¹⁸, MariaElisa Graff¹⁹, Xiuqing Guo²⁰, Bertha A Hidalgo²¹, Lifang Hou²², Ryan Irvin²³, Renae Judy¹², Rita R Kalyani²⁴, Tanika N Kelly²⁵, Iain R Konigsberg²⁶, Brian G Kral²⁴, Lydia Coulter Kwee²⁷, Daniel Levy^{28,29}, Changwei Li³⁰, Ani W Manichaikul³¹, Lisa Warsinger Martin³², May E Montasser³³, Alanna C Morrison¹⁶, Take Naseri^{34,35}, Kari E North³⁶, Jeffrey R O'Connell³³, Nicholette D Palmer³⁷, Patricia A Peyser³⁸, Alex P Reiner³⁹, Svati H Shah²⁷, Roelof A J Smit^{40,41}, Jennifer A Smith^{38,42}, Kent D Taylor²⁰, Hemant Tiwari⁴³, Michael Y Tsai⁴⁴, Satupa'itea Viali^{45,46,47}, Zhe Wang^{23,40}, Yuxuan Wang⁴⁸, Wei Zhao^{38,42}, Donna K Arnett⁴⁹, John Blangero¹¹, Eric Boerwinkle¹⁶, Donald W Bowden³⁷, Jenna C Carlson⁵⁰, Yii-Der Ida Chen²⁰, Patrick T Ellinor², Myriam Fornage⁵¹, Jiang He³⁰, Nancy Heard-Costa^{28,52}, Robert C Kaplan¹⁹, Sharon L R Kardia³⁸, Charles Kooperberg³⁹, William E Kraus²⁷, Leslie A Lange²⁶, Ruth J F Loos^{40,41}, Braxton D Mitchell^{33,53}, Bruce M Psaty^{7,54,55}, Daniel J Rader^{14,56,57}, Susan Redline^{8,9}, Stephen S Rich³¹, Lisa R Yanek²⁴, Richard Gibbs⁵⁸, Stacey Gabriel⁵⁹, Karine A Viaud-Martinez⁶⁰, Susan K Dutcher⁶¹, Soren Germer⁶², Ryan Kim⁶³, Jerome I Rotter²⁰, Xihong Lin⁶, Gina M Peloso^{#,64}; NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium; Pradeep Natarajan^{#,65,66,67}


PMID: 40926209
PMCID: PMC12418676
DOI: 10.1186/s13059-025-03698-0

Meta-Analysis


Margaret Sunitha Selvaraj et al. Genome Biol. 2025 Sep 9;26(1):273.


Abstract

Background: Rare genetic variation captured by whole genome sequencing datasets has been relatively less explored for its contributions to human traits. Meta-analysis of sequencing data offers advantages by integrating larger sample sizes from diverse cohorts, increasing the likelihood of discovering novel insights into complex traits. In addition, emerging methods for genome-wide rare variant association testing improve power and interpretability.

Results: Here, we conduct the largest meta-analysis of whole genome sequencing for low-density lipoprotein cholesterol (LDL-C), a therapeutic target for coronary artery disease, analyzing data from 246 K participants and integrating 1.23B variants from the UK Biobank and the Trans-Omics for Precision Medicine (TOPMed) program. We identify numerous rare coding and non-coding gene associations related to LDL-C, with replication across 86 K participants in All of Us. Our findings are based on single-variant analyses, rare coding and non-coding variant aggregation tests, and sliding window approaches. Through this comprehensive analysis, we identify 704 novel single-variant associations, 25 novel rare coding variant aggregates, 28 novel rare non-coding variant aggregates, and one novel sliding window aggregate.
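The single-variant analysis pools summary statistics from TOPMed and UKB. The sketch below shows one common way to do this pooling: a score-statistic fixed-effect meta-analysis, in which per-study score statistics U_k and their variances V_k are combined as Z = ΣU_k / √(ΣV_k). It is not necessarily the exact MetaSTAAR computation, which, as we understand it, also uses variant covariance (LD) information for aggregate tests. The helper names and the cohort numbers in the example are hypothetical.

```python
import math
from typing import Sequence, Tuple

GENOME_WIDE_P = 5e-9  # single-variant threshold quoted in Fig. 1A


def two_sided_normal_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def score_meta_analysis(scores: Sequence[float],
                        variances: Sequence[float]) -> Tuple[float, float]:
    """Pool per-study score statistics U_k with variances V_k (hypothetical helper).

    Assumes independent studies whose scores are aligned to the same
    effect allele; returns (Z, two-sided p-value).
    """
    if not scores or len(scores) != len(variances):
        raise ValueError("need one variance per score and at least one study")
    total_var = sum(variances)
    if total_var <= 0:
        raise ValueError("pooled variance must be positive")
    z = sum(scores) / math.sqrt(total_var)
    return z, two_sided_normal_p(z)


if __name__ == "__main__":
    # Hypothetical score statistics for one variant in two cohorts.
    z, p = score_meta_analysis([30.0, 50.0], [100.0, 300.0])
    print(f"Z = {z:.2f}, p = {p:.3g}, genome-wide significant: {p <= GENOME_WIDE_P}")
```

In this toy example the pooled statistic is Z = 80 / √400 = 4.0. That value is nominally significant but falls far short of the 5 × 10⁻⁹ genome-wide threshold.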

Conclusions: This study provides a meta-analysis framework for large-scale whole genome sequence association analyses from diverse population groups, yielding novel rare non-coding variant associations.


Conflict of interest statement

Declarations

Ethics approval and consent to participate: For the TOPMed cohort, study participants provided consent under each study's Institutional Review Board (IRB)-approved protocol. The TOPMed data analysis is associated with paper proposal ID 15536. For UK Biobank (UKB) participants, written informed consent was given under the UKB primary protocol, and the UKB data analysis was facilitated through UKB application 7089. For the All of Us (AOU) cohort, written informed consent was provided in accordance with the primary Institutional Review Board for AOU, and the AOU data analysis was facilitated through the AOU Researcher Workbench. Secondary use of UK Biobank, TOPMed, and AOU data was approved by the Massachusetts Hospital Institutional Review Board.

Consent for publication: All participants provided informed consent for publication.

Competing interests: P.N. reports research grants from Allelica, Amgen, Apple, Boston Scientific, Cleerly, Genentech / Roche, Ionis, Novartis, and Silence Therapeutics; personal fees from AIRNA, Allelica, Apple, AstraZeneca, Bain Capital, Blackstone Life Sciences, Bristol Myers Squibb, Creative Education Concepts, CRISPR Therapeutics, Eli Lilly & Co, Esperion Therapeutics, Foresite Capital, Foresite Labs, Genentech / Roche, GV, HeartFlow, Magnet Biomedicine, Merck, Novartis, Novo Nordisk, TenSixteen Bio, and Tourmaline Bio; equity in Bolt, Candela, Mercury, MyOme, Parameter Health, Preciseli, and TenSixteen Bio; royalties from Recora for intensive cardiac rehabilitation; and spousal employment at Vertex Pharmaceuticals, all unrelated to the present work. The remaining authors declare no competing interests.

Figures

**Fig. 1**
Overall results from TOPMed and UKB WGS meta-analysis. A Procedure used to identify novel variants from the single-variant GWAS: a total of 21,657 variants reached genome-wide significance (P ≤ 5 × 10⁻⁹) in the single-variant meta-analysis. These variants were used to define genomic risk loci with FUMA, using 1000 Genomes Phase 3 as the reference panel. Independent significant SNPs were identified at an r² threshold of 0.6, and LD blocks of independent significant SNPs lying close to each other (< 250 kb between the rightmost SNP of one block and the leftmost SNP of the next) were merged into one genomic locus. In total, we identified 128 genomic loci from the single-variant GWAS. Comparison of these genomic risk loci with the GLGC summary statistics yielded one locus with a genome-significant variant associated with LDL-C. Of all genome-significant variants identified in the present study, 704 were unique to the WGS data and not found in the GLGC summary statistics. B Procedure used to identify novel aggregates from rare variant tests: across 20 K genes, rare variant aggregates were assessed with 5 coding masks and 7 non-coding masks, along with 2.6 M regions defined by the sliding-window approach. Bonferroni-corrected p-values were used to identify genome-significant rare variant aggregates before and after conditional analysis. Aggregates that remained significant after conditional analysis were carried forward for replication in an independent cohort (AoU). The number of rare variant aggregates passing each step is shown. WGS, Whole Genome Sequencing; GLGC, Global Lipids Genetics Consortium; SNP, Single-Nucleotide Polymorphism; GWAS, Genome-Wide Association Studies; LD, Linkage Disequilibrium; LDL-C, Low-Density Lipoprotein Cholesterol; AoU, All of Us
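The locus-merging and thresholding steps described in Fig. 1 can be sketched in Python. This is a minimal illustration and not FUMA's implementation. It assumes LD blocks have already been computed and are supplied as hypothetical (chromosome, leftmost SNP, rightmost SNP) tuples. It merges blocks separated by less than 250 kb into loci and computes a Bonferroni threshold for an illustrative test count. The study's actual number of tests is not stated in the caption, so the 20,000 × 5 figure is an assumption.

```python
from typing import List, Tuple

# (chromosome, position of leftmost SNP, position of rightmost SNP) for one LD block
Block = Tuple[str, int, int]


def merge_ld_blocks(blocks: List[Block], max_gap_bp: int = 250_000) -> List[Block]:
    """Merge LD blocks on the same chromosome whose gap is < max_gap_bp.

    The gap is the distance from the rightmost SNP of the current locus to
    the leftmost SNP of the next block; overlapping blocks are also merged.
    """
    loci: List[Block] = []
    # Sorting groups blocks by chromosome, then orders them by start position.
    for chrom, start, end in sorted(blocks):
        if loci and loci[-1][0] == chrom and start - loci[-1][2] < max_gap_bp:
            _, locus_start, locus_end = loci[-1]
            loci[-1] = (chrom, locus_start, max(locus_end, end))
        else:
            loci.append((chrom, start, end))
    return loci


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold controlling family-wise error at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


if __name__ == "__main__":
    blocks = [("1", 100_000, 150_000), ("1", 300_000, 350_000),
              ("1", 700_000, 720_000), ("2", 100_000, 120_000)]
    print(merge_ld_blocks(blocks))
    # Illustrative count only: one test per gene per coding mask (20,000 x 5).
    print(bonferroni_threshold(0.05, 20_000 * 5))
```

In the toy input, the first two chromosome 1 blocks are 150 kb apart, so they collapse into one locus. The third block is 350 kb away and stays separate. The result is three loci in total.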

**Fig. 2**
MetaSTAAR-O p-value comparison. A MetaSTAAR-O p-values before and after conditional analysis for gene-centric coding aggregates. B MetaSTAAR-O p-values before and after conditional analysis for gene-centric non-coding aggregates. In both plots, the aggregates are ordered by conditional MetaSTAAR p-value. C Comparison of MetaSTAAR-O p-values for genes that have at least one coding and one non-coding signal before conditional analysis. Each dot represents a gene, and dot color indicates whether the gene is genome-significant for coding aggregates, non-coding aggregates, or both. The most significant genes in each category are labeled.
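Fig. 2 reports omnibus MetaSTAAR-O p-values. As we understand the STAAR framework, the omnibus ("-O") p-value combines the p-values of several component tests and annotation weightings using the Cauchy combination (ACAT) method. The sketch below shows that combination step with equal weights by default, which is an assumption on our part. It is not the MetaSTAAR package code, and the function name is hypothetical. For very small p-values and very large statistics, the sketch switches to the tail approximations tan((0.5 − p)π) ≈ 1/(pπ) and 0.5 − arctan(T)/π ≈ 1/(Tπ) to avoid floating-point loss of precision.

```python
import math
from typing import Optional, Sequence


def cauchy_combination(pvalues: Sequence[float],
                       weights: Optional[Sequence[float]] = None) -> float:
    """Combine p-values with the Cauchy combination test (hypothetical helper).

    T = sum_i w_i * tan((0.5 - p_i) * pi), with weights normalized to sum to 1;
    the combined p-value is 0.5 - arctan(T) / pi.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * n  # equal weights (assumption)
    if len(weights) != n or any(w <= 0 for w in weights):
        raise ValueError("need one positive weight per p-value")
    total_w = sum(weights)

    t = 0.0
    for p, w in zip(pvalues, weights):
        if not 0.0 < p < 1.0:
            raise ValueError("p-values must lie strictly between 0 and 1")
        if p < 1e-16:
            term = 1.0 / (p * math.pi)  # tail approximation of tan((0.5 - p) * pi)
        else:
            term = math.tan((0.5 - p) * math.pi)
        t += (w / total_w) * term

    if t > 1e15:
        return 1.0 / (t * math.pi)  # tail approximation of the Cauchy survival function
    return 0.5 - math.atan(t) / math.pi


if __name__ == "__main__":
    # Hypothetical component p-values (e.g., burden, SKAT, ACAT-V) for one gene.
    print(cauchy_combination([1e-8, 0.5, 0.9]))
```

A single very small component p-value dominates the combined statistic, while non-significant components pull it only slightly toward the null. This behavior is why the Cauchy combination suits an omnibus over correlated tests.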

**Fig. 3**
*ABCA6* protein structure with variants identified using the aPC scores. A The *ABCA6* protein consists of two ABC transporter domains and transmembrane domains. B Structure of *ABCA6* showing the 39 SNVs identified from the *ABCA6*-pLoF-ds aggregate set (red), the 30 SNVs identified from the *ABCA6*-missense aggregate set (orange), and the SNVs common to both aggregate sets (pink). C Structure of *ABCA6* showing the highly scored variants (red) and the structural proximity of the stretch of amino acids in the ABC transporter 1 domain identified through this analysis. ABC, ATP-binding cassette; SNV, Single-Nucleotide Variant; pLoF-ds, putative loss-of-function and disruptive missense variants; aPC, annotation principal component


