Nat Genet. 2024 Jul;56(7):1412-1419.
doi: 10.1038/s41588-024-01791-x. Epub 2024 Jun 11.

Exome sequence analysis identifies rare coding variants associated with a machine learning-based marker for coronary artery disease

Ben Omega Petrazzini et al. Nat Genet. 2024 Jul.

Abstract

Coronary artery disease (CAD) exists on a spectrum of disease represented by a combination of risk factors and pathogenic processes. An in silico score for CAD built using machine learning and clinical data in electronic health records captures disease progression, severity and underdiagnosis on this spectrum and could enhance genetic discovery efforts for CAD. Here we tested associations of rare and ultrarare coding variants with the in silico score for CAD in the UK Biobank, All of Us Research Program and BioMe Biobank. We identified associations in 17 genes; of these, 14 show at least moderate levels of prior genetic, biological and/or clinical support for CAD. We also observed an excess of ultrarare coding variants in 321 aggregated CAD genes, suggesting more ultrarare variant associations await discovery. These results expand our understanding of the genetic etiology of CAD and illustrate how digital markers can enhance genetic association investigations for complex diseases.

Conflict of interest statement

R.D. reports being a scientific co-founder, consultant and equity holder for Pensieve Health (pending), and being a consultant for Variant Bio, all not related to this study. R.S.R. reports research funding to his institution from Amgen, Arrowhead, Eli Lilly, Merck, NIH, Novartis, Novo Nordisk, Regeneron and 89Bio; consulting fees from Amgen, Avilar, CRISPR Therapeutics, Editas, Eli Lilly, Lipigon, New Amsterdam, Novartis, Precision Biosciences, Regeneron, UltraGenyx and Verve Therapeutics; non-promotional honoraria from Meda Pharma; royalties from Wolters Kluwer (UpToDate); and stock holding in MediMergent, LLC. He reports patent applications on: Methods and systems for biocellular marker detection and diagnosis using a microfluidic profiling device. EFS ID: 32278349. Application No. (PCT/US2019/026364) (provisional); all unrelated to this study. The remaining authors declare no competing interests.

Figures

Extended Data Fig. 1. Area under the receiver operating characteristic curves on the testing sets used to evaluate in silico score for coronary artery disease (ISCAD).
We trained and tested 100 models with independent random sampling. Receiver operating characteristic curves are shown for the current ISCAD model trained on the UK Biobank (a), the All of Us biobank (b) and the BioMe biobank (c). AUC: area under the receiver operating characteristic curve.
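As a rough illustration of the metric reported in this figure, the AUC can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen control. The Python sketch below is not the authors' code: the function name `auc` and the toy labels and scores are assumptions made purely for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise case/control comparison.

    labels: 1 for a case, 0 for a control (hypothetical encoding).
    scores: model output for each individual, higher meaning more case-like.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0   # case ranked above control
            elif case_score == control_score:
                wins += 0.5   # ties count as half
    return wins / (len(cases) * len(controls))


# Toy data, invented for this example only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_labels, toy_scores))  # 0.75
```

Repeating this calculation on the held-out set of each randomly sampled split would give one AUC per model, which is the kind of quantity the panels summarize.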
Extended Data Fig. 2. Distribution of the in silico score for coronary artery disease (ISCAD) in cases and controls.
We trained and tested 100 models with independent random sampling. Distributions are shown separately for CAD cases and controls for the current ISCAD model trained on the UK Biobank (a), the All of Us biobank (b) and the BioMe biobank (c). Vertical dotted lines represent the median value of each distribution. ISCAD: in silico score for coronary artery disease.
Extended Data Fig. 3. Manhattan plot of rare coding variant association meta-analysis.
We tested 2,738,849 rare missense and protein-truncating variants from 604,915 individuals in the UK Biobank, the All of Us Research Program and the BioMe Biobank. The dotted horizontal line represents an exome-wide significance threshold of P = 4.3 × 10⁻⁷. We obtained two-sided P-values, shown on a base-10 logarithmic scale, from a fixed-effect inverse-variance weighted meta-analysis. Italicized text indicates gene names.
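For readers unfamiliar with the method named in this caption, a fixed-effect inverse-variance weighted meta-analysis combines per-cohort effect estimates by weighting each with the inverse of its squared standard error. The Python sketch below shows the textbook calculation only. It is not the authors' pipeline; the function name `ivw_fixed_effect` and the per-cohort numbers are hypothetical.

```python
import math

# Exome-wide significance threshold quoted in the caption.
EXOME_WIDE_P = 4.3e-7


def ivw_fixed_effect(betas, ses):
    """Combine per-cohort effect sizes and standard errors.

    Returns the pooled effect, its standard error and a two-sided P-value
    from a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in ses]   # inverse-variance weights
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided normal tail
    return beta, se, p


# Hypothetical estimates for one variant in three cohorts.
beta, se, p = ivw_fixed_effect([0.30, 0.25, 0.40], [0.08, 0.10, 0.20])
print(beta, se, p, p < EXOME_WIDE_P)
```

A Manhattan plot would then place −log10(p) for each variant on the vertical axis, with the threshold drawn at −log10(4.3 × 10⁻⁷).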
Fig. 1. Schematic of the study design.
The schematic outlines the workflow for a rare variant association study using coronary artery disease (CAD) cases and controls (left) and a rare variant association study using an in silico score for CAD (ISCAD) (right). The workflow for a rare variant association study for CAD (left) uses diagnosis and procedural codes to phenotype CAD cases and controls. Then, a rare variant association analysis using the exome sequencing data is performed on CAD status. Similarly, a rare variant association study on ISCAD (right) uses diagnosis and procedural codes to define CAD cases and controls. Machine learning models are then fitted to the CAD status label using diagnostic codes, laboratory tests, vital signs and medications from the electronic health records of these individuals. These models are used to compute ISCAD in all individuals. Then, a rare variant association analysis using the exome sequencing data is performed on ISCAD.
Fig. 2. Evidence supporting the role of 17 genes associated with an in silico score for coronary artery disease (ISCAD) in coronary artery disease (CAD) biology.
We used 9 independent axes of genetic evidence to assess the role of 17 ISCAD genes in CAD biology. Tier 1 groups genes with strong evidence supporting their role in CAD biology; these have clinical trials indicating a drug target effect on CAD, previously known rare coding variant associations with CAD and/or monogenic CAD. Tier 2 groups genes with moderately strong evidence; these are either mapped to one of the 321 CAD genes or have coding variants and/or eQTL signal (P < 10⁻⁶) in the largest CAD GWAS to date. Tier 3 groups genes with moderate evidence defined by rare coding variant associations with causal CAD risk factors (namely, low-density lipoprotein cholesterol, triglycerides, lipoprotein a, body mass index, type 2 diabetes and hypertension) or nominal association with CAD-related clinical outcomes (namely, myocardial infarction, arrhythmia, heart failure, arterial stiffness index and left ventricular ejection fraction). Tier 4 groups genes with additional evidence characterized by genome-wide significant associations with a causal CAD risk factor (P < 5 × 10⁻⁸) collected from the Open Targets Genetics platform and/or nominal significance in more than one cohort in our analysis. For a detailed description of the molecular function of each gene, please refer to the Supplementary Note. We obtained two-sided P-values from a linear regression for associations with continuous outcomes, and a logistic regression for associations with categorical outcomes. Italicized text indicates gene names. ISCAD: in silico score for coronary artery disease; CAD: coronary artery disease; eQTL: expression quantitative trait loci; GWAS: genome-wide association study.
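The tiering described in this caption can be read as a simple rule cascade. The Python sketch below is one possible reading, assuming each gene is assigned the strongest tier for which it meets at least one criterion. The caption does not state that rule explicitly, and the field names and example records are hypothetical, not taken from the study's data.

```python
# Criteria per tier, paraphrased from the caption; all key names are hypothetical.
TIER_CRITERIA = [
    ("clinical_trial_cad", "known_rare_variant_cad", "monogenic_cad"),            # Tier 1
    ("in_321_cad_genes", "gwas_coding_variant", "gwas_eqtl_signal"),              # Tier 2
    ("rare_variant_risk_factor", "nominal_clinical_outcome"),                     # Tier 3
    ("open_targets_risk_factor", "nominal_in_multiple_cohorts"),                  # Tier 4
]


def evidence_tier(gene_evidence):
    """Return the strongest tier (1-4) whose criteria the gene meets, else None.

    gene_evidence: dict mapping criterion name to True/False.
    Assumption: a gene takes the best tier it qualifies for.
    """
    for tier, criteria in enumerate(TIER_CRITERIA, start=1):
        if any(gene_evidence.get(name, False) for name in criteria):
            return tier
    return None


# Invented example records, not real genes from the paper.
print(evidence_tier({"monogenic_cad": True, "in_321_cad_genes": True}))  # 1
print(evidence_tier({"nominal_in_multiple_cohorts": True}))              # 4
print(evidence_tier({}))                                                 # None
```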
