[Preprint]. 2023 Sep 2:2023.09.01.23294951.
doi: 10.1101/2023.09.01.23294951.

Prioritizing Cardiovascular Disease-Associated Variants Altering NKX2-5 Binding through an Integrative Computational Approach

Edwin G Peña-Martínez et al. medRxiv.

Abstract

Cardiovascular diseases (CVDs) are the leading cause of death worldwide and are heavily influenced by genetic factors. Genome-wide association studies (GWAS) have mapped >90% of CVD-associated variants to the non-coding genome, where they can alter the function of regulatory proteins such as transcription factors (TFs). However, because of the overwhelming number of GWAS single nucleotide polymorphisms (SNPs) (>500,000), prioritizing variants for in vitro analysis remains challenging. In this work, we implemented a computational approach that combines support vector machine (SVM)-based TF binding site classification with cardiac expression quantitative trait loci (eQTL) analysis to identify and prioritize potential CVD-causing SNPs. We identified 1,535 CVD-associated SNPs that occur within human heart footprints/enhancers and 9,309 variants in linkage disequilibrium (LD) with differential gene expression profiles in cardiac tissue. Using hiPSC-CM ChIP-seq data from NKX2-5 and TBX5, two cardiac TFs essential for proper heart development, we trained a large-scale gapped k-mer SVM (LS-GKM-SVM) predictive model that can identify binding sites altered by CVD-associated SNPs. The predictive model was tested by scoring human heart footprints and enhancers and evaluating selected sequences in vitro through electrophoretic mobility shift assay (EMSA). Three variants (rs59310144, rs6715570, and rs61872084) were prioritized for in vitro validation based on their eQTL in cardiac tissue and their LS-GKM-SVM-predicted effect on NKX2-5 DNA binding. All three variants altered NKX2-5 DNA binding. In summary, we present a bioinformatic approach that combines tissue-specific eQTL analysis with SVM-based TF binding site classification to prioritize CVD-associated variants for in vitro experimental analysis.

Keywords: cardiovascular diseases; gene regulation; non-coding variants; support vector machine; transcription factors.
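
As an illustration of how a k-mer SVM can score the effect of a SNP on TF binding, the sketch below computes a deltaSVM-style score: the summed weight of every k-mer overlapping the variant in the alternate allele minus the same sum for the reference allele. It is a minimal sketch and not the authors' pipeline. The function names, the value k = 3, and the weight table are hypothetical; in practice the weights would come from a trained gapped k-mer SVM with a longer k, and reverse-complement handling is omitted here.

```python
def kmers_overlapping(seq, pos, k):
    """Return every k-mer in seq that covers the 0-based index pos."""
    start = max(0, pos - k + 1)          # leftmost k-mer that still covers pos
    end = min(pos, len(seq) - k)         # rightmost k-mer that fits in seq
    return [seq[i:i + k] for i in range(start, end + 1)]


def delta_svm(ref_seq, alt_seq, pos, weights, k):
    """Sum of k-mer weights for the alt allele minus that for the ref allele.

    weights is a hypothetical dict mapping k-mer -> SVM weight; k-mers
    missing from the table contribute 0.
    """
    if len(ref_seq) != len(alt_seq):
        raise ValueError("ref and alt sequences must have the same length")
    ref_score = sum(weights.get(km, 0.0) for km in kmers_overlapping(ref_seq, pos, k))
    alt_score = sum(weights.get(km, 0.0) for km in kmers_overlapping(alt_seq, pos, k))
    return alt_score - ref_score


# Toy weights, invented purely for illustration.
toy_weights = {"ACG": 1.0, "CGT": 0.5, "GTA": -0.25,
               "ACC": -1.0, "CCT": 0.0, "CTA": 0.25}

# G>C substitution at index 2 of a 5-bp window.
score = delta_svm("ACGTA", "ACCTA", pos=2, weights=toy_weights, k=3)
print(score)  # -2.0
```

Under this convention, a negative score suggests the alternate allele weakens predicted binding relative to the reference, and a positive score suggests it strengthens binding.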

Figures

Figure 1:
Identification of functional CVD-associated SNPs. A) Pipeline to identify potential CVD-causing SNPs. B) Number of CVD-associated SNPs per chromosome. C) Distribution of SNP frequency across autosomal chromosomes, binned in 1 Mb windows. D) SNP-gene pairs with differential gene expression in cardiac tissue. Each dot represents a SNP-gene pair that is differentially expressed in heart atrial appendage or left ventricle in one or more populations. rs6715570-BARD1, rs61872084-METTL10, and rs59310144-RNASEH2B are the SNP-gene pairs that were evaluated in vitro in Figure 3.
Figure 2:
Training and testing of the LS-GKM-SVM predictive model. A) Schematic of model training with NKX2-5 and TBX5 ChIP-seq data from hiPSC-CM. B) Scoring of ~520,000 DGF that occur in heart enhancers with the NKX2-5 (top) and TBX5 (bottom) predictive models. C) In vitro testing of the predictive model for the highest, middle, and lowest scored sequences for NKX2-5 (top) and TBX5 (bottom). For NKX2-5, we tested chr22:25120040–25120058 (circle with blue line), chr3:8596782–8596800 (triangle with green line), and chr7:101950814–101950832 (square with red line). For TBX5, we tested chr2:30359836–30359854 (circle with blue line), chr1:57623182–57623200 (triangle with green line), and chr4:119047319–119047337 (square with red line).
Figure 3:
CVD-associated SNPs alter NKX2-5 in vitro binding. A) DeltaSVM score distribution of the 9,309 CVD-associated SNPs. B) Representative EMSA gel for rs59310144 reference (Ref) and alternate (Alt) alleles. C) Binding curves for reference (Ref) and alternate (Alt) alleles of rs59310144 (top), rs6715570 (middle), and rs61872084 (bottom). Experiments were performed in triplicate; binding curves show the average bound fraction (X), and error bars represent standard error. D) Cardiac tissue eQTL analysis of RNASEH2B (top), BARD1 (middle), and METTL10 (bottom) expression in heart atrial appendage or left ventricle when rs59310144, rs6715570, and rs61872084 occur, respectively.
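
The 1 Mb binning mentioned for Figure 1C can be illustrated with a short sketch. It assumes SNPs are given as (chromosome, 1-based position) pairs; the function name and the example coordinates are hypothetical and do not come from the study's data.

```python
from collections import Counter

WINDOW = 1_000_000  # 1 Mb window size


def bin_snps(snps, window=WINDOW):
    """Count SNPs per (chromosome, window index).

    snps: iterable of (chrom, pos) with 1-based positions.
    Window 0 covers positions 1..window, window 1 the next block, and so on.
    """
    counts = Counter()
    for chrom, pos in snps:
        counts[(chrom, (pos - 1) // window)] += 1
    return counts


# Invented coordinates, for illustration only.
example = [("chr1", 1), ("chr1", 1_000_000), ("chr1", 1_000_001), ("chr2", 5)]
print(bin_snps(example))
```

Plotting these counts against window index for each chromosome gives a per-chromosome density profile of the kind the panel describes.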
