[Preprint]. 2024 Mar 20:2024.03.19.24304547.
doi: 10.1101/2024.03.19.24304547.

Utilizing multimodal AI to improve genetic analyses of cardiovascular traits


Yuchen Zhou et al. medRxiv.

Abstract

Electronic health records, biobanks, and wearable biosensors contain multiple high-dimensional clinical data (HDCD) modalities (e.g., ECG, photoplethysmography (PPG), and MRI) for each individual. Access to multimodal HDCD provides a unique opportunity for genetic studies of complex traits because different modalities relevant to a single physiological system (e.g., the circulatory system) encode complementary and overlapping information. We propose a novel multimodal deep learning method, M-REGLE, for discovering genetic associations from a joint representation of multiple complementary HDCD modalities. We showcase the effectiveness of this model by applying it to several cardiovascular modalities. M-REGLE jointly learns a lower-dimensional representation (i.e., latent factors) of multimodal HDCD using a convolutional variational autoencoder, performs genome-wide association studies (GWAS) on each latent factor, then combines the results to study the genetics of the underlying system. To validate the advantages of M-REGLE and multimodal learning, we apply it to common cardiovascular modalities (PPG and ECG) and compare its results to unimodal learning methods in which representations are learned from each data modality separately, but the downstream genetic analyses are performed on the combined unimodal representations. M-REGLE identifies 19.3% more loci on the 12-lead ECG dataset and 13.0% more loci on the ECG lead I + PPG dataset, and its genetic risk score significantly outperforms the unimodal risk score at predicting cardiac phenotypes, such as atrial fibrillation (Afib), in multiple biobanks.
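
As a rough illustration of the "GWAS on each latent factor" step, the sketch below regresses one latent coordinate on the allele dosage at a single variant and returns the squared t-statistic, which is approximately chi-square with 1 degree of freedom in large samples. This is a toy under stated assumptions: the function name `single_variant_chi2` and the example data are hypothetical, no covariates are adjusted for, and the abstract does not say which association model or software the authors used.

```python
def single_variant_chi2(dosages, phenotype):
    """Simple linear regression of a phenotype on allele dosage (0, 1, 2).

    Returns (beta, chi2), where chi2 = (beta / se)^2. Hypothetical helper:
    a real GWAS would also adjust for covariates such as age, sex, and
    genetic principal components.
    """
    n = len(dosages)
    if n != len(phenotype) or n < 3:
        raise ValueError("need matching inputs with at least 3 individuals")
    mean_x = sum(dosages) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotype))
    beta = sxy / sxx
    # Residual sum of squares around the fitted line.
    rss = sum((y - mean_y - beta * (x - mean_x)) ** 2
              for x, y in zip(dosages, phenotype))
    se_squared = rss / (n - 2) / sxx
    if se_squared == 0:
        return beta, float("inf")  # perfect fit
    return beta, beta * beta / se_squared


# Made-up example: six individuals, one latent coordinate.
example_dosages = [0, 1, 2, 0, 1, 2]
example_latent = [0.0, 1.0, 2.0, 1.0, 2.0, 3.0]
example_beta, example_chi2 = single_variant_chi2(example_dosages, example_latent)
```

Running this once per variant and per latent coordinate would give the per-embedding statistics that the method then combines.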


Figures

Figure 1: Overview of multimodal representation learning for genetic discovery on low-dimensional embeddings (M-REGLE).
a) M-REGLE steps. b) U-REGLE steps. Step 1 in M-REGLE (a) obtains the raw embeddings from multimodal HDCD in a joint fit, while step 1 in U-REGLE (b) obtains the raw embeddings for each modality separately. In step 2, to ensure completely uncorrelated embeddings, we applied PCA to the raw embeddings. Lastly, we ran GWAS on the uncorrelated embeddings and combined them.
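
The PCA step in step 2 can be sketched as follows. This is a minimal illustration, assuming the raw embeddings are available as an individuals-by-dimensions NumPy array; the function name `decorrelate_embeddings` and the simulated data are hypothetical and not taken from the paper.

```python
import numpy as np


def decorrelate_embeddings(raw):
    """Project embeddings onto their principal axes so columns are uncorrelated."""
    raw = np.asarray(raw, dtype=float)
    centered = raw - raw.mean(axis=0)
    covariance = np.cov(centered, rowvar=False)
    # eigh is appropriate because a covariance matrix is symmetric.
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1]  # largest variance first
    return centered @ eigenvectors[:, order]


rng = np.random.default_rng(0)
# Hypothetical stand-in for VAE latents: 500 individuals, 4 correlated coordinates.
mixing = rng.normal(size=(4, 4))
raw_embeddings = rng.normal(size=(500, 4)) @ mixing
pcs = decorrelate_embeddings(raw_embeddings)
```

The sample covariance of `pcs` is diagonal up to floating-point error, which is the property that lets the per-embedding GWAS statistics be treated as uncorrelated when they are combined.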
Figure 2: M-REGLE embeddings improve cardiovascular trait prediction.
a) Validation reconstruction losses (log scale) of U-REGLE and M-REGLE on 12-lead ECG data. The X-axis is the number of latent dimensions (1, 2, 4, 8, 16) per ECG lead. Standard errors (SE) are too small to plot (see Supplementary Table 4 for the SE). b) Validation reconstruction losses (log scale) of M-REGLE and U-REGLE across numbers of latent dimensions on ECG lead I and PPG data. The X-axis is the number of latent dimensions, where the first number is the latent dimension for ECG lead I and the second is the latent dimension for PPG: 3+1, 5+2, 8+4, 11+5, 13+6 (see Supplementary Table 7 for the SE of the M-REGLE and U-REGLE reconstruction losses). All differences between M-REGLE and U-REGLE in panels a and b are significant. c) AUROC and d) AUPRC for prediction of 9 phenotypes using ElasticNet trained on the 12 embeddings obtained from ECG lead I and PPG. A star (*) indicates a statistically significant difference between the two methods using paired bootstrapping (100 repetitions) with 95% confidence.
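
One common way to run a paired bootstrap comparison of two AUROC values is sketched below: the same resampled individuals are scored by both methods, and the difference is called significant when its 95% percentile interval excludes zero. The caption does not spell out the exact procedure, so the interval rule, the function names, and the simulated data are all assumptions made for illustration.

```python
import random


def auroc(labels, scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_bootstrap_auroc_diff(labels, scores_a, scores_b,
                                n_boot=100, alpha=0.05, seed=0):
    """Percentile interval for AUROC(a) - AUROC(b) over paired resamples."""
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # same rows for both methods
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUROC is undefined without both classes
        a = auroc(ys, [scores_a[i] for i in idx])
        b = auroc(ys, [scores_b[i] for i in idx])
        diffs.append(a - b)
    diffs.sort()
    low = diffs[int((alpha / 2) * (n_boot - 1))]
    high = diffs[int(round((1 - alpha / 2) * (n_boot - 1)))]
    return low, high


# Simulated example: method A is informative, method B is noise.
rng = random.Random(1)
labels = [1 if rng.random() < 0.3 else 0 for _ in range(200)]
scores_a = [y + rng.gauss(0.0, 0.5) for y in labels]
scores_b = [rng.gauss(0.0, 1.0) for _ in labels]
ci_low, ci_high = paired_bootstrap_auroc_diff(labels, scores_a, scores_b)
significant = ci_low > 0 or ci_high < 0
```

Resampling individuals once and scoring both methods on that same resample is what makes the comparison paired; resampling each method independently would inflate the variance of the difference.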
Figure 3: M-REGLE on 12 ECG leads increases genomic discovery.
a) Manhattan plot depicting M-REGLE GWAS p-values for all 22 autosomal chromosomes. Black gene names indicate the closest gene for each locus with −log10 p > 20, and red dots denote all other GWS loci. Blue gene names and dots indicate loci also identified in U-REGLE. b) Comparison of M-REGLE GWS variants-in-hits with U-REGLE. The X-axis is the −log p-value of U-REGLE. The Y-axis is the −log p-value of M-REGLE. All p-values in (a) and (b) are computed by summing the chi-square statistics for all 96 embeddings to perform a single joint chi-square test. The vertical and horizontal red lines indicate the GWS level. The diagonal red line indicates y = x. The orange dots indicate variants-in-hits that are significant for U-REGLE but not for M-REGLE, and the green dots indicate variants-in-hits that are significant for M-REGLE but not for U-REGLE. c) A three-way Venn diagram of the GWAS catalog loci, the loci discovered by M-REGLE, and the loci discovered by U-REGLE. d) Comparison of the chi-square statistics for all known significant variants in the GWAS catalog for both U-REGLE and M-REGLE. The difference is statistically significant.
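
The joint test described in this caption can be sketched as follows. Assuming each per-embedding statistic is chi-square with 1 degree of freedom under the null and the embeddings are uncorrelated (the purpose of the PCA step in Figure 1), the sum over k embeddings is chi-square with k degrees of freedom. The helper name `joint_chi2_pvalue` is hypothetical, and this pure-Python version uses a closed form that only holds for an even number of embeddings (as with 96 or 12); a library routine such as `scipy.stats.chi2.sf` covers the general case.

```python
import math


def joint_chi2_pvalue(chi2_stats):
    """P-value for the sum of k independent 1-dof chi-square statistics (k even)."""
    k = len(chi2_stats)
    if k == 0 or k % 2:
        raise ValueError("this closed form needs a positive, even number of statistics")
    half = sum(chi2_stats) / 2.0
    # Survival function of chi-square with k = 2m dof:
    #   exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total  # underflows to 0.0 for very large sums


# Made-up variant: a z-score of 3 in each of 12 embeddings.
per_embedding = [3.0 ** 2] * 12
p_single = math.erfc(3.0 / math.sqrt(2.0))  # two-sided p-value of one z-score
p_joint = joint_chi2_pvalue(per_embedding)
```

In this made-up case no single embedding reaches the conventional genome-wide significance threshold of 5e-8, but the joint test does, which shows how signal spread across many embeddings can be pooled.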
Figure 4: M-REGLE on ECG lead I and PPG increases genomic discovery.
a) Manhattan plot depicting M-REGLE GWAS p-values. Black gene names indicate the closest gene for each locus with −log10 p > 20, and red dots denote all other GWS loci. Blue gene names and dots indicate loci also identified in U-REGLE. b) Comparison of M-REGLE GWS variants-in-hits with U-REGLE. The X-axis is the −log p-value of U-REGLE. The Y-axis is the −log p-value of M-REGLE. All p-values in (a) and (b) are computed by summing the chi-square statistics for all 12 embeddings to perform a single joint chi-square test. The vertical and horizontal red lines indicate the GWS level. The diagonal red line indicates y = x. The orange dots indicate variants-in-hits that are significant for U-REGLE but not for M-REGLE, and the green dots indicate variants-in-hits that are significant for M-REGLE but not for U-REGLE. c) A three-way Venn diagram of the GWAS catalog loci, the loci discovered by M-REGLE, and the loci discovered by U-REGLE. d) Comparison of the chi-square statistics for all known significant variants in the GWAS catalog for both U-REGLE and M-REGLE. The difference is statistically significant.
Figure 5: M-REGLE improves Afib genetic risk score.
a) The X-axis is the genetic risk score percentile and the Y-axis is the prevalence. Lower is better for the bottom percentiles; higher is better for the top percentiles. b) AUROC and c) AUPRC (precision-recall). A star (*) indicates a statistically significant difference between the two methods using paired bootstrapping (100 repetitions) with 95% confidence.
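
A percentile-versus-prevalence curve like the one in panel a can be computed by sorting individuals by risk score, cutting them into equal-sized bins, and taking the fraction of cases in each bin. The sketch below assumes that simple binning; the function name `prevalence_by_percentile_bin`, the bin count, and the toy data are hypothetical and not taken from the paper.

```python
def prevalence_by_percentile_bin(scores, is_case, n_bins=10):
    """Fraction of cases within each equal-sized bin of the sorted risk score."""
    n = len(scores)
    if n != len(is_case) or n < n_bins:
        raise ValueError("need matching inputs with at least one individual per bin")
    order = sorted(range(n), key=lambda i: scores[i])  # lowest score first
    prevalence = []
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        prevalence.append(sum(is_case[i] for i in members) / len(members))
    return prevalence


# Toy data: 20 individuals whose case rate rises with the score.
toy_scores = list(range(20))
toy_cases = [1 if i in {3, 8, 9, 12, 13, 14, 16, 17, 18, 19} else 0
             for i in range(20)]
toy_curve = prevalence_by_percentile_bin(toy_scores, toy_cases, n_bins=4)
# toy_curve == [0.2, 0.4, 0.6, 0.8]
```

A better risk score produces a steeper curve: lower prevalence in the bottom bins and higher prevalence in the top bins, matching the "lower is better / higher is better" reading in the caption.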
