Sci Rep. 2024 Nov 3;14(1):26503.

doi: 10.1038/s41598-024-78553-6.

Multimodal AI/ML for discovering novel biomarkers and predicting disease using multi-omics profiles of patients with cardiovascular diseases

William DeGroat^#¹, Habiba Abdelhalim^#¹, Elizabeth Peker^#¹, Neev Sheth^#¹, Rishabh Narayanan^#¹, Saman Zeeshan², Bruce T Liang^{3,4}, Zeeshan Ahmed^{5,6,7,8}

Affiliations

¹ Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey, 112 Paterson St, New Brunswick, NJ, 08901, USA.
² Department of Biomedical and Health Informatics, UMKC School of Medicine, 2411 Holmes Street, Kansas City, MO, 64108, USA.
³ Pat and Jim Calhoun Cardiology Center, UConn Health, 263 Farmington Ave, Farmington, CT, USA.
⁴ UConn School of Medicine, University of Connecticut, 263 Farmington Ave, Farmington, CT, USA.
⁵ Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey, 112 Paterson St, New Brunswick, NJ, 08901, USA. zahmed@ifh.rutgers.edu.
⁶ UConn School of Medicine, University of Connecticut, 263 Farmington Ave, Farmington, CT, USA. zahmed@ifh.rutgers.edu.
⁷ Department of Medicine, Division of Cardiovascular Disease and Hypertension, Robert Wood Johnson Medical School, Rutgers Health, 125 Paterson St, New Brunswick, NJ, 08901, USA. zahmed@ifh.rutgers.edu.
⁸ Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, 112 Paterson Street, New Brunswick, NJ, 08901, USA. zahmed@ifh.rutgers.edu.

^# Contributed equally.

PMID: 39489837
PMCID: PMC11532369
DOI: 10.1038/s41598-024-78553-6

William DeGroat et al. Sci Rep. 2024.

Erratum in

Publisher Correction: Multimodal AI/ML for discovering novel biomarkers and predicting disease using multi-omics profiles of patients with cardiovascular diseases.
DeGroat W, Abdelhalim H, Peker E, Sheth N, Narayanan R, Zeeshan S, Liang BT, Ahmed Z. Sci Rep. 2025 Jun 3;15(1):19417. doi: 10.1038/s41598-025-03203-4. PMID: 40461550. Free PMC article. No abstract available.

Abstract

Cardiovascular diseases (CVDs) are complex, multifactorial conditions that require personalized assessment and treatment. Advancements in multi-omics technologies, namely RNA sequencing and whole-genome sequencing, have provided translational researchers with a comprehensive view of the human genome. The efficient synthesis and analysis of these data through an integrated approach that characterizes genetic variants alongside expression patterns linked to emerging phenotypes can reveal novel biomarkers and enable the segmentation of patient populations based on personalized risk factors. In this study, we present a cutting-edge methodology rooted in the integration of traditional bioinformatics, classical statistics, and multimodal machine learning techniques. Our approach has the potential to uncover the intricate mechanisms underlying CVD, enabling patient-specific risk and response profiling. We sourced transcriptomic expression data and single nucleotide polymorphisms (SNPs) from both CVD patients and healthy controls. By integrating these multi-omics datasets with clinical demographic information, we generated patient-specific profiles. Utilizing a robust feature selection approach, we identified a signature of 27 transcriptomic features and SNPs that are effective predictors of CVD. Differential expression analysis, combined with minimum redundancy maximum relevance feature selection, highlighted biomarkers that explain the disease phenotype. This approach prioritizes both biological relevance and efficiency in machine learning. We employed Combined Annotation Dependent Depletion scores and allele frequencies to identify variants with pathogenic characteristics in CVD patients. Classification models trained on this signature demonstrated high-accuracy predictions for CVD. The best performing of these models was an XGBoost classifier optimized via Bayesian hyperparameter tuning, which was able to correctly classify all patients in our test dataset.
Using SHapley Additive exPlanations, we created risk assessments for patients, offering further contextualization of these predictions in a clinical setting. Across the cohort, RPL36AP37 and HBA1 were scored as the most important biomarkers for predicting CVDs. A comprehensive literature review revealed that a substantial portion of the diagnostic biomarkers identified have previously been associated with CVD. The framework we propose in this study is unbiased and generalizable to other diseases and disorders.
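
The per-patient risk assessments mentioned in the abstract rest on Shapley values. The sketch below is illustrative only and is not the authors' code: it computes exact Shapley attributions for one sample by enumerating every feature coalition. The toy model, its coefficients, the patient values, and the all-zero baseline are hypothetical. Exact enumeration scales as 2^n, which is why practical SHAP tooling relies on faster model-specific or sampling-based algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attributions for a single sample.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy risk score over three made-up features (not from the paper).
def toy_model(features):
    a, b, c = features
    return 2.0 * a - 1.0 * b + 0.5 * a * c


patient = [1.0, 2.0, 4.0]    # hypothetical feature values for one patient
reference = [0.0, 0.0, 0.0]  # hypothetical baseline ("average" patient)
phi = shapley_values(toy_model, patient, reference)
```

The attributions in `phi` sum to the difference between the model output for the patient and for the baseline, which is the property that lets each prediction be decomposed into per-biomarker contributions.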

Keywords: Artificial Intelligence; Cardiovascular diseases; Genomics; Machine learning; Multi-omics.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1**
Study design and workflow. This figure represents a summary of our study design: (I) Transcriptomic expression, (II) Pathogenic Variant, (III) Multimodal Machine Learning, and (IV) Results. Various inputs and their implementation are also included (RNA-sequencing, Clinical Records, Whole Genome Sequencing, Annotation databases and Biomarkers).

**Fig. 2**
Methodology. k-Nearest Neighbors (k-NN) imputation was used to address missing values present in our RNA-seq expression data. DESeq2 was utilized for normalization and differential gene expression analysis on four clinical subcohorts to reduce the effect of confounding variables. Next, minimum redundancy – maximum relevance (MRMR) feature selection was performed to identify biomarkers proficient in predicting CVDs. Simultaneously, significant single nucleotide variants (SNVs) were annotated, and their pathogenicity was determined for downstream analysis. Using the clinically integrated transcriptomics and genomics (CIGT) dataset of significant biomarkers and their variants, machine learning algorithms (Random Forest, Logistic Regression, and eXtreme Gradient Boosting) were used to predict CVDs. Boxes highlighted in yellow refer to input data, blue to machine learning approaches, orange to clinical records, red to statistical analyses, purple to bioinformatic analyses, and green to results.
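
To make the MRMR step concrete, here is a minimal greedy sketch under stated assumptions. It is not the authors' implementation. Absolute Pearson correlation stands in for the relevance and redundancy measures (MRMR is often formulated with mutual information or F-statistics instead), the score is the simple difference "relevance minus mean redundancy", and the gene names and expression values are invented.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def mrmr(features, labels, k):
    """Greedily pick k features: high relevance to labels, low redundancy.

    `features` maps a feature name to its list of per-sample values.
    """
    relevance = {name: abs(pearson(vals, labels)) for name, vals in features.items()}
    selected = []
    remaining = set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            abs(pearson(features[name], features[s])) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # sorted() keeps the choice deterministic if two scores tie
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Invented toy data: four samples, two controls (0) and two cases (1).
labels = [0, 0, 1, 1]
features = {
    "gene_a": [0.0, 1.0, 2.0, 3.0],       # strongly tracks the label
    "gene_a_copy": [0.0, 1.0, 2.0, 3.5],  # near-duplicate of gene_a
    "gene_b": [1.0, 0.0, 1.0, 2.0],       # weaker, but less redundant
}
selected = mrmr(features, labels, k=2)
```

In this toy example the near-duplicate feature is passed over in favor of the less redundant one, even though the duplicate is individually more correlated with the label.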

**Fig. 3**
Differentially expressed genes and their expression plots. This figure presents the results of gene expression analysis, including **(A)** Fold change in expression level based on differential expression (DE) analysis and minimum redundancy – maximum relevance (MRMR) feature selection; **(B)** Significance levels of genes based on DE and MRMR; **(C)** Gene annotations for cellular component, molecular function, biological processes, and phenotypic abnormalities; and **(D)** RNA-seq expression plots for the ten most significant biomarkers.

**Fig. 4**
Variant feature selection. This figure presents the rare, deleterious variants affecting our CVD-associated biomarkers based on **(A)** Combined Annotation Dependent Depletion (CADD) score; **(B)** Allele frequency obtained from the Genome Aggregation Database (gnomAD); and **(C)** annotations of pathogenic single nucleotide variants (SNVs).
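
A filter for "rare, deleterious" variants of the kind described here can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline. The cutoffs (scaled CADD score of at least 20, gnomAD allele frequency below 0.01), the treatment of variants absent from gnomAD as rare, and all variant records are assumptions chosen for the example; the abstract does not state the thresholds used in the study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    variant_id: str
    gene: str
    cadd_phred: float          # scaled (PHRED-like) CADD score
    gnomad_af: Optional[float]  # None when the variant is absent from gnomAD


def is_rare_deleterious(variant, min_cadd=20.0, max_af=0.01):
    """Assumed rule: high CADD score and low population allele frequency."""
    # Assumption: a variant never observed in gnomAD is treated as rare.
    af = 0.0 if variant.gnomad_af is None else variant.gnomad_af
    return variant.cadd_phred >= min_cadd and af < max_af


def filter_variants(variants: List[Variant], min_cadd=20.0, max_af=0.01):
    return [v for v in variants if is_rare_deleterious(v, min_cadd, max_af)]


# Invented records for illustration only.
variants = [
    Variant("var_1", "GENE_A", 25.3, 0.0004),  # deleterious and rare
    Variant("var_2", "GENE_A", 27.0, 0.12),    # deleterious but common
    Variant("var_3", "GENE_B", 8.1, 0.0001),   # rare but low CADD score
    Variant("var_4", "GENE_B", 31.0, None),    # not present in gnomAD
]
kept_ids = [v.variant_id for v in filter_variants(variants)]
```

Keeping the thresholds as parameters makes it easy to check how sensitive the resulting variant set is to the chosen cutoffs.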

**Fig. 5**
Predictive analysis. This figure presents the predictive confidence of our ML models, including **(A)** Predictive certainty of three ML algorithms (Random Forest, Logistic Regression, and eXtreme Gradient Boosting) on the testing dataset; and **(B)** Receiver operating characteristic (ROC) curve denoting the sensitivity and specificity of the classifiers.
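
For readers unfamiliar with how a ROC curve such as the one in panel (B) is built, the following sketch derives the curve and its area from predicted scores. It is a generic illustration with invented labels and scores, not the study's data or code, and it assumes both classes are present in the input.

```python
def roc_curve(labels, scores):
    """Return ROC points as (false positive rate, true positive rate).

    Labels are 1 for cases and 0 for controls; higher scores mean
    "more likely a case". Samples with tied scores are added together.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Lower the threshold past every sample that shares this score.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Invented example: three cases and three controls with predicted scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.95, 0.80, 0.70, 0.60, 0.30, 0.10]
points = roc_curve(labels, scores)
area = auc(points)

# A classifier that ranks every case above every control reaches an AUC of 1.0.
perfect_area = auc(roc_curve([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))
```

Here `area` equals the fraction of case/control pairs in which the case receives the higher score, which is the usual probabilistic reading of the AUC.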
