Review
Brief Bioinform. 2022 Sep 20;23(5):bbac191.
doi: 10.1093/bib/bbac191.

Artificial intelligence and machine learning approaches using gene expression and variant data for personalized medicine

Sreya Vadapalli et al. Brief Bioinform.

Abstract

Precision medicine uses genetic, environmental and lifestyle factors to more accurately diagnose and treat disease in specific groups of patients, and it is considered one of the most promising medical efforts of our time. The use of genetics is arguably the most data-rich and complex component of precision medicine. The grand challenge today is the successful assimilation of genetics into precision medicine that translates across different ancestries, diverse diseases and other distinct populations, which will require clever use of artificial intelligence (AI) and machine learning (ML) methods. Our goal here was to review and compare scientific objectives, methodologies, datasets, data sources, ethics and gaps of AI/ML approaches used in genomics and precision medicine. We selected high-quality literature published within the last 5 years that was indexed and available through PubMed Central. Our scope was narrowed to articles that reported application of AI/ML algorithms for statistical and predictive analyses using whole genome and/or whole exome sequencing for gene variants, and RNA-seq and microarrays for gene expression. We did not limit our search to specific diseases or data sources. Based on the scope of our review and comparative analysis criteria, we identified 32 different AI/ML approaches applied in a variety of genomics studies and report widely adopted AI/ML algorithms for predictive diagnostics across several diseases.

Keywords: artificial intelligence; gene expression; gene variant; machine learning; predictive analysis.

Figures

Figure 1
AI/ML approaches using gene variant and gene expression data for traditional bioinformatics and predictive analysis. Figure includes 24 AI/ML approaches, various diseases [inflammatory bowel disease (IBD); systemic lupus erythematosus (SLE); colon cancer (CC); acute myeloid leukemia (AML); major depressive disorder (MDD); ulcerative colitis (UC); sepsis (Sep.); prostate cancer (PC.); Alzheimer’s disease (AD); hypertension (Hyp.); ovarian cancer (OC); Crohn’s disease (CD); obesity (Ob.); breast cancer (BC); malignant pleural mesothelioma (MPM); schizophrenia (SCZ); autism spectrum disorder (ASD); ovarian failure (OF); premature ovarian failure (POF); risk of illness (RI); autism (Au.)] and AI/ML algorithms [Generalized linear models (GLM); Genetic Algorithm (GA); Multivariate Linear Regression (MLR); Random Forest (RF); Bayesian Networks (BN); Support Vector Machine (SVM); Expectation–Maximization (EM); Bioinformatics Analysis (BI); Random committee ensemble learning (RC); Elastic net regularized generalized linear model (GLMNET); Linear discriminant analysis (LDA); Quadratic Discriminant Analysis (QDA); AdaBoost (AB); Formal Concept Analysis (FCA); Combined Annotation Dependent Depletion (CADD); Very Efficient Substitution Transposition (VEST); Deep Learning Neural Networks (DNNs); Decision Tree (DT); LogitBoost (LB); Gradient boosting (GB); Extreme gradient boosting (XGB); Gaussian Process Classification (GPC); Logistic Regression (LR); Artificial Neural Network (ANN); Greedy Thick Thinning algorithm (GTT); Neural Networks (NN); K-Nearest Neighbors (K-NN); Clustering (CU); Non-negative Matrix Factorization (NMF); Naïve Bayes (NB); MERGE (mutation, expression hubs, known regulators, genomic CNV and methylation); Bayesian Additive Regression Trees (BART)].
Figure 2
Total number of machine learning algorithms applied for predictive analysis. Figure includes algorithms: Random Forest (RF), Support Vector Machine (SVM), Gradient Boosting, Extreme gradient boosting (XGBoost), Elastic net regularized generalized linear model, Logistic regression (LR), Artificial neural network (ANN), Naïve Bayes (NB), Bayesian Additive Regression Trees, Bayesian Networks, Greedy Thick Thinning algorithm, k-nearest neighbor (K-NN), Decision tree (DT), Linear discriminant analysis (LDA), Quadratic discriminant analysis (QDA), Gaussian process classification (GPC), AdaBoost (AB), Non-negative Matrix Factorization (NMF), C4.5, Formal concept analysis (FCA), Clustering (Unsupervised), Multivariate Linear Regression (MLR), Genetic Algorithm (GA), LogitBoost, AVA,Dx (Analysis of Variation for Association with Disease), OncoCast-MPM machine-learning risk-prediction model, Combined Annotation Dependent Depletion (CADD), Very Efficient Substitution Transposition (VEST), Random Committee Ensemble Learning (RCEL), Deep Learning Neural Networks (DNNs), MERGE, and Expectation–maximization (EM).
