Nat Commun. 2023 Aug 19;14(1):5058.
doi: 10.1038/s41467-023-40797-7.

APOGEE 2: multi-layer machine-learning model for the interpretable prediction of mitochondrial missense variants

Salvatore Daniele Bianco et al. Nat Commun.

Abstract

Mitochondrial dysfunction has pleiotropic effects and is frequently caused by mitochondrial DNA mutations. However, factors such as significant variability in clinical manifestations make interpreting the pathogenicity of variants in the mitochondrial genome challenging. Here, we present APOGEE 2, a mitochondrially-centered ensemble method designed to improve the accuracy of pathogenicity predictions for interpreting missense mitochondrial variants. Built on the joint consensus recommendations by the American College of Medical Genetics and Genomics/Association for Molecular Pathology, APOGEE 2 features an improved machine learning method and a curated training set for enhanced performance metrics. It offers region-wise assessments of genome fragility and mechanistic analyses of specific amino acids that cause perceptible long-range effects on protein structure. With clinical and research use in mind, APOGEE 2 scores and pathogenicity probabilities are precompiled and available in MitImpact. APOGEE 2's ability to address challenges in interpreting mitochondrial missense variants makes it an essential tool in the field of mitochondrial genetics.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Distribution of pathogenic and likely pathogenic missense variants in the mitochondrial genes and population databases.
a Counts (top) of reported and confirmed missense variants for all mtDNA protein-coding genes and their frequency (bottom) normalized by gene length. b Missense variants shared among HelixMTdb, gnomAD, and MITOMAP’s confirmed and reported variants. c Distribution of heteroplasmic (gnomAD, n = 164; HelixMTdb, n = 204) and homoplasmic (gnomAD, n = 187; HelixMTdb, n = 198) reported variants in gnomAD (left) and HelixMTdb (right) based on their allele frequency (AF). Dashed lines represent the 0.002%, 0.5%, and 1% AF thresholds. Whiskers represent the 95% CIs around the median; the box limits represent the 25th and 75th percentiles (Q1 and Q3). The AF values of gnomAD variants range from 1.77E−05 to 3.70E−04 (heteroplasmic) and from 1.77E−05 to 0.99 (homoplasmic). The AF values of HelixMTdb variants range from 5.10E−06 to 1.47E−03 (heteroplasmic) and from 5.10E−06 to 0.99 (homoplasmic). Red dots are outlier variants by AF.
Fig. 2. APOGEE 2 performance evaluation.
a Average test auPRC values of the selected ML methods, calculated during the training phase. The methods are: Support Vector Machine classifier with radial basis function kernel (rbfSVC), Balanced Bagging using Gaussian Naive Bayes (GNB_BalancedBagging) and K-Nearest Neighbors (KNN_BalancedBagging) as base estimators, Balanced Random Forest (BalancedRF), and KNN Bagging balanced through RUS and SMOTE techniques (KNN_RusSmote). b Feature importance assessed on the whole Dataset 1; threshold set to 1%. c AuROC values calculated on 118 neutral and 13 pathogenic test variants for APOGEE versions 1 and 2. d Performance comparison of APOGEE 2 versus other meta-predictors in terms of auROC. APOGEE 2’s auROC is reported as the mean ± 95% CI obtained through cross-validation. e Time-dependent auROC values of APOGEE 2, obtained by predicting MITOMAP 2022 after training on the 2008–2020 contents; for each year, the sample mean distribution is reported in gray.
Fig. 3. APOGEE 2 scores distribution and spatial autocorrelation.
a Distribution of APOGEE 2 scores. Colors represent classes of pathogenicity: green (benign, probability of pathogenicity (P) ≤ 0.001, score (S) ≤ 0.062), light green (likely benign, 0.001 < P ≤ 0.1, 0.062 < S ≤ 0.265), yellow (VUS, 0.1 < P < 0.9, 0.265 < S < 0.716), orange (likely pathogenic, 0.9 ≤ P < 0.99, 0.716 ≤ S < 0.907), red (pathogenic, P ≥ 0.99, S ≥ 0.907). b Misclassification rate across 100 test folds calculated on Dataset 1. c Localization of the mitochondrial protein complexes on the bisector of a 3D space. Colors have the same meaning as in Fig. 3a. d Global spatial autocorrelation computed at different cutoff distances. Blue circles mark the maximum values for each protein complex. e Low-risk (green) and high-risk (red) amino acid regions of the mitochondrial Complex I subunits. MITOMAP-confirmed variants that localize to TMH3 of MT-ND6 and to the MT-ND3 loop are highlighted in red.
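The class boundaries listed in panel a can be written as a small lookup. The following is only a minimal sketch: the function name `pathogenicity_class` and the constants `P_CUTOFFS` and `S_CUTOFFS` are hypothetical, introduced here for illustration, and are not part of APOGEE 2 or MitImpact. The cutoff values and the open or closed interval ends are copied from the caption.

```python
# Cutoffs copied from the Fig. 3a caption. The names below are hypothetical
# and are not taken from APOGEE 2 or MitImpact.
P_CUTOFFS = (0.001, 0.1, 0.9, 0.99)      # probability of pathogenicity (P)
S_CUTOFFS = (0.062, 0.265, 0.716, 0.907)  # APOGEE 2 score (S)


def pathogenicity_class(value, cutoffs):
    """Map a probability or score to one of the five classes in the caption.

    Interval ends follow the caption: benign and likely benign are closed on
    the right, likely pathogenic and pathogenic are closed on the left, and
    VUS is open on both sides.
    """
    benign_max, likely_benign_max, likely_path_min, path_min = cutoffs
    if value <= benign_max:
        return "benign"
    if value <= likely_benign_max:
        return "likely benign"
    if value < likely_path_min:
        return "VUS"
    if value < path_min:
        return "likely pathogenic"
    return "pathogenic"


if __name__ == "__main__":
    for p in (0.0005, 0.05, 0.5, 0.95, 0.995):
        print(p, pathogenicity_class(p, P_CUTOFFS))
```

Passing `S_CUTOFFS` instead of `P_CUTOFFS` applies the same rule to the score scale.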
Fig. 4. Long-range effects analysis through molecular dynamics simulation.
a Structure of the mtDNA-encoded subunits of the complex I membrane arm. b Average structures of the wild-type, Ser34Pro, and Thr35Pro MT-ND3 protein models (left) and wild-type, Ser34Phe, and Ser34Tyr (right). c RMSF profiles of the heavy atoms of the MT-ND3 loop (residues 24–54) for both wild-type and mutants. d 3D representations of the dynamics of the wild-type, Ser34Pro, and Thr35Pro MT-ND3 protein models. In subfigures b–d, wild-type is colored green, Ser34Pro is yellow, Thr35Pro is red, Ser34Phe is pink, and Ser34Tyr is cyan. e Average structures of the MT-ND6 protein. TMH3 is highlighted in dark orange.
Fig. 5. APOGEE 2 ML pipeline.
The pipeline includes data preprocessing, i.e., scaling (a), imputation of missing values (b), and feature selection (c); model tuning by 10-fold grid-search cross-validation (d); and training of an ML method with the best hyperparameter combination obtained in (d), followed by testing (e).
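The steps named in this caption can be sketched with scikit-learn as below. This is an illustrative outline under several assumptions and not the authors' implementation: the synthetic data, the median imputer, the univariate feature selector, the RBF-kernel SVC, the hyperparameter grid, and the auPRC scoring are all stand-ins chosen for the example. Only the order of the steps and the 10-fold grid search come from the caption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.metrics import average_precision_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in data (assumption: not the paper's dataset).
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           weights=[0.8, 0.2], random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # inject some missing values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),                    # (a) scaling; NaNs pass through
    ("impute", SimpleImputer(strategy="median")),   # (b) imputation
    ("select", SelectKBest(f_classif, k=5)),        # (c) feature selection
    ("clf", SVC(kernel="rbf", class_weight="balanced")),
])

# (d) 10-fold grid search over an illustrative hyperparameter grid.
param_grid = {"clf__C": [0.1, 1, 10], "clf__gamma": ["scale", 0.1]}
search = GridSearchCV(pipeline, param_grid, cv=10, scoring="average_precision")
search.fit(X_train, y_train)  # refits the best combination on all training data

# (e) test the refitted model on held-out data.
test_auprc = average_precision_score(y_test, search.decision_function(X_test))
print(search.best_params_, round(test_auprc, 3))
```

Wrapping the preprocessing steps inside the `Pipeline` means they are refitted within each cross-validation fold, so the held-out fold does not leak into scaling, imputation, or feature selection.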
