Nat Commun. 2023 Aug 19;14(1):5058.

doi: 10.1038/s41467-023-40797-7.

APOGEE 2: multi-layer machine-learning model for the interpretable prediction of mitochondrial missense variants

Salvatore Daniele Bianco¹ ², Luca Parca¹ ³, Francesco Petrizzelli¹, Tommaso Biagini¹, Agnese Giovannetti⁴, Niccolò Liorni¹ ², Alessandro Napoli¹, Massimo Carella⁵, Vincent Procaccio⁶ ⁷, Marie T Lott⁷, Shiping Zhang⁷ ⁸, Angelo Luigi Vescovi⁹, Douglas C Wallace⁷ ¹⁰, Viviana Caputo², Tommaso Mazza¹¹

Affiliations

¹ Bioinformatics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, S. Giovanni Rotondo (FG), Italy.
² Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy.
³ Italian Space Agency, Rome, Italy.
⁴ Clinical Genomics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, S. Giovanni Rotondo (FG), Italy.
⁵ Medical Genetics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, S. Giovanni Rotondo (FG), Italy.
⁶ University of Angers, Genetics Department CHU Angers, Mitolab UMR CNRS 6015-INSERM U1083, F-49000, Angers, France.
⁷ Center for Mitochondrial and Epigenomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA.
⁸ Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA.
⁹ ISBReMIT Institute for Stem Cell Biology, Regenerative Medicine and Innovative Therapies, Fondazione IRCCS Casa Sollievo della Sofferenza, S. Giovanni Rotondo (FG), Italy.
¹⁰ Department of Pediatrics, Division of Human Genetics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
¹¹ Bioinformatics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, S. Giovanni Rotondo (FG), Italy. t.mazza@css-mendel.it.

PMID: 37598215
PMCID: PMC10439926
DOI: 10.1038/s41467-023-40797-7

Salvatore Daniele Bianco et al. Nat Commun. 2023.

Abstract

Mitochondrial dysfunction has pleiotropic effects and is frequently caused by mitochondrial DNA mutations. However, factors such as significant variability in clinical manifestations make interpreting the pathogenicity of variants in the mitochondrial genome challenging. Here, we present APOGEE 2, a mitochondrially-centered ensemble method designed to improve the accuracy of pathogenicity predictions for interpreting missense mitochondrial variants. Built on the joint consensus recommendations by the American College of Medical Genetics and Genomics/Association for Molecular Pathology, APOGEE 2 features an improved machine learning method and a curated training set for enhanced performance metrics. It offers region-wise assessments of genome fragility and mechanistic analyses of specific amino acids that cause perceptible long-range effects on protein structure. With clinical and research use in mind, APOGEE 2 scores and pathogenicity probabilities are precompiled and available in MitImpact. APOGEE 2's ability to address challenges in interpreting mitochondrial missense variants makes it an essential tool in the field of mitochondrial genetics.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1. Distribution of pathogenic and likely pathogenic missense variants in the mitochondrial genes and population databases.**
a Counts (top) of reported and confirmed missense variants for all mtDNA protein-coding genes and their frequency (bottom) normalized on gene length. b Common missense variants between HelixMTdb, gnomAD, and MITOMAP’s confirmed and reported variants. c Distribution of heteroplasmic (gnomAD, n = 164, HelixMTdb, n = 204) and homoplasmic (gnomAD, n = 187, HelixMTdb, n = 198) reported variants in gnomAD (left) and HelixMTdb (right) based on their AF. Dashed lines represent the 0.002%, 0.5%, and 1% AF thresholds. Whiskers represent the 95% CIs around the median; the box limits represent the 25th and 75th percentiles (Q1 and Q3). GnomAD variants’ AF values range from 1.77E−05 to 3.70E−04 (heteroplasmic) and from 1.77E−05 to 0.99 (homoplasmic). HelixMTdb variants’ AF values range from 5.10E−06 to 1.47E−03 (heteroplasmic) and from 5.10E−06 to 0.99 (homoplasmic). Red dots are outlier variants by AF.

**Fig. 2. APOGEE 2 performance evaluation.**
a Average test auPRC values of the selected ML methods, calculated during the training phase: Support Vector Machine classifier with radial basis function kernel (rbfSVC), Balanced Bagging using Gaussian Naive Bayes (GNB_BalancedBagging) and K-Nearest Neighbors (KNN_BalancedBagging) as base estimators, Balanced Random Forest (BalancedRF), and KNN Bagging balanced through RUS and SMOTE techniques (KNN_RusSmote). b Feature importance assessed on the whole Dataset 1; threshold set to 1%. c auROC values calculated on 118 neutral and 13 pathogenic test variants for APOGEE versions 1 and 2. d Performance comparison of APOGEE 2 *versus* other meta-predictors in terms of auROC. The auROC of APOGEE 2 is reported as the mean ± 95% CI obtained through cross-validation. e Time-dependent auROC values of APOGEE 2 obtained by predicting MITOMAP 2022 upon training on the 2008–2020 contents; for each year, the sample mean distribution is reported in gray.
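As a rough illustration of the "mean ± 95% CI obtained through cross-validation" wording in panel d, the sketch below summarizes a list of per-fold auROC values with a normal-approximation interval. The caption does not say how the interval was actually computed, so the 1.96 × standard-error formula, the function name `mean_ci95`, and the fold values are all assumptions made for illustration only.

```python
import math
import statistics


def mean_ci95(values):
    """Return (mean, lower, upper) using a normal-approximation 95% interval.

    Assumption: the interval is mean +/- 1.96 * standard error of the mean.
    The source caption does not state which CI method was used.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate a standard error")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # sample std / sqrt(n)
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width


# Hypothetical per-fold auROC values, not taken from the paper.
fold_aurocs = [0.90, 0.92, 0.94]
mean, lower, upper = mean_ci95(fold_aurocs)
print(f"auROC = {mean:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

With only a handful of folds, a t-based or bootstrap interval would be wider and arguably more appropriate than this approximation.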

**Fig. 3. APOGEE 2 scores distribution and spatial autocorrelation.**
a Distribution of APOGEE 2 scores. Colors represent classes of pathogenicity: green (benign, probability of pathogenicity (P) ≤ 0.001, score (S) ≤ 0.062), light green (likely benign, 0.001 < P ≤ 0.1, 0.062 < S ≤ 0.265), yellow (VUS, 0.1 < P < 0.9, 0.265 < S < 0.716), orange (likely pathogenic, 0.9 ≤ P < 0.99, 0.716 ≤ S < 0.907), red (pathogenic, P ≥ 0.99, S ≥ 0.907). b Misclassification rate of 100 test folds calculated on Dataset 1. c Mitochondrial protein complexes localization on the bisector of a 3D space. Colors have the same meaning as in Fig. 3a. d Global spatial autocorrelation computed at different cutoff distances. Blue circles mark the maximum values for each protein complex. e Low-risk (green) and high-risk (red) amino acid regions of the mitochondrial Complex I subunits. MITOMAP confirmed variants that localize on TMH3 of MT-ND6 and on the MT-ND3 loop are highlighted in red.
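The score cutoffs listed for panel a can be read as a simple lookup. The following minimal sketch maps a score to one of the five classes using exactly the boundaries quoted in the caption; the function name `classify_apogee2_score` and the assumption that scores lie in [0, 1] are illustrative and not taken from the APOGEE 2 or MitImpact software.

```python
def classify_apogee2_score(score: float) -> str:
    """Map an APOGEE 2 score to a pathogenicity class per the Fig. 3a cutoffs.

    Assumption: scores lie in [0, 1]; the function name is hypothetical.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score is assumed to lie in [0, 1]")
    if score <= 0.062:
        return "benign"
    if score <= 0.265:
        return "likely benign"
    if score < 0.716:
        return "VUS"
    if score < 0.907:
        return "likely pathogenic"
    return "pathogenic"


for s in (0.05, 0.2, 0.5, 0.8, 0.95):
    print(s, classify_apogee2_score(s))
```

Note that the boundaries are asymmetric: the upper edge of each benign class is inclusive, while the lower edge of each pathogenic class is inclusive, so the VUS interval is open on both sides.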

**Fig. 4. Long-range effects analysis through molecular dynamics simulation.**
a Structure of the mtDNA-encoded subunits of the complex I membrane arm. b Average structures of the wild-type, Ser34Pro, and Thr35Pro MT-ND3 protein models (left) and wild-type, Ser34Phe, and Ser34Tyr (right). c RMSF profiles of the heavy atoms of the MT-ND3 loop (residues 24–54) for both wild-type and mutants. d 3D representations of the dynamics of the wild-type, Ser34Pro, and Thr35Pro MT-ND3 protein models. In all subfigures b–d, wild-type is colored green, Ser34Pro is yellow, Thr35Pro is red, Ser34Phe is pink, and Ser34Tyr is cyan. e Average structures of the MT-ND6 protein. TMH3 is highlighted in dark orange.

**Fig. 5. APOGEE 2 ML pipeline.**
It includes data preprocessing, i.e., scaling (a), imputation of missing values (b), and feature selection (c), model tuning by 10-fold Grid Search CV (d), training of an ML method with the best hyperparameter combination obtained in (d) and testing (e).
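The stages named in this caption map naturally onto a scikit-learn `Pipeline` wrapped in `GridSearchCV`. The sketch below is one possible arrangement on synthetic data, not the authors' implementation: the choice of `StandardScaler`, median imputation, `SelectKBest`, an RBF-kernel SVC, the parameter grid, and the synthetic dataset are all assumptions. Only the stage order and the 10-fold grid search come from the caption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in data with a few missing values (not real variants).
rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=20, n_informative=6,
                           weights=[0.8, 0.2], random_state=0)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),                      # (a) scaling; NaNs pass through
    ("impute", SimpleImputer(strategy="median")),     # (b) imputation of missing values
    ("select", SelectKBest(score_func=f_classif)),    # (c) feature selection
    ("clf", SVC(kernel="rbf", class_weight="balanced")),
])

# Hypothetical hyperparameter grid, chosen only to keep the example small.
param_grid = {"select__k": [5, 10], "clf__C": [0.1, 1.0, 10.0]}

# (d) model tuning by 10-fold grid search, scored by average precision (auPRC).
search = GridSearchCV(
    pipeline, param_grid, scoring="average_precision",
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0))
search.fit(X_train, y_train)  # refits the best combination on all training data

# (e) testing on held-out data.
test_auroc = roc_auc_score(y_test, search.decision_function(X_test))
print(search.best_params_, round(test_auroc, 3))
```

Keeping preprocessing inside the pipeline means scaling, imputation, and feature selection are refit within each cross-validation fold, which avoids leaking information from validation folds into the tuning step.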

Cited by

Mitochondrial DNA variant detection in over 6,500 rare disease families by the systematic analysis of exome and genome sequencing data resolves undiagnosed cases.
Stenton SL, Laricchia K, Lake NJ, Chaluvadi S, Ganesh V, DiTroia S, Osei-Owusu I, Pais L, O'Heir E, Austin-Tse C, O'Leary M, Abu Shanap M, Barrows C, Berger S, Bönnemann CG, Bujakowska KM, Campagna DR, Compton AG, Donkervoort S, Fleming MD, Gallacher L, Gleeson JG, Haliloglu G, Pierce EA, Place EM, Sankaran VG, Shimamura A, Stark Z, Tan TY, Thorburn DR, White SM, Zaki MS; Genomics Research to Elucidate the Genetics of Rare diseases (GREGoR) Consortium; Vilain E, Lek M, Rehm HL, O'Donnell-Luria A. HGG Adv. 2025 Jul 10;6(3):100441. doi: 10.1016/j.xhgg.2025.100441. Epub 2025 Apr 15. PMID: 40241304. Free PMC article.

Mitochondrial and Nuclear DNA Variants in Amyotrophic Lateral Sclerosis: Enrichment in the Mitochondrial Control Region and Sirtuin Pathway Genes in Spinal Cord Tissue.
Cox SN, Lo Giudice C, Lavecchia A, Poeta ML, Chiara M, Picardi E, Pesole G. Biomolecules. 2024 Mar 28;14(4):411. doi: 10.3390/biom14040411. PMID: 38672428. Free PMC article.

mtDNA-Server 2: advancing mitochondrial DNA analysis through highly parallelized data processing and interactive analytics.
Weissensteiner H, Forer L, Kronenberg F, Schönherr S. Nucleic Acids Res. 2024 Jul 5;52(W1):W102-W107. doi: 10.1093/nar/gkae296. PMID: 38709886. Free PMC article.

Mitochondrial DNA variants revealed by whole exome sequencing: from screening to diagnosis and follow-up.
Skoczylas S, Płoszaj T, Gadzalska K, Gorządek M, Jakiel P, Juścińska E, Malarska M, Traczyk-Borszyńska M, Biezynska H, Rychlik M, Pastorczak A, Zmysłowska A. Neurogenetics. 2025 Mar 26;26(1):38. doi: 10.1007/s10048-025-00820-z. PMID: 40138026.

Our current understanding of the biological impact of endometrial cancer mtDNA genome mutations and their potential use as a biomarker.
Khadka P, Young CKJ, Sachidanandam R, Brard L, Young MJ. Front Oncol. 2024 Jun 27;14:1394699. doi: 10.3389/fonc.2024.1394699. eCollection 2024. PMID: 38993645. Free PMC article. Review.
