Hum Genet. 2025 Mar;144(2-3):173-189.
doi: 10.1007/s00439-024-02680-3. Epub 2024 Aug 7.

Assessing predictions on fitness effects of missense variants in HMBS in CAGI6

Jing Zhang et al. Hum Genet. 2025 Mar.

Abstract

This paper presents an evaluation of predictions submitted for the "HMBS" challenge, a component of the sixth round of the Critical Assessment of Genome Interpretation (CAGI6), held in 2021. The challenge required participants to predict the effects of missense variants of the human HMBS gene on yeast growth. The HMBS enzyme, critical for the biosynthesis of heme in eukaryotic cells, is highly conserved among eukaryotes. Although participants applied a variety of algorithms and methods, the predictors performed similarly: Kendall's tau correlation coefficients between predictions and experimental scores were around 0.3 for a majority of submissions. Notably, the median correlation (≥ 0.34) observed among these predictors, especially among the top predictions from different groups, was greater than the correlation observed between their predictions and the actual experimental results. Most predictors were moderately successful in distinguishing between deleterious and benign variants, as evidenced by an area under the receiver operating characteristic (ROC) curve (AUC) of approximately 0.7. Compared with the two most recent rounds of CAGI competitions, more predictors outperformed the baseline predictor, which is based solely on amino acid frequencies. Nevertheless, the overall accuracy of predictions still falls far short of the positive control, which is derived from experimental scores, indicating that considerable improvements are needed in the field. The most inaccurately predicted variants in this round were associated with the insertion loop, which is absent in many orthologs, suggesting that the predictors still rely heavily on information from multiple sequence alignments.
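
The two headline metrics in the abstract, Kendall's tau between predicted and experimental scores and the ROC AUC for separating deleterious from benign variants, can be illustrated with a minimal Python sketch. This is not the assessors' code: the scores below are made-up toy values, the function names are hypothetical, and the 0.5 fitness cutoff used to call a variant deleterious is an assumption chosen only for illustration.

```python
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b by direct pair counting (O(n^2), handles ties)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = only_x_tied = only_y_tied = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both rankings: contributes to no count
        if dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = ((untied + only_x_tied) * (untied + only_y_tied)) ** 0.5
    return (concordant - discordant) / denom if denom else float("nan")


def roc_auc(is_positive, scores):
    """AUC as the probability that a positive outscores a negative."""
    pos = [s for flag, s in zip(is_positive, scores) if flag]
    neg = [s for flag, s in zip(is_positive, scores) if not flag]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the challenge): higher score = fitter.
experimental = [0.05, 0.20, 0.45, 0.60, 0.85, 1.00]
predicted = [0.10, 0.50, 0.30, 0.40, 0.90, 0.80]

# Assumed cutoff: experimental fitness below 0.5 counts as deleterious.
DELETERIOUS_CUTOFF = 0.5
deleterious = [score < DELETERIOUS_CUTOFF for score in experimental]

tau = kendall_tau_b(experimental, predicted)
# Low predicted fitness should signal "deleterious", so negate the scores.
auc = roc_auc(deleterious, [-p for p in predicted])

print(f"Kendall's tau-b: {tau:.3f}")  # 0.600 for this toy data
print(f"ROC AUC: {auc:.3f}")          # 0.889 for this toy data
```

Kendall's tau depends only on the ordering of the scores, so it suits predictors whose outputs are on different scales. The AUC, by contrast, needs a binary label and therefore a cutoff on the experimental scores.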

Conflict of interest statement

Declarations. Conflict of interest: The authors have not disclosed any competing interests.

Figures

Fig 1. Distributions of experimental fitness scores and predicted scores.
(A) Histogram showing the distribution of experimental fitness scores for nonsense and synonymous mutations (left) and missense mutations (right); (B) histograms of predicted scores from a selected submission from each participating team. The Y-axis represents the proportion of mutations, while the X-axis represents experimental scores in panels (A) and (B).
Fig 2. Performance assessment of predictors.
(A) Receiver Operating Characteristic (ROC) curves for predicting deleterious mutations; (B) Head-to-head comparison matrix of predictors, with colors indicating the number of datasets in which one predictor (row) outperforms another (column); (C) Boxplot of the distribution of ranks for predictors in simulated datasets. The box edges represent the first and third quartiles of the ranks, the line inside the box denotes the median rank, whiskers extend to 1.5 times the interquartile range from the box edges, and circles represent outliers beyond 1.5 times the interquartile range.
Fig 3. Effects of mutations on functional loops were poorly predicted by top-performing predictors.
(A) Heatmap of the median differences between experimental scores and those of the top-performing predictors at each position, with blue indicating lower and red indicating higher differences; (B) Structural representation of HEM3 (PDB ID: 5m6r, chain A) highlighting the active-site loop, cofactor-binding loop, insertion region, and residues 354 to 356 in red. ES2 and the phosphate group are displayed as spheres; (C) Distributions of experimental scores (blue) and predicted scores from submission 5_1 (green) within the active-site loop, cofactor-binding loop, and insertion region.
Fig 4. Correlation among predictors and the role of conservation in prediction.
(A) A heatmap displaying absolute Kendall’s tau correlation coefficients between predictors. The absolute correlation coefficients are color-coded, with blue indicating lower and red indicating higher correlation; (B) Scatter plots depicting the correlation between the conservation index and the median of all predicted scores (left) or experimental scores (right) for mutations at each position. The Y-axis represents the median predicted/experimental score, while the X-axis represents the conservation index; (C) Bar graphs showing the ratio of deleterious mutations at conserved positions as indicated by experimental scores and predictors (upper graph) and the ratio of benign mutations at unconserved positions as indicated by experimental scores and predictors (lower graph).
