PLoS Comput Biol. 2019 Jun 14;15(6):e1007112.
doi: 10.1371/journal.pcbi.1007112. eCollection 2019 Jun.

Pathogenicity and functional impact of non-frameshifting insertion/deletion variation in the human genome

Kymberleigh A Pagel et al. PLoS Comput Biol. 2019.

Abstract

Differentiation between phenotypically neutral and disease-causing genetic variation remains an open and relevant problem. Among different types of variation, non-frameshifting insertions and deletions (indels) represent an understudied group with widespread phenotypic consequences. To address this challenge, we present a machine learning method, MutPred-Indel, that predicts pathogenicity and identifies types of functional residues impacted by non-frameshifting insertion/deletion variation. The model shows good predictive performance as well as the ability to identify impacted structural and functional residues including secondary structure, intrinsic disorder, metal and macromolecular binding, post-translational modifications, allosteric sites, and catalytic residues. We identify structural and functional mechanisms impacted preferentially by germline variation from the Human Gene Mutation Database, recurrent somatic variation from COSMIC in the context of different cancers, as well as de novo variants from families with autism spectrum disorder. Further, the distributions of pathogenicity prediction scores generated by MutPred-Indel are shown to differentiate highly recurrent from non-recurrent somatic variation. Collectively, we present a framework to facilitate the interrogation of both pathogenicity and the functional effects of non-frameshifting insertion/deletion variants. The MutPred-Indel webserver is available at http://mutpred.mutdb.org/.


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Characteristics of variants included in the functional analyses.
(A) Training variants in canonical and noncanonical protein sequences. (B) Recurrently impacted residues in COSMIC. (C) Variant size in gnomAD, HGMD, COSMIC, and recurrent variants in COSMIC (COSMIC-R). The size of a complex indel is the larger of the number of amino acid residues inserted and the number deleted. (D) Variants per protein in COSMIC.
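The size convention in panel C can be stated as a small function. This is a minimal sketch, assuming each variant is described by two hypothetical counts, `n_inserted` and `n_deleted` (amino acid residues added and removed); it only encodes the rule that a complex indel's size is the larger of the two counts and is not MutPred-Indel's own code.

```python
def indel_size(n_inserted: int, n_deleted: int) -> int:
    """Return the size of a non-frameshifting indel in amino acid residues.

    Hypothetical helper: a pure insertion has n_deleted == 0, a pure deletion
    has n_inserted == 0, and a complex indel has both counts above zero.
    In every case the size is the larger of the two counts.
    """
    if n_inserted < 0 or n_deleted < 0:
        raise ValueError("residue counts must be non-negative")
    if n_inserted == 0 and n_deleted == 0:
        raise ValueError("an indel must insert or delete at least one residue")
    return max(n_inserted, n_deleted)


if __name__ == "__main__":
    print(indel_size(3, 0))  # insertion of 3 residues -> 3
    print(indel_size(0, 1))  # single-residue deletion -> 1
    print(indel_size(2, 5))  # complex indel: 5 deleted, 2 inserted -> 5
```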
Fig 2. Relative enrichment of mechanisms impacted by pathogenic variants from HGMD compared to gnomAD.
Negative trend values correspond to enrichment in putatively neutral variation. An asterisk (*) indicates statistical significance after Bonferroni correction.
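A Bonferroni correction tests each of m hypotheses at level α/m. The sketch below assumes that m is the number of mechanisms compared and that α = 0.05; the caption states neither value, and the mechanism names and p-values in the usage lines are hypothetical.

```python
from __future__ import annotations


def bonferroni_significant(p_values: dict[str, float],
                           alpha: float = 0.05) -> dict[str, bool]:
    """Flag which tests remain significant after Bonferroni correction.

    Each of the m p-values is compared with alpha / m, which is equivalent
    to comparing min(1, m * p) with alpha.
    """
    m = len(p_values)
    if m == 0:
        return {}
    threshold = alpha / m
    return {name: p <= threshold for name, p in p_values.items()}


if __name__ == "__main__":
    # Hypothetical p-values for three mechanisms (not from the paper).
    example = {"catalytic_site": 0.001, "metal_binding": 0.02, "disorder": 0.009}
    print(bonferroni_significant(example))
    # threshold = 0.05 / 3 ≈ 0.0167, so catalytic_site and disorder pass
```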
Fig 3. Proportion of single-residue non-frameshifting insertion/deletion variants predicted to impact structural and functional mechanisms.
A variant was considered “predicted” if its score was at or above the 95th percentile of the gnomAD score distribution. We contrast the functional impact of variants from COSMIC, HGMD (n = 1556), and de novo variants (n = 168). The highly recurrent set includes variants at residues impacted by at least 25 missense and insertion/deletion variants in the COSMIC database (n = 98); it is compared with recurrent variants, at residues impacted at least twice (n = 3622), and with non-recurrent variants (n = 2417).
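The two thresholds in this caption can be written down directly. The sketch below is hypothetical: it assumes the percentile is computed with linear interpolation between order statistics (the caption does not name a method), treats the highly recurrent, recurrent and non-recurrent sets as mutually exclusive bins (the caption does not say whether highly recurrent variants are also counted as recurrent), and uses made-up scores rather than gnomAD data.

```python
from __future__ import annotations

import math


def percentile(values: list[float], q: float) -> float:
    """q-th percentile (0-100) using linear interpolation between order statistics."""
    if not values:
        raise ValueError("need at least one value")
    if not 0 <= q <= 100:
        raise ValueError("q must lie in [0, 100]")
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def is_predicted(score: float, threshold: float) -> bool:
    """A variant counts as 'predicted' if its score is at or above the threshold."""
    return score >= threshold


def recurrence_class(n_variants_at_residue: int) -> str:
    """Bin a COSMIC variant by how many variants hit its residue.

    Assumption: the three sets are treated as mutually exclusive bins.
    """
    if n_variants_at_residue >= 25:
        return "highly recurrent"
    if n_variants_at_residue >= 2:
        return "recurrent"
    return "non-recurrent"


if __name__ == "__main__":
    # Hypothetical neutral (gnomAD-like) scores, not data from the paper.
    neutral = [0.05, 0.10, 0.12, 0.20, 0.31, 0.40, 0.52, 0.60, 0.75, 0.90]
    threshold = percentile(neutral, 95)   # ≈ 0.8325
    print(is_predicted(0.85, threshold))  # True
    print(is_predicted(0.80, threshold))  # False
    print(recurrence_class(30), recurrence_class(3), recurrence_class(1))
```

Computing the threshold once and reusing it keeps the comparison cheap when many variants are scored against the same neutral distribution.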
Fig 4. Proportion of COSMIC variants per histology type that impact structural and functional mechanisms compared to HGMD variants.
(A) Changes aggregated over each class of structural and functional mechanisms. (B) Proportions for a selection of individual mechanisms.
Fig 5. Receiver Operating Characteristic (ROC) curves and Areas Under the ROC Curves (AUC).
(A) Cross-validation performance of MutPred-Indel with per-protein and per-cluster training, as well as the performance of a model with training data that includes singleton variants in gnomAD. (B) Cross-validation performance of MutPred-Indel on insertions, deletions, and complex indel variants separately. (C) Performance of MutPred-Indel and MutPred2 on single amino acid insertion/deletion variants. (D) Comparison of MutPred-Indel and three existing methods.
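Panel A contrasts per-protein and per-cluster training, which suggests that variants from the same protein (or cluster) are kept together in a single cross-validation fold, and every panel summarizes performance as an AUC. The sketch below illustrates both ideas under stated assumptions: `group_folds` is a hypothetical helper that keeps each group ID in one fold (passing cluster IDs instead of protein IDs would give per-cluster folds; how clusters are defined is not stated here), and `roc_auc` computes the AUC as the Mann–Whitney probability that a pathogenic variant scores above a neutral one. Neither function is MutPred-Indel's own code.

```python
from __future__ import annotations


def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win, which makes this equal to the area under the
    empirical ROC curve. The pairwise loop is O(n_pos * n_neg), fine for a sketch.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def group_folds(group_ids: list[str], k: int) -> list[int]:
    """Assign each variant to one of k folds so that all variants sharing a
    group ID (e.g. a protein) land in the same fold.

    Groups are dealt round-robin in sorted order; a real pipeline might
    instead shuffle groups or balance fold sizes.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    groups = sorted(set(group_ids))
    fold_of = {g: i % k for i, g in enumerate(groups)}
    return [fold_of[g] for g in group_ids]


if __name__ == "__main__":
    # Hypothetical scores and protein IDs, not data from the paper.
    print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))  # 8/9 ≈ 0.889
    print(group_folds(["P1", "P2", "P1", "P3"], k=2))  # [0, 1, 0, 0]
```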
Fig 6. Histograms of predicted pathogenicity scores.
(A) Training data, scored using cross-validation. (B) Cancer driver mutations from dbCID (yellow) and highly recurrent variants (COSMIC-R, red), compared with the background in COSMIC (blue). (C) De novo non-frameshifting insertion/deletion variants in individuals with autism spectrum disorder (ASD, red) and de novo variation from unaffected siblings (Control, blue).

