Bioinformatics. 2009 Nov 1;25(21):2744-50.
doi: 10.1093/bioinformatics/btp528. Epub 2009 Sep 3.

Automated inference of molecular mechanisms of disease from amino acid substitutions

Biao Li et al. Bioinformatics.

Abstract

Motivation: Advances in high-throughput genotyping and next-generation sequencing have generated a vast amount of human genetic variation data. Single nucleotide substitutions within protein-coding regions are of particular importance owing to their potential to give rise to amino acid substitutions that affect protein structure and function, which may ultimately lead to a disease state. Over the last decade, a number of computational methods have been developed to predict whether such amino acid substitutions result in an altered phenotype. Although these methods are useful in practice, and accurate for their intended purpose, they are not well suited for providing probabilistic estimates of the underlying disease mechanism.

Results: We have developed a new computational model, MutPred, that is based upon protein sequence and models changes of structural features and functional sites between wild-type and mutant sequences. These changes, expressed as probabilities of gain or loss of structure and function, can provide insight into the specific molecular mechanism responsible for the disease state. MutPred also builds on the established SIFT method but offers improved classification accuracy with respect to human disease mutations. Given conservative thresholds on the predicted disruption of molecular function, we propose that MutPred can generate accurate and reliable hypotheses on the molecular basis of disease for approximately 11% of known inherited disease-causing mutations. We also note that the proportion of changes of functionally relevant residues in the sets of cancer-associated somatic mutations is higher than for the inherited lesions in the Human Gene Mutation Database, which are instead predicted to be characterized by disruptions of protein structure.

Availability: http://mutdb.org/mutpred

Contact: predrag@indiana.edu; smooney@buckinstitute.org.

Figures

Fig. 1.
ROC curves for Hgmd data set. (A) full curves and (B) curves in the false positive rate range of [0, 0.1]. The solid black curve represents the MutPred general score, the dashed gray curve represents SIFT, and the dotted line is the random model.
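As a rough illustration of what panel (B) restricts attention to, the sketch below builds ROC points from classifier scores and integrates the curve only over a low false positive rate range. It is not the authors' evaluation code: the function names `roc_points` and `partial_auc` and the toy scores and labels are assumptions made up for this example.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points from sweeping a threshold over the scores.

    labels use 1 for disease-associated and 0 for neutral substitutions.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Rank from highest to lowest score; tied scores share one threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def partial_auc(points: List[Tuple[float, float]], max_fpr: float) -> float:
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the last segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data, invented for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(toy_scores, toy_labels)
full_area = partial_auc(curve, 1.0)   # area under the whole curve
low_fpr_area = partial_auc(curve, 0.1)  # area in the [0, 0.1] range only
```

Comparing two predictors over the [0, 0.1] range emphasizes how many true disease mutations each recovers when few neutral substitutions are allowed to be misclassified.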
Fig. 2.
The percentage and number of amino acid substitutions for (A) functional properties and (B) structural properties, that represent actionable (dark gray) and confident (light gray) hypotheses on the molecular cause of disease on three data sets (SPd is omitted due to large overlap with Hgmd). White line indicates the number of mutations predicted to be influencing more than one functional or structural property.
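One way to picture how hypotheses might be graded into tiers such as "actionable" and "confident" is a two-cutoff filter: a substitution needs a sufficiently high general score, and a property is reported only when its own P-value is small enough. The sketch below is a minimal illustration under that assumption. The `Prediction` class, the `supported_hypotheses` function, and the numeric cutoffs are hypothetical placeholders; neither the abstract nor this caption states the thresholds actually used.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Prediction:
    """Hypothetical container for one substitution's predictions."""
    general_score: float                # overall probability of being deleterious
    property_pvalues: Dict[str, float]  # P-value per gained or lost property


def supported_hypotheses(pred: Prediction,
                         score_cutoff: float,
                         pvalue_cutoff: float) -> List[str]:
    """Return the properties that pass both cutoffs, sorted by name."""
    if pred.general_score <= score_cutoff:
        return []
    return sorted(name for name, p in pred.property_pvalues.items()
                  if p < pvalue_cutoff)


# Placeholder cutoffs, assumed for illustration only.
ACTIONABLE = (0.5, 0.05)
CONFIDENT = (0.75, 0.05)

example = Prediction(
    general_score=0.6,
    property_pvalues={"loss of stability": 0.01, "gain of disorder": 0.2},
)
actionable_hits = supported_hypotheses(example, *ACTIONABLE)
confident_hits = supported_hypotheses(example, *CONFIDENT)
```

With these placeholder values the example substitution yields one hypothesis at the looser tier and none at the stricter one, which mirrors how the stricter tier in the figure covers fewer mutations.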
Fig. 3.
The percentage of actionable hypotheses on Hgmd, Kinase, and Cancer data sets. P-values are calculated between Hgmd versus Kinase and Hgmd versus Cancer: gain of disorder (3.4×10⁻⁹; 2.7×10⁻²²), loss of stability (1.0×10⁻¹⁵; 3.4×10⁻¹⁰), loss of post-translationally modified (PTM) target sites (3.8×10⁻⁴; 1.0×10⁻⁴).
Fig. 4.
Relative ranking of attributes across the Hgmd and Kinase (A) and Hgmd and Cancer (B) data sets. Gain and loss of structural and functional properties are represented by ×'s. SIFT is represented by a black triangle.