PLoS Genet. 2013;9(1):e1003143.
doi: 10.1371/journal.pgen.1003143. Epub 2013 Jan 17.

Predicting Mendelian disease-causing non-synonymous single nucleotide variants in exome sequencing studies



Miao-Xin Li et al. PLoS Genet. 2013.

Abstract

Exome sequencing is becoming a standard tool for mapping Mendelian disease-causing (or pathogenic) non-synonymous single nucleotide variants (nsSNVs). Minor allele frequency (MAF) filtering and functional prediction methods are commonly used to identify candidate pathogenic mutations in these studies, and combining multiple functional prediction methods may increase prediction accuracy. Here, we propose using a logit model to combine multiple prediction methods and compute an unbiased probability that a rare variant is pathogenic. We also assess, for the first time, the power of seven prediction methods (including SIFT, PolyPhen2, CONDEL, and logit) to distinguish pathogenic nsSNVs from other rare variants, which reflects the situation after MAF filtering has been applied in exome sequencing studies. We found that a logit model combining all or some of the original prediction methods outperforms the other methods examined, but it cannot discriminate between autosomal dominant and autosomal recessive disease mutations. Finally, based on the logit model's predictions, we estimate that around 5% of an individual's rare nsSNVs are pathogenic and that each individual carries at least about 22 pathogenic derived alleles, which, if made homozygous through consanguineous marriage, may lead to recessive disease.
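The abstract describes two steps: discard common variants by MAF, then combine several tools' scores in a logit model to get a probability that a rare variant is pathogenic. A minimal sketch of that general idea in plain Python follows; it is not the authors' code. The 0.01 MAF cutoff, the field names, the made-up toy data, the assumption that every score has been rescaled so that larger means more damaging, and the L2-penalized gradient-ascent fitting are all illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical score keys; each score is assumed rescaled so larger = more damaging.
TOOLS = ["sift", "polyphen2", "mutationtaster"]


def maf_filter(variants: Sequence[Dict[str, float]], max_maf: float = 0.01) -> List[Dict[str, float]]:
    """Keep variants whose minor allele frequency is at or below max_maf (illustrative cutoff)."""
    return [v for v in variants if v["maf"] <= max_maf]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(alpha: float, beta: List[float], x: List[float]) -> float:
    """P(Y=1|x) = sigmoid(alpha + sum_j beta_j * x_j) under a logit model."""
    return sigmoid(alpha + sum(b * xi for b, xi in zip(beta, x)))


def log_likelihood(alpha: float, beta: List[float], X: List[List[float]], y: List[int]) -> float:
    """Bernoulli log-likelihood of the labels y under the model."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict(alpha, beta, xi)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total


def fit_logit(X: List[List[float]], y: List[int], lr: float = 0.5,
              epochs: int = 2000, l2: float = 0.01) -> Tuple[float, List[float]]:
    """Full-batch gradient ascent on mean log-likelihood minus (l2/2)*||(alpha, beta)||^2.
    The small penalty (applied to alpha too, for simplicity) keeps the
    coefficients finite even if the toy data happen to be separable."""
    n, p = len(X), len(X[0])
    alpha, beta = 0.0, [0.0] * p
    for _ in range(epochs):
        g_alpha, g_beta = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = yi - predict(alpha, beta, xi)  # residual y - P(Y=1|x)
            g_alpha += err
            for j in range(p):
                g_beta[j] += err * xi[j]
        alpha += lr * (g_alpha / n - l2 * alpha)
        beta = [b + lr * (g / n - l2 * b) for b, g in zip(beta, g_beta)]
    return alpha, beta


# Made-up candidates: the common variant (MAF 0.2) is filtered out before scoring.
candidates = [
    {"maf": 0.2, "sift": 0.9, "polyphen2": 0.9, "mutationtaster": 0.9},
    {"maf": 0.001, "sift": 0.9, "polyphen2": 0.8, "mutationtaster": 0.9},
    {"maf": 0.004, "sift": 0.1, "polyphen2": 0.2, "mutationtaster": 0.1},
]
rare = maf_filter(candidates)

# Made-up labelled rare variants (1 = pathogenic, 0 = other rare variant).
X = [[0.9, 0.8, 0.95], [0.7, 0.9, 0.6], [0.8, 0.2, 0.9], [0.4, 0.3, 0.5],
     [0.2, 0.1, 0.3], [0.1, 0.3, 0.2], [0.3, 0.7, 0.1], [0.6, 0.6, 0.5]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
alpha, beta = fit_logit(X, y)
probs = [predict(alpha, beta, [v[t] for t in TOOLS]) for v in rare]
print([round(q, 3) for q in probs])
```

In a real analysis, an established, well-tested logistic-regression implementation would normally replace the hand-written fitting loop; it is spelled out here only to make the combined probability explicit.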


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Receiver operating characteristic (ROC) and precision-recall (PR) curves of prediction methods evaluated on the ExoVar dataset using 10-fold cross-validation.
(a) ROC curves and (b) PR curves. The area under the curve (AUC) is shown next to the name of each method.
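The captions summarize each method by the area under its ROC and PR curves, estimated with k-fold cross-validation. A minimal stdlib sketch of those three ingredients follows. The function names are hypothetical; the PR AUC is estimated here by average precision, one common step-wise estimator assumed for illustration rather than taken from the paper; and the folds are contiguous and unshuffled for brevity, whereas in practice indices are usually shuffled, and often stratified by label, first.

```python
from typing import List, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(random positive outscores random negative), ties counted as 1/2."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise PR AUC: mean precision at the rank of each positive.
    Ranks by descending score; ties keep input order (a simplification)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / sum(labels)


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split range(n) into k contiguous, near-equal test folds (no shuffling)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(roc_auc(scores, labels))                      # 0.75
print(round(average_precision(scores, labels), 3))  # 0.833
print(k_fold_indices(10, 3))                        # fold sizes 4, 3, 3
```

In a 10-fold evaluation, each fold in turn would serve as the test set for a model fitted on the other nine; the sketch does not settle whether test scores are pooled across folds or the AUCs averaged per fold.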
Figure 2. ROC and PR curves of logit models that combine subsets of the five individual methods, evaluated on the ExoVar dataset using 10-fold cross-validation.
(a) ROC and (b) PR. AUC is shown next to the name of each method.
Figure 3. ROC and PR curves of prediction methods evaluated on the HumVar dataset using 10-fold cross-validation.
(a) ROC and (b) PR. AUC is shown next to the name of each method.
Figure 4. ROC and PR curves of prediction methods evaluated on the DomRec dataset using 3-fold cross-validation.
(a) ROC and (b) PR. AUC is shown next to the name of each method.
Figure 5. The relationship between prior and posterior probabilities of a rare nsSNV being pathogenic, given the prediction scores from SIFT, PolyPhen2, and MutationTaster.
The white dashed lines indicate the estimated range of the prior (around 5%). For this illustration, the three methods are assumed to give the same prediction score for each variant. The coefficients $\alpha$, $\beta_{\text{SIFT}}$, $\beta_{\text{PolyPhen2}}$, and $\beta_{\text{MutationTaster}}$, estimated in a selected sample evaluated on the ExoVar dataset, are used to calculate the posteriors (see Eq. 2 and 3 in Materials and Methods) and take the values −3.53, 1.64, 1.48, and 2.47, respectively. The prior and the posterior correspond to the quantity $P_{\text{disease}}$ in an individual genome in Eq. 3 and to $P(Y = 1 \mid X)$ in Eq. 2, respectively.
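The caption's calculation can be sketched as follows, using the quoted coefficients and the caption's simplification that all three tools return the same score. Two pieces are assumptions, not the paper's Eq. 2 and 3: scores are taken to be oriented so that larger values mean more damaging, and the prior is applied through the standard intercept (prior-odds) correction, which needs a hypothetical `train_prevalence`, the fraction of pathogenic variants in the sample used to fit α. The value 0.5 below is for illustration only.

```python
import math

# Coefficients quoted in the Figure 5 caption (selected sample, ExoVar dataset).
ALPHA = -3.53
BETAS = {"SIFT": 1.64, "PolyPhen2": 1.48, "MutationTaster": 2.47}


def logit(p: float) -> float:
    """Log odds of probability p."""
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def model_probability(score: float) -> float:
    """Logit-model probability with the same score fed to all three tools
    (the caption's simplifying assumption); score orientation is assumed."""
    return sigmoid(ALPHA + sum(BETAS.values()) * score)


def posterior(score: float, prior: float, train_prevalence: float) -> float:
    """ASSUMPTION: standard prior correction that shifts the intercept by the
    change in log prior odds; not necessarily the paper's Eq. 3."""
    z = ALPHA + sum(BETAS.values()) * score
    return sigmoid(z + logit(prior) - logit(train_prevalence))


TRAIN_PREVALENCE = 0.5  # hypothetical, for illustration only
for prior in (0.01, 0.05, 0.10):
    print(prior, round(posterior(0.8, prior, TRAIN_PREVALENCE), 3))
```

When `prior` equals `train_prevalence` the correction vanishes and `posterior()` reduces to `model_probability()`; lowering the prior toward the ~5% range lowers the posterior for the same scores, the kind of prior-posterior relationship Figure 5 displays.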
