Genet Med. 2017 Oct;19(10):1151-1158.
doi: 10.1038/gim.2017.26. Epub 2017 May 18.

Using high-resolution variant frequencies to empower clinical genome interpretation

Nicola Whiffin et al. Genet Med. 2017 Oct.

Abstract

Purpose
Whole-exome and whole-genome sequencing have transformed the discovery of genetic variants that cause human Mendelian disease, but discriminating pathogenic from benign variants remains a daunting challenge. Rarity is recognized as a necessary, although not sufficient, criterion for pathogenicity, but frequency cutoffs used in Mendelian analysis are often arbitrary and overly lenient. Recent very large reference datasets, such as the Exome Aggregation Consortium (ExAC), provide an unprecedented opportunity to obtain robust frequency estimates even for very rare variants.

Methods
We present a statistical framework for the frequency-based filtering of candidate disease-causing variants, accounting for disease prevalence, genetic and allelic heterogeneity, inheritance mode, penetrance, and sampling variance in reference datasets.

Results
Using the example of cardiomyopathy, we show that our approach reduces by two-thirds the number of candidate variants under consideration in the average exome, without removing true pathogenic variants (false-positive rate < 0.001).

Conclusion
We outline a statistically robust framework for assessing whether a variant is "too common" to be causative for a Mendelian disorder of interest. We present precomputed allele frequency cutoffs for all variants in the ExAC dataset.
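To make the Methods parameters concrete, here is a back-of-envelope bound for a dominant condition. It is a sketch under our own assumptions (Hardy–Weinberg proportions, a variant rare enough that almost all carriers are heterozygous, and no phenocopies), not necessarily the paper's exact formula, and the symbols are ours: $q$ is the variant's population AF, $P$ the disease prevalence, $\pi$ the penetrance, and $A$ the largest fraction of all cases attributable to any single variant (for example, the gene's share of cases times the variant's share within that gene, which captures genetic and allelic heterogeneity). Affected carriers of the variant cannot outnumber the cases it is allowed to explain:

```latex
\begin{aligned}
\Pr(\text{carrier}) &= 2q(1-q) + q^2 \approx 2q, \\
\Pr(\text{affected carrier}) &\approx 2q\,\pi \;\le\; A\,P, \\
\Rightarrow\quad q \;\le\; q_{\max} &= \frac{A\,P}{2\,\pi}.
\end{aligned}
```

Under this bound, lower penetrance raises $q_{\max}$, so more copies of a variant can be tolerated in a reference sample. A recessive model would need a different expression.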

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Plot of Exome Aggregation Consortium (ExAC) allele count (all populations) against case allele count for variants classified as variants of unknown significance (VUS), likely pathogenic, or pathogenic in 6,179 cases of hypertrophic cardiomyopathy. The dotted lines represent the maximum tolerated ExAC allele counts in hypertrophic cardiomyopathy for 50% (dark blue) and 100% (light blue) penetrance. Variants are color-coded according to reported pathogenicity. Where classifications from contributing laboratories were discordant, the more conservative classification is plotted. The inset panel shows the full dataset; the main panel expands the region of primary interest. True pathogenic variants appropriately fall below our derived allele count threshold.
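The dotted lines in Figure 1 are maximum tolerated allele counts. The following is a minimal sketch of one way to derive such a count. It assumes the one-tailed 95% Poisson rule described for Figure 3: given a maximum credible AF and a reference allele number (AN), the tolerated count is the 95th percentile of Poisson(AN × AF). The function names, the allele number of 120,000, and the two AF values are illustrative assumptions, not figures from the paper.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """Return P(X <= k) for X ~ Poisson(lam).

    Terms are computed in log space (via math.lgamma) so that large means
    do not underflow math.exp(-lam) to zero.
    """
    if k < 0:
        return 0.0
    if lam == 0.0:
        return 1.0  # all probability mass sits at X = 0
    log_lam = math.log(lam)
    return sum(math.exp(i * log_lam - lam - math.lgamma(i + 1)) for i in range(k + 1))


def max_tolerated_ac(max_af: float, an: int, conf: float = 0.95) -> int:
    """Hypothetical helper: smallest k with P(X <= k) >= conf, X ~ Poisson(an * max_af).

    An observed allele count above k lies in the upper (1 - conf) tail
    and would be treated as too common for the disease.
    """
    lam = an * max_af
    k = 0
    while poisson_cdf(k, lam) < conf:
        k += 1
    return k


# Illustrative inputs only: AN = 120,000 alleles. Halving penetrance doubles
# a maximum credible AF that scales as 1 / penetrance.
an = 120_000
print(max_tolerated_ac(2e-5, an))  # full penetrance -> 5
print(max_tolerated_ac(4e-5, an))  # 50% penetrance  -> 9
```

Because the Poisson mean scales with AN, the same maximum credible AF tolerates more copies in a larger reference sample.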
Figure 2
A flow diagram of our approach, applied to a dominant condition, and using Exome Aggregation Consortium (ExAC) as our reference sample. First, a disease-level maximum credible population allele frequency (AF) is calculated, based on disease prevalence, heterogeneity, and penetrance. To evaluate a specific variant, we determine whether the observed variant allele count is compatible with disease by comparing this maximum credible population AF against the (precalculated) filtering AF for the variant. *While filtering AF has been precomputed for ExAC variants, the same framework can be readily applied using another reference sample.
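The two steps in this flow diagram can be sketched as follows, under assumptions of ours. The maximum credible AF uses a simple dominant-model bound: AF ≤ prevalence × maximum allelic contribution ÷ (2 × penetrance). The filtering AF is taken to be the largest population AF at which the observed allele count would still fall in the top 5% of a Poisson distribution, which matches the one-tailed rule described for Figure 3. A variant is flagged as too common when its filtering AF exceeds the maximum credible AF. The function names are hypothetical and the parameter values are illustrative. This is not the authors' code.

```python
import math


def max_credible_af(prevalence: float, max_allelic_contribution: float,
                    penetrance: float) -> float:
    """Hypothetical dominant-model bound: prevalence * contribution / (2 * penetrance).

    Assumes Hardy-Weinberg proportions and a rare variant, so carrier
    frequency is about 2 * AF; not necessarily the paper's exact formula.
    """
    if not 0.0 < penetrance <= 1.0:
        raise ValueError("penetrance must lie in (0, 1]")
    return prevalence * max_allelic_contribution / (2.0 * penetrance)


def poisson_sf(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam), using log-space terms."""
    if k <= 0:
        return 1.0
    if lam <= 0.0:
        return 0.0
    log_lam = math.log(lam)
    cdf = sum(math.exp(i * log_lam - lam - math.lgamma(i + 1)) for i in range(k))
    return max(0.0, 1.0 - cdf)


def filtering_af(ac: int, an: int, conf: float = 0.95, iters: int = 60) -> float:
    """Largest AF at which `ac` observed alleles out of `an` still fall in the
    upper (1 - conf) Poisson tail, found by bisection on [0, 1]."""
    if ac <= 0:
        return 0.0
    alpha = 1.0 - conf
    # Invariant: P(X >= ac) <= alpha at lo, > alpha at hi (valid if ac <= an, conf >= 0.5).
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if poisson_sf(ac, an * mid) <= alpha:
            lo = mid  # ac is still "too many" for this AF, so the boundary is higher
        else:
            hi = mid
    return lo


def is_too_common(ac: int, an: int, max_af: float, conf: float = 0.95) -> bool:
    """Flag a variant whose filtering AF exceeds the disease's maximum credible AF."""
    return filtering_af(ac, an, conf) > max_af


# Illustrative parameters (not the paper's): prevalence 1/500,
# maximum allelic contribution 0.02, full penetrance, AN = 120,000.
q_max = max_credible_af(1 / 500, 0.02, 1.0)  # about 2e-5
print(is_too_common(5, 120_000, q_max))  # False
print(is_too_common(6, 120_000, q_max))  # True
```

Because the filtering AF depends only on a variant's AC and AN, it can be computed once per variant. Any disease can then be screened by a single comparison against its maximum credible AF, which is the division of labor the diagram describes.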
Figure 3
The clinical utility of stringent allele frequency (AF) thresholds. (a) The number of predicted protein-altering variants (definition in “Materials and Methods”) per exome as a function of the AF filter applied. A one-tailed 95% confidence interval is used: a variant was removed from consideration if its allele count (AC) would fall within the top 5% of the Poisson probability distribution for the user’s maximum credible AF (x axis). (b) The odds ratio for hypertrophic cardiomyopathy (HCM) disease association against AF. The disease odds ratio of a burden test for variants in HCM genes is shown, stratified by variant allele frequency. For each AF bin, the prevalence of variants in sarcomeric HCM-associated genes (MYH7, MYBPC3, TNNT2, TNNI3, MYL2, MYL3, TPM1, and ACTC1, analyzed collectively) in 322 HCM cases and 852 healthy controls was compared, and an odds ratio was computed (see “Materials and Methods”). Data for each bin are plotted at the upper AF cutoff. Error bars represent 95% confidence intervals. The probability that a variant is pathogenic is much greater at very low AFs.
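Panel (b) reports per-bin odds ratios with 95% confidence intervals. The caption defers the exact method to "Materials and Methods", so this sketch makes several assumptions of its own. It uses a standard 2 × 2 table (carriers versus non-carriers among cases and controls), a Woolf log-scale interval, and a Haldane correction when any cell is empty. The function name is hypothetical, and the counts in the usage line are invented for illustration, not the study's data.

```python
import math


def odds_ratio_ci(case_carriers: int, n_cases: int,
                  control_carriers: int, n_controls: int,
                  z: float = 1.959963984540054) -> tuple[float, float, float]:
    """Odds ratio of carrying a qualifying variant in cases vs controls, with a
    Woolf (log-scale) confidence interval; the default z gives a two-sided 95% CI.

    If any cell of the 2 x 2 table is zero, 0.5 is added to every cell
    (Haldane correction) so the log odds ratio stays finite.
    """
    a = case_carriers                  # cases with a qualifying variant
    b = n_cases - case_carriers        # cases without
    c = control_carriers               # controls with
    d = n_controls - control_carriers  # controls without
    if min(a, b, c, d) < 0:
        raise ValueError("carrier counts must lie between 0 and the cohort size")
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts for one AF bin: 10 of 100 cases and 5 of 100 controls carry a variant.
print(odds_ratio_ci(10, 100, 5, 100))  # first element is 950 / 450, about 2.11
```

Repeating the calculation for each AF bin, counting only variants whose frequency falls in that bin, gives one point and one error bar per bin, as in the panel.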
