Am J Hum Genet. 2019 Feb 7;104(2):299-309.
doi: 10.1016/j.ajhg.2018.12.020. Epub 2019 Jan 24.

Improved Pathogenic Variant Localization via a Hierarchical Model of Sub-regional Intolerance

Tristan J Hayeck et al. Am J Hum Genet.

Abstract

Different parts of a gene can be of differential importance to development and health. This regional heterogeneity is also apparent in the distribution of disease-associated mutations, which often cluster in particular regions of disease-associated genes. The ability to precisely estimate functionally important sub-regions of genes will be key in correctly deciphering relationships between genetic variation and disease. Previous methods have had some success using standing human variation to characterize this variability in importance by measuring sub-regional intolerance, i.e., the depletion in functional variation from expectation within a given region of a gene. However, the ability to precisely estimate local intolerance was restricted by the fact that only information within a given sub-region is used, leading to instability in local estimates, especially for small regions. We show that borrowing information across regions using a Bayesian hierarchical model stabilizes estimates, leading to lower variability and improved predictive utility. Specifically, our approach more effectively identifies regions enriched for ClinVar pathogenic variants. We also identify significant correlations between sub-region intolerance and the distribution of pathogenic variation in disease-associated genes, with AUCs for classifying de novo missense variants in Online Mendelian Inheritance in Man (OMIM) genes of up to 0.86 using exonic sub-regions and 0.91 using sub-regions defined by protein domains. This result immediately suggests that considering the intolerance of regions in which variants are found may improve diagnostic interpretation. We also illustrate the utility of integrating regional intolerance into gene-level disease association tests with a study of known disease-associated genes for epileptic encephalopathy.
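The idea of borrowing information across sub-regions can be illustrated with a generic partial-pooling calculation. The sketch below is not the LIMBR model itself; it assumes a simple normal-normal setup in which each sub-region has a raw intolerance estimate with a known sampling variance (larger for small regions) and a between-region variance `tau2` that is taken as given. The function name, the variable names, and the toy numbers are all hypothetical.

```python
def shrink_toward_gene_mean(raw, variances, tau2):
    """Partial pooling of sub-region estimates toward a gene-level mean.

    raw       -- raw per-sub-region estimates (illustrative values)
    variances -- sampling variance of each raw estimate
    tau2      -- assumed between-sub-region variance within the gene
    """
    # Precision-weighted gene-level mean: noisy regions count for less.
    weights = [1.0 / (v + tau2) for v in variances]
    gene_mean = sum(w * y for w, y in zip(weights, raw)) / sum(weights)

    shrunk = []
    for y, v in zip(raw, variances):
        # Fraction of the raw estimate that is retained; it approaches 0
        # as the sampling variance grows relative to tau2.
        retain = tau2 / (tau2 + v)
        shrunk.append(retain * y + (1.0 - retain) * gene_mean)
    return gene_mean, shrunk


# Toy data: the third sub-region is small, so its estimate is noisy.
raw = [-2.0, 0.5, 3.0]
variances = [0.1, 0.1, 4.0]
gene_mean, shrunk = shrink_toward_gene_mean(raw, variances, tau2=1.0)
```

In this toy example the noisy third estimate is pulled most strongly toward the gene-level mean, while the two well-measured estimates barely move, which is the stabilizing behavior the abstract describes for small regions.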

Keywords: LIMBR; RVIS; conservation; constraint; domains; exons; genic sub-region; intolerance; negative selection; pathogenic.

Figures

Figure 1
Relative Missense Variation versus Total Variation across Domains. Regions (domains) are plotted with the number of total SNV variants in each coding region on the x axis versus the number of missense variants on the y axis. The genome-wide average missense variation versus total variation is plotted as a black solid line (A). Highlighted in blue are the SCN1A domains as an example. The offset average gene-level trend for SCN1A is plotted as a blue dotted line (B) and can be seen more clearly in the exploded panel. Fitting a Bayesian hierarchical model allows for sharing of information across sub-regions, pulling the sub-region-level terms toward the genic average.
Figure 2
Comparing Sub-RVIS versus LIMBR Length Distributions for the Top and Bottom 10% Intolerant Sub-regions. The histograms show the distribution of the number of bases spanned by sub-regions in the top and bottom 10% of intolerance scores for (A) sub-RVIS exon scores, (B) LIMBR exon scores, (C) sub-RVIS domain scores, and (D) LIMBR domain scores.
Figure 3
Comparing LIMBR and Sub-RVIS Ability to Capture ClinVar Pathogenic Variants versus Percent of Bases Spanned, Restricting to OMIM Genes. The genes are broken up into either exons (blue) or domains (black), then sub-regions are sorted by either their sub-RVIS scores (dotted lines) or their LIMBR scores (solid lines). At each percentile, the percentage of ClinVar pathogenic variants captured relative to the percentage of bases covered is compared between the two methods.
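The comparison described in this caption can be sketched as a cumulative capture curve. The example below is only an illustration of the general procedure: the tuple layout, the function name, and the toy numbers are hypothetical, and it assumes that lower scores mean greater intolerance, so sub-regions are ranked in ascending score order.

```python
def capture_curve(regions):
    """Cumulative percent of pathogenic variants captured versus
    cumulative percent of bases spanned.

    regions -- list of (score, n_bases, n_pathogenic) tuples, one per
               sub-region (hypothetical layout for this sketch)
    """
    # Rank sub-regions from most to least intolerant (assumed: low score
    # means more intolerant).
    ordered = sorted(regions, key=lambda r: r[0])
    total_bases = sum(r[1] for r in ordered)
    total_path = sum(r[2] for r in ordered)

    curve = []
    bases = path = 0
    for _, n_bases, n_path in ordered:
        bases += n_bases
        path += n_path
        curve.append((100.0 * bases / total_bases,
                      100.0 * path / total_path))
    return curve


# Toy data: three sub-regions of one gene.
regions = [(-1.5, 100, 8), (0.2, 300, 1), (-0.4, 100, 1)]
curve = capture_curve(regions)
# curve == [(20.0, 80.0), (40.0, 90.0), (100.0, 100.0)]
```

A scoring method that ranks sub-regions well produces a curve that rises quickly, capturing most pathogenic variants within a small share of the bases; running the same function on two different score columns gives two curves that can be compared directly.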
Figure 4
Performance of Different Methods’ Ability to Capture De Novo Pathogenicity across Exons Relative to Benign Variation. (A–C) The methods are compared first at the exon level, plotting the percent of de novo missense variants versus benign exons, restricting to (A) all OMIM genes, (B) the epilepsy gene set, and (C) neurodevelopmental autosomal-dominant genes. (D–F) For domain groupings, the methods are compared again using the percent of de novo missense variants versus benign exons, restricting to (D) all OMIM genes, (E) the epilepsy gene set, and (F) neurodevelopmental autosomal-dominant genes.
Figure 5
Performance of Different Methods’ Ability to Capture Pathogenicity across Exons Relative to Benign Variation. (A–C) LIMBR is compared against the other methods to see how well it classifies pathogenic exons versus benign exons, restricting to (A) all OMIM genes, (B) the epilepsy gene set, or (C) the neurodevelopmental autosomal-dominant gene set. (D–F) The percent of pathogenic variants versus control variants captured by the different methods is then compared, restricting to (D) all OMIM genes, (E) the epilepsy gene set, or (F) the neurodevelopmental autosomal-dominant gene set.
Figure 6
Performance of LIMBR across Different Gene Sets. (A) The LIMBR classification of exons with at least one pathogenic variant versus benign exons, restricting to different OMIM gene sets (overlapping with the sets in Figure 4). (B) Similarly, the LIMBR percentile rankings of exons are used to plot the percent of pathogenic de novo variants relative to benign exons, again restricting to different OMIM gene sets.
Figure 7
Localized Genic Intolerance to Variation in Key Epileptic Encephalopathy Genes. (A–E) The upper plots show the intolerance scores for (A) SCN1A, (B) SCN8A, (C) CDKL5, (D) PCDH19, and (E) KCNT1, with 95% credible intervals in gray, across combined coding positions in all transcripts. The bar strip below each plot indicates where each exon starts and ends. Below that are the densities of ClinVar variants, aligned to the genomic positions of the intolerance scores. (F) Table of a group-wise association test, both unweighted and weighted by the inverse percentile of the LIMBR intolerance, using a cohort of 488 epileptic encephalopathy case subjects and 12,151 unrelated control subjects from a previous rare variant collapsing analysis.
