PLoS Genet. 2014 Dec 11;10(12):e1004729.
doi: 10.1371/journal.pgen.1004729. eCollection 2014 Dec.

Identification of rare causal variants in sequence-based studies: methods and applications to VPS13B, a gene involved in Cohen syndrome and autism

Iuliana Ionita-Laza et al. PLoS Genet.

Abstract

Pinpointing the small number of causal variants among abundant naturally occurring genetic variation is a difficult but crucial challenge, both for understanding the precise molecular mechanisms of disease and for guiding follow-up functional studies. We propose and investigate two complementary statistical approaches for identifying rare causal variants in sequencing studies: a backward elimination procedure based on groupwise association tests, and a hierarchical approach that can integrate sequencing data with diverse functional and evolutionary conservation annotations for individual variants. Using simulations, we show that incorporating multiple bioinformatic predictors of deleteriousness, such as PolyPhen-2, SIFT and GERP++ scores, can improve the power to discover truly causal variants. As a proof of principle, we apply the proposed methods to VPS13B, a gene mutated in the rare neurodevelopmental disorder Cohen syndrome, in which recessive variants have recently been reported in autism. We identify a small set of promising candidate causal variants, including two loss-of-function variants and a rare, homozygous probably-damaging variant that could contribute to autism risk.
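The abstract describes the backward elimination procedure only at a high level. As a minimal sketch, assuming a simple carrier-based burden test as the groupwise statistic (the groupwise test and stopping rule used in the paper may differ), the procedure can be pictured as repeatedly dropping the variant whose removal most decreases the groupwise p-value, and stopping when no single removal helps. The function names `burden_pvalue` and `backward_elimination` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def burden_pvalue(genotypes: Sequence[Sequence[int]],
                  phenotype: Sequence[int],
                  variants: Sequence[int]) -> float:
    """Two-sided p-value of a simple carrier-based burden test (an assumption).

    An individual counts as a carrier if they have at least one minor allele at
    any variant in `variants`; carrier proportions in cases (phenotype 1) and
    controls (phenotype 0) are compared with a two-proportion z-test.
    """
    n_case = sum(1 for y in phenotype if y == 1)
    n_ctrl = len(phenotype) - n_case
    case_carriers = ctrl_carriers = 0
    for row, y in zip(genotypes, phenotype):
        if any(row[j] > 0 for j in variants):
            if y == 1:
                case_carriers += 1
            else:
                ctrl_carriers += 1
    pooled = (case_carriers + ctrl_carriers) / (n_case + n_ctrl)
    var = pooled * (1 - pooled) * (1 / n_case + 1 / n_ctrl)
    if var == 0:
        return 1.0  # carrier status does not vary, so there is no evidence of association
    z = (case_carriers / n_case - ctrl_carriers / n_ctrl) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def backward_elimination(genotypes: Sequence[Sequence[int]],
                         phenotype: Sequence[int],
                         variants: Sequence[int]) -> Tuple[List[int], List[float]]:
    """Greedily remove variants while doing so lowers the groupwise p-value."""
    selected = list(variants)
    current = burden_pvalue(genotypes, phenotype, selected)
    trajectory = [current]  # p-value after each removal, cf. Figure 4(a)
    while len(selected) > 1:
        # p-value obtained after dropping each remaining variant in turn
        trials = [(burden_pvalue(genotypes, phenotype,
                                 [v for v in selected if v != drop]), drop)
                  for drop in selected]
        best_p, best_drop = min(trials)
        if best_p >= current:
            break  # no single removal strengthens the association
        selected.remove(best_drop)
        current = best_p
        trajectory.append(current)
    return selected, trajectory


# Toy data: individuals 0-9 are cases, 10-19 are controls; three variants.
carriers = {0: [0, 1, 2, 3], 1: [4, 5, 6], 2: [7, 10, 11, 12, 13, 14]}
genotypes = [[1 if i in carriers[j] else 0 for j in range(3)] for i in range(20)]
phenotype = [1] * 10 + [0] * 10
selected, trajectory = backward_elimination(genotypes, phenotype, [0, 1, 2])
```

On this toy data, variant 2 is carried mostly by controls, so removing it sharpens the case-control contrast and the procedure keeps variants 0 and 1.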

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1
Figure 1. (a) Median rank of causal variants among the non-synonymous variants for two disease models (M1 and M2) and three values for the NS:S ratio (R = {0.6, 1.0, 1.4}).
The proportion of causal variants in the region is 20%. HM refers to the original hierarchical model with ranking of the causal variants among the non-synonymous variants, based on their estimated effects; BE refers to the backward elimination procedure for non-synonymous variants; and HMS refers to the ranking of causal variants only among those non-synonymous variants selected by the backward elimination procedure, with ranks based on the estimated effects from the hierarchical model. (b) The number of causal variants ranked in the top 10 among non-synonymous variants.
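The median-rank and top-10 summaries in Figure 1 can be computed for a single simulated data set roughly as follows. This is a sketch under stated assumptions: rank 1 goes to the largest estimated effect, ties keep their input order, causal variants outside the candidate set (the HMS case) are simply not ranked, and aggregation across simulation replicates is omitted. The function `rank_causal` and the toy values are hypothetical.

```python
from statistics import median
from typing import Dict, Iterable, Optional, Set, Tuple


def rank_causal(effects: Dict[str, float], causal: Set[str],
                candidates: Optional[Iterable[str]] = None,
                top_k: int = 10) -> Tuple[float, int]:
    """Return (median rank of causal variants, number of causal variants in top_k).

    Variants are ranked by estimated effect, largest first (rank 1). If
    `candidates` is given (e.g. variants kept by backward elimination, as in
    HMS), only those variants are ranked.
    """
    pool = effects if candidates is None else {v: effects[v] for v in candidates}
    ordered = sorted(pool, key=lambda v: pool[v], reverse=True)
    ranks = [i + 1 for i, v in enumerate(ordered) if v in causal]
    if not ranks:
        raise ValueError("no causal variants among the ranked variants")
    return median(ranks), sum(1 for r in ranks if r <= top_k)


effects = {"v1": 0.9, "v2": 0.1, "v3": 0.5, "v4": 0.7}
causal = {"v1", "v3"}
# HM-style: rank among all variants (v1 -> rank 1, v3 -> rank 3)
all_median, all_top = rank_causal(effects, causal, top_k=2)
# HMS-style: rank only among variants retained by a selection step
sel_median, sel_top = rank_causal(effects, causal, candidates=["v2", "v3"], top_k=2)
```

Restricting the ranking to a preselected set can only move a retained causal variant up the list, which is the intuition behind comparing HM with HMS.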
Figure 2
Figure 2. The effect of multiple bioinformatic predictors for non-synonymous variants.
Ranking is done only within the set of variants selected by the backward elimination procedure. (a) Median rank of causal variants for two disease models (M1 and M2) and three values for the NS:S ratio (R = {0.6, 1.0, 1.4}). The proportion of causal variants in the region is 20%. HMS refers to the hierarchical model with ranking of the causal variants among the selected non-synonymous variants, based on their estimated effects; B1 refers to the hierarchical model with one bioinformatic predictor (B1, Table 2); B2 refers to the hierarchical model with one bioinformatic predictor (B2); mB1 refers to the hierarchical model with three bioinformatic predictors (B1, B1, and B2); and mB2 refers to the hierarchical model with four bioinformatic predictors (four B1s). (b) The number of causal variants ranked in the top 10 among non-synonymous variants.
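The captions do not spell out the hierarchical model. Purely as an illustration of how a bioinformatic predictor can enter such a model, a common two-level normal formulation lets a score z_j (for example a PolyPhen-2 or GERP++ value) shift the prior mean of each variant's effect: beta_j ~ N(a0 + a1 z_j, tau2), with the estimate beta_hat_j ~ N(beta_j, se_j^2). Variants with similar association evidence but higher predicted deleteriousness then receive larger posterior effects and better ranks. The single predictor, known standard errors, fixed tau2, and least-squares fit of a0 and a1 are assumptions of this sketch, not the paper's estimation procedure; all names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_prior_mean(beta_hat: Sequence[float], z: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of beta_hat on a single predictor score z."""
    n = len(z)
    z_bar = sum(z) / n
    b_bar = sum(beta_hat) / n
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    sxy = sum((zi - z_bar) * (bi - b_bar) for zi, bi in zip(z, beta_hat))
    a1 = sxy / sxx if sxx > 0 else 0.0
    return b_bar - a1 * z_bar, a1


def shrink_effects(beta_hat: Sequence[float], se: Sequence[float],
                   z: Sequence[float], a0: float, a1: float,
                   tau2: float) -> List[float]:
    """Posterior means under beta_j ~ N(a0 + a1*z_j, tau2), beta_hat_j ~ N(beta_j, se_j^2)."""
    post = []
    for bh, s, zj in zip(beta_hat, se, z):
        prior_mean = a0 + a1 * zj
        w = tau2 / (tau2 + s ** 2)  # weight on the data; small when se is large
        post.append(w * bh + (1 - w) * prior_mean)
    return post


a0, a1 = fit_prior_mean([1.0, 2.0, 3.0], [0.0, 1.0, 2.0])
# Two variants with identical estimates; the second has the higher predictor score.
post = shrink_effects([2.0, 2.0], [1.0, 1.0], [0.0, 1.0], a0=0.0, a1=1.0, tau2=1.0)
```

Models with several predictors, such as mB1 and mB2, would replace a0 + a1 z_j with a linear combination of all scores, which requires a multiple regression in place of `fit_prior_mean`.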
Figure 3
Figure 3. Predicted deleteriousness scores are shown for 71 rare functional variants (non-synonymous, nonsense and splice-site).
From the top, the first plot depicts the PolyPhen-2 score for each variant, the second depicts the GERP_RS score, and the third depicts variant counts for cases (upward) and controls (downward). Green tick marks indicate a variant contained in an exon, and red ticks indicate that a variant is selected by the backward elimination procedure. LoF variants are marked by a black asterisk; the homozygous probably damaging variant is marked by a red asterisk. The locations of protein domains (ChoreinN, TM2, TM4, DUF1162, Golgi targeting element, and ATG C) are depicted by boxes at the top of the plot (see Figure S10 for a complete view of VPS13B protein domains). Variants are plotted equidistantly on the x-axis.
Figure 4
Figure 4. Results from the backward elimination procedure for non-synonymous and splice site variants in VPS13B.
(a) The change in p value as variants are removed one by one (when the backward elimination procedure is run once on all non-synonymous variants). (b) Distribution of return counts for non-synonymous and splice site variants in VPS13B; overlaid is a fitted mixture with two components.
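Figure 4(b) summarizes how often each variant is returned by repeated runs of the backward elimination procedure and overlays a two-component mixture, but the caption does not state the mixture family. The sketch below assumes that each variant's return count is the number of times it was retained out of n_runs repeated runs, and fits a two-component binomial mixture by expectation-maximization (EM); the binomial choice, the starting values, and all names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def _loglik(k: int, n: int, p: float) -> float:
    # Binomial log-likelihood without the C(n, k) term, which cancels in the E-step ratio
    p = min(max(p, 1e-9), 1 - 1e-9)
    return k * math.log(p) + (n - k) * math.log(1 - p)


def _sigmoid(d: float) -> float:
    # Numerically stable 1 / (1 + exp(-d))
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def fit_binomial_mixture(counts: Sequence[int], n_runs: int, n_iter: int = 200,
                         p_low: float = 0.2, p_high: float = 0.8,
                         w_high: float = 0.5) -> Tuple[float, float, float, List[float]]:
    """EM for a two-component binomial mixture of return counts.

    Returns (p_low, p_high, w_high, resp), where resp[j] is the posterior
    probability that variant j belongs to the frequently returned component.
    """
    eps = 1e-12
    resp: List[float] = []
    for _ in range(n_iter):
        # E-step: posterior probability of the 'high' component for each variant
        resp = [_sigmoid(math.log(w_high) - math.log(1 - w_high)
                         + _loglik(k, n_runs, p_high) - _loglik(k, n_runs, p_low))
                for k in counts]
        # M-step: update the mixing weight and the per-component return probabilities
        r_sum = sum(resp)
        w_high = min(max(r_sum / len(counts), eps), 1 - eps)
        p_high = sum(r * k for r, k in zip(resp, counts)) / (n_runs * r_sum + eps)
        p_low = (sum((1 - r) * k for r, k in zip(resp, counts))
                 / (n_runs * (len(counts) - r_sum) + eps))
    return p_low, p_high, w_high, resp


# Hypothetical return counts out of 50 runs: five rarely and four almost always returned.
counts = [0, 1, 2, 1, 0, 48, 50, 49, 47]
p_low, p_high, w_high, resp = fit_binomial_mixture(counts, n_runs=50)
```

Variants with a high posterior probability for the frequently returned component are natural candidates for follow-up; a fixed number of EM iterations keeps the sketch short, whereas a production fit would monitor the log-likelihood for convergence.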
