Am J Hum Genet. 2017 Jul 6;101(1):104-114.
doi: 10.1016/j.ajhg.2017.05.015. Epub 2017 Jun 29.

A Fast Association Test for Identifying Pathogenic Variants Involved in Rare Diseases

Daniel Greene et al. Am J Hum Genet.

Abstract

We present a rapid and powerful inference procedure for identifying loci associated with rare hereditary disorders using Bayesian model comparison. Under a baseline model, disease risk is fixed across all individuals in a study. Under an association model, disease risk depends on a latent bipartition of rare variants into pathogenic and non-pathogenic variants, the number of pathogenic alleles that each individual carries, and the mode of inheritance. A parameter indicating presence of an association and the parameters representing the pathogenicity of each variant and the mode of inheritance can be inferred in a Bayesian framework. Variant-specific prior information derived from allele frequency databases, consequence prediction algorithms, or genomic datasets can be integrated into the inference. Association models can be fitted to different subsets of variants in a locus and compared using a model selection procedure. This procedure can improve inference if only a particular class of variants confers disease risk and can suggest particular disease etiologies related to that class. We show that our method, called BeviMed, is more powerful and informative than existing rare variant association methods in the context of dominant and recessive disorders. The high computational efficiency of our algorithm makes it feasible to test for associations in the large non-coding fraction of the genome. We have applied BeviMed to whole-genome sequencing data from 6,586 individuals with diverse rare diseases. We show that it can identify multiple loci involved in rare diseases, while correctly inferring the modes of inheritance, the likely pathogenic variants, and the variant classes responsible.
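The comparison between the baseline and association models can be illustrated with a toy calculation. The sketch below is not the BeviMed algorithm. It makes several simplifying assumptions that do not come from the paper: uniform Beta(1, 1) priors on disease risk, an independent prior probability of pathogenicity for each variant, exhaustive enumeration of the latent bipartition (feasible only for a handful of variants) in place of Monte Carlo sampling, and a mode of inheritance encoded as the minimum number of pathogenic alleles an individual must carry. All function and parameter names are hypothetical.

```python
from itertools import product
from math import exp, lgamma, log


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal(cases, total, a=1.0, b=1.0):
    """Log marginal likelihood of a case/control sequence containing `cases`
    cases among `total` individuals, with a Beta(a, b) prior on the risk."""
    return log_beta(a + cases, b + total - cases) - log_beta(a, b)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + log(sum(exp(v - top) for v in values))


def posterior_association(genotypes, is_case, prior_assoc=0.01,
                          prior_pathogenic=0.5, min_alleles=1):
    """Posterior probability of the association model versus the baseline.

    genotypes: one row per individual, one rare allele count (0/1/2) per variant.
    is_case: 0/1 disease label per individual.
    min_alleles: 1 mimics dominant inheritance, 2 mimics recessive (assumption).
    """
    n, n_cases = len(is_case), sum(is_case)
    n_variants = len(genotypes[0])

    # Baseline model: a single disease risk shared by every individual.
    log_baseline = log_marginal(n_cases, n)

    # Association model: average over every bipartition z of the variants
    # into pathogenic (1) and non-pathogenic (0).
    terms = []
    for z in product((0, 1), repeat=n_variants):
        k = sum(z)
        log_prior_z = (k * log(prior_pathogenic)
                       + (n_variants - k) * log(1.0 - prior_pathogenic))
        # An individual is "at risk" if they carry enough pathogenic alleles.
        at_risk = [sum(g * zi for g, zi in zip(row, z)) >= min_alleles
                   for row in genotypes]
        n1 = sum(at_risk)
        c1 = sum(y for y, r in zip(is_case, at_risk) if r)
        # Separate risks for the at-risk group and for everyone else.
        terms.append(log_prior_z
                     + log_marginal(c1, n1)
                     + log_marginal(n_cases - c1, n - n1))
    log_assoc = log_sum_exp(terms)

    # Bayes' rule over the two competing models.
    log_num = log(prior_assoc) + log_assoc
    log_den = log_sum_exp([log_num, log(1.0 - prior_assoc) + log_baseline])
    return exp(log_num - log_den)


# Toy data: 20 individuals, 2 variants. Five carry variant 0, one carries variant 1.
genotypes = [[1, 0]] * 5 + [[0, 0]] * 14 + [[0, 1]]
cases_are_carriers = [1] * 5 + [0] * 15           # every carrier of variant 0 is a case
cases_unrelated = [0] * 5 + [1] * 5 + [0] * 10    # cases are all non-carriers

p_strong = posterior_association(genotypes, cases_are_carriers)
p_null = posterior_association(genotypes, cases_unrelated)
print(p_strong, p_null)
```

In this toy setting the posterior probability of association is high when carriers of one variant coincide with the cases and stays close to the prior otherwise. Calling the function with `min_alleles=2` gives a crude recessive analogue, and summing the per-bipartition terms in which a given variant is marked pathogenic would yield a variant-level posterior probability of pathogenicity of the kind the abstract describes.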

Keywords: Bayesian inference; Mendelian diseases; hereditary disorders; rare diseases; rare variant association test; rare variants; whole-genome sequencing.

Figures

Figure 1
Simulation Study. (A and B) Results of the simulation study. Mean PPV at power of 80% over repeated simulations of the BeviMed, SKAT, ADA, and CAST rare variant association tests for data simulated using the expression in Equation 1 for various combinations of values of τ and π. (C) Receiver operating characteristic (ROC) curves for the classification of variants as pathogenic by BeviMed for different values of π. (D) Left: mean PPV at power of 80% for BeviMed and SKAT at τ = 0.2 and π = 0.85, for varying proportions of pathogenic and non-pathogenic variants being up-weighted in the co-data variables. Right: posterior mean of ϕ corresponding to the applications of BeviMed on the left-hand grid. (E) Mean PPV at power of 80% over repeated simulations of the BeviMed and SKAT association tests for different values of k.
Figure 2
Posterior Probability of Pathogenicity for Rare Variants in ANKRD26. Results obtained by applying our inference procedure to rare allele counts in ANKRD26 against the thrombocytopenia case/control label. Exons are represented by gray blocks starting from the 5′ UTR on the left and ending with the 3′ UTR on the right. The classes that each variant belongs to are indicated by crosses. The bar chart in the top right shows the posterior probability of each association model under each mode of inheritance conditional on an association being present at the locus. The gray bars above show the marginal posterior probabilities of pathogenicity for individual rare variants conditional on an association being present at the locus. The inference algorithm was run with 100,000 iterations instead of the usual 1,000 in order to reduce jitter due to Monte Carlo sampling error. The bar chart beneath shows the breakdown of heterozygous and homozygous carriers of the variants between case and control subjects.
Figure 3
Posterior Probability of Pathogenicity for Rare Variants in RNU4ATAC. Results of applying the inference procedure to rare allele counts in RNU4ATAC against the Roifman syndrome case label. The bar chart on the right shows the marginal posterior probabilities of pathogenicity for each rare variant conditional on an association being present at the locus. The inference algorithm was run with 100,000 iterations instead of the usual 1,000 in order to reduce jitter due to Monte Carlo sampling error. The bar chart on the left shows the breakdown of heterozygous and homozygous carriers of the variants in case and control subjects. Compound heterozygous individuals with two rare alleles in RNU4ATAC were observed, and for each such individual a line is drawn linking the two variants.
