Genetics. 2018 Oct;210(2):463-476.
doi: 10.1534/genetics.118.301266. Epub 2018 Aug 13.

Detecting Rare Mutations with Heterogeneous Effects Using a Family-Based Genetic Random Field Method

Ming Li et al. Genetics. 2018 Oct.

Abstract

The genetic etiology of many complex diseases is highly heterogeneous. A complex disease can be caused by multiple mutations within the same gene or by mutations in multiple genes at various genomic loci. Although these disease-susceptibility mutations can be collectively common in the population, they are often individually rare or even private to certain families. Family-based studies are powerful for detecting rare variants enriched in families, which is an important feature for sequencing studies because of the heterogeneous nature of rare variants. In addition, family designs provide robust protection against population stratification. Nevertheless, statistical methods for analyzing family-based sequencing data are underdeveloped, especially those that account for the heterogeneous etiology of complex diseases. In this article, we introduce a random field framework for detecting gene-phenotype associations in family-based sequencing studies, referred to as family-based genetic random field (FGRF). Like existing family-based association tests, FGRF can use within-family and between-family information separately or jointly to test an association. We demonstrate that FGRF has statistical power comparable to that of existing methods when there is no genetic heterogeneity, but can improve statistical power when there is genetic heterogeneity across families. The proposed method also shares the advantages of conventional family-based association tests (e.g., robustness to population stratification). Finally, we applied the proposed method to sequencing data from the Minnesota Twin Family Study and identified several genes, including SAMD14, potentially associated with alcohol dependence.

Keywords: alcohol dependence; family-based association study; genetic heterogeneity; population stratification; rare variants.

Figures

Figure 1
Distribution of the minor allele frequencies of 10,527 variants from the 1000 Genomes Project (Chromosome 17: 7344328–8344327; minor allele frequency ≤ 5%).
Figure 2
Family structures used in the simulations. Left: a nuclear family with four members. Right: a three-generation family with eight members.
Figure 3
Simulation S1: statistical power of all methods when there is no genetic heterogeneity. QT: Quantitative Trait; BT: Binary Trait. 1-D: Effect of causal variants is unidirectional; 2-D: Effect of causal variants is bidirectional. Black: FGRF-O; Red: FGRF-B; Green: FGRF-W; Blue: FGRF-F; Cyan: GSKAT; Magenta: Burden test.
Figure 4
Simulation S2: statistical power of all methods when genetic heterogeneity is caused by rare but not private mutations. QT: Quantitative Trait; BT: Binary Trait. 1-D: Effect of causal variants is unidirectional; 2-D: Effect of causal variants is bidirectional. Black: FGRF-O; Red: FGRF-B; Green: FGRF-W; Blue: FGRF-F; Cyan: GSKAT; Magenta: Burden test.
Figure 5
Simulation S3: statistical power of all methods when genetic heterogeneity is caused by private mutations. QT: Quantitative Trait; BT: Binary Trait. 1-D: Effect of causal variants is unidirectional; 2-D: Effect of causal variants is bidirectional. Black: FGRF-O; Red: FGRF-B; Green: FGRF-W; Blue: FGRF-F; Cyan: GSKAT; Magenta: Burden test.
Figure 6
Distribution of AD phenotype.
Figure 7
Q–Q plots of P-values (logarithm scale) for gene-based association tests by using each statistical method. λ: Genomic inflation factor.
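
The genomic inflation factor λ reported with the Q–Q plots is not defined in this excerpt. As a rough illustration only, the sketch below assumes the conventional definition: the median of the observed 1-df chi-square statistics (recovered from the P-values) divided by the median of the chi-square(1) distribution under the null. The function name `genomic_inflation_factor` is hypothetical and is not taken from the article.

```python
from statistics import NormalDist, median


def genomic_inflation_factor(p_values):
    """Estimate lambda from P-values (each must lie in (0, 1]).

    Assumption: lambda = median(observed chi-square(1) statistics)
    divided by the null median of chi-square(1). This is the common
    convention, not necessarily the exact procedure used in the article.
    """
    std_normal = NormalDist()
    # If Z ~ N(0, 1) then Z^2 ~ chi-square(1), so a P-value p maps to the
    # statistic inv_cdf(p / 2) ** 2 (lower tail used for numerical precision).
    chi2_stats = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    # Median of chi-square(1) under the null, approximately 0.455.
    null_median = std_normal.inv_cdf(0.75) ** 2
    return median(chi2_stats) / null_median


if __name__ == "__main__":
    # Evenly spaced P-values mimic a well-calibrated test.
    n = 1001
    calibrated = [(i - 0.5) / n for i in range(1, n + 1)]
    print(genomic_inflation_factor(calibrated))
```

Under this definition, a value close to 1 suggests well-calibrated P-values, while values well above 1 suggest inflation, for example from uncorrected population stratification.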
