Review
Genet Epidemiol. 2022 Jul;46(5-6):266-284.
doi: 10.1002/gepi.22453. Epub 2022 Apr 22.

Benchmarking statistical methods for analyzing parent-child dyads in genetic association studies

Debashree Ray et al. Genet Epidemiol. 2022 Jul.

Abstract

Genetic association studies of child health outcomes often employ family-based study designs. One of the most popular family-based designs is the case-parent trio design, which considers the smallest possible nuclear family consisting of two parents and their affected child. This trio design is particularly advantageous for studying relatively rare disorders because it is less prone to type I error inflation due to population stratification than population-based study designs (e.g., case-control studies). However, obtaining genetic data from both parents is difficult in practice, and many large studies predominantly measure genetic variants in mother-child dyads. While some statistical methods for analyzing parent-child dyad data (most commonly involving mother-child pairs) exist, it is not clear whether they provide the same advantage as trio methods in protecting against population stratification, or whether a specific dyad design (e.g., case-mother dyads vs. case-mother/control-mother dyads) is more advantageous. In this article, we review existing statistical methods for analyzing genome-wide marker data on dyads and perform extensive simulation experiments to benchmark their type I errors and statistical power under different scenarios. We extend our evaluation to existing methods for analyzing a combination of case-parent trios and dyads together. We apply these methods to genotyped and imputed data from multiethnic mother-child pairs only, case-parent trios only, or combinations of both dyads and trios from the Gene, Environment Association Studies consortium (GENEVA), where each family was ascertained through a child affected by nonsyndromic cleft lip with or without cleft palate. Results from the GENEVA study corroborate the findings from our simulation experiments. Finally, we provide recommendations for using statistical genetic association methods for dyads.

Keywords: dyads; family-based GWAS; hybrid design; log-linear models; mother-child pairs; parent-offspring design; transmission disequilibrium; trios.
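
As a point of reference for the trio methods named in the abstract and keywords, the classic allelic transmission disequilibrium test can be sketched in a few lines. This is a minimal illustration of the textbook McNemar-type statistic, not the implementation benchmarked in the article; the function name `tdt_statistic` and its arguments are hypothetical, and the counts in the usage line are made up for illustration.

```python
import math


def tdt_statistic(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Classic allelic TDT from heterozygous-parent transmission counts.

    transmitted   -- times heterozygous parents passed the test allele
                     to the affected child (hypothetical input name)
    untransmitted -- times they passed the other allele instead
    Returns (chi-square statistic with 1 df, asymptotic p value).
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / informative
    # For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Illustrative counts only, not data from the study.
chi2, p = tdt_statistic(60, 40)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Because only transmissions from heterozygous parents enter the statistic, the comparison is made within families, which is the usual explanation for the trio design's robustness to population stratification.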

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
Type I error performance of the different combinations of methods and nuclear‐family designs at stringent significance levels. Results are based on simulated data on 1,000 families from either one or two homogeneous racial/ethnic groups with 1 million null SNPs. For one homogeneous group, a common disease prevalence of 30% and MAF 10% was simulated. For biethnic data, the second homogeneous group had a disease prevalence of 15% and MAF 3%. All offspring were affected, and all parents were unaffected. Observed (−log10 p values) are plotted on the y‐axis and Expected (−log10 p values) on the x‐axis of these QQ plots. The gray shaded region in each QQ plot represents a conservative 95% confidence interval for the expected distribution of p values. GDT, generalized disequilibrium test; gTDT, genotypic TDT; MAF, minor allele frequency; SNP, single nucleotide polymorphism; TDT, transmission disequilibrium test.
Figure 2
Type I error performance of the different combinations of methods and hybrid‐family designs at stringent significance levels. Results are based on simulated data on 1,000 families from either one or two homogeneous racial/ethnic groups with 1 million null SNPs. For one homogeneous group, a common disease prevalence of 30% and MAF 10% was simulated. For biethnic data, the second homogeneous group had a disease prevalence of 15% and MAF 3%. Case‐to‐control ratio among offspring was 50:50, and all parents were unaffected. Observed (−log10 p values) are plotted on the y‐axis and Expected (−log10 p values) on the x‐axis of these QQ plots. The gray shaded region in each QQ plot represents a conservative 95% confidence interval for the expected distribution of p values. GDT, generalized disequilibrium test; gTDT, genotypic TDT; MAF, minor allele frequency; SNP, single nucleotide polymorphism; TDT, transmission disequilibrium test.
Figure 3
Statistical power for the different combinations of methods and nuclear‐family designs at genome‐wide significance level (5×10⁻⁸). Results are based on simulated data on 1,000 families from one homogeneous racial/ethnic group with 10,000 nonnull SNPs at MAF 10% at the causal SNP, and a common disease prevalence of 30%. Results for data simulated using the recessive inheritance model are not shown due to nearly zero power of these methods at the chosen parameter values. All offspring were affected, and all parents were unaffected. (a) Comparison of designs with the same number of families of different compositions. (b) Comparison of the combined analysis of 750 case–mother dyads and 250 case–parent trios against the scenarios when either all dyads or all trios are removed from analysis. GDT, generalized disequilibrium test; gTDT, genotypic TDT; MAF, minor allele frequency; SNP, single nucleotide polymorphism; TDT, transmission disequilibrium test.
Figure 4
Statistical power for different combinations of methods and hybrid‐family designs at genome‐wide significance level (5×10⁻⁸). Results are based on simulated data on 1,000 families from one homogeneous racial/ethnic group with 10,000 nonnull SNPs at MAF 10% at the causal SNP, and a common disease prevalence of 30%. Results for data simulated using the recessive inheritance model are not shown due to nearly zero power of these methods at the chosen parameter values. Case‐to‐control ratio among offspring is either 70:30, 50:50 or 30:70. All parents are unaffected. GDT, generalized disequilibrium test; gTDT, genotypic TDT; MAF, minor allele frequency; SNP, single nucleotide polymorphism; TDT, transmission disequilibrium test.
Figure 5
Manhattan plots for the different combinations of methods and nuclear family designs from the multiethnic GENEVA study on CL/P. The gTDT and the TDT are applicable to the case–parent trio design (n=1487) only. The 1‐TDT (a generalization of TDT) is applicable to both the case–mother dyad design (n=1487) and the combined case–mother dyad and case–parent trio design (n1=371 trios, n2=1116 dyads). The GDT‐PO and HAPLIN methods are applicable to all three designs. Here, HAPLIN (2‐df test of offspring genotypic effect) was applied to each racial/ethnic group separately and then meta‐analyzed. The red and blue horizontal lines in each plot correspond to the genome‐wide (5×10⁻⁸) and a suggestive (10⁻⁶) significance level, respectively. The genome‐wide significant loci for each method‐design pair are annotated in dark gray and the suggestively significant loci in light gray. The gene names provided are labels for the genetic loci based on a nearest‐gene mapping approach and do not necessarily represent causal genes. GDT, generalized disequilibrium test; gTDT, genotypic TDT; TDT, transmission disequilibrium test.
Figure 6
Compute times for the different combinations of methods and nuclear family designs from the multiethnic GENEVA study on CL/P. Results are based on a subset of the genetic data: 8,015 genotyped/imputed SNPs in the region chr8:128344410‐132105518 that includes the known cleft locus 8q24. GDT, generalized disequilibrium test; gTDT, genotypic TDT; SNP, single nucleotide polymorphism; TDT, transmission disequilibrium test
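
The quantities plotted in the QQ and power figures above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the helper names `qq_points` and `rejection_rate` are hypothetical, and the plotting position i/(n+1) for the expected p values is one common convention that the captions do not specify.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs for a QQ plot.

    Assumes every p value is strictly positive. Under the null, the i-th
    smallest of n p values is expected near i / (n + 1) (assumed convention).
    """
    n = len(p_values)
    ordered = sorted(p_values)  # smallest p value first
    expected = [-math.log10((i + 1) / (n + 1)) for i in range(n)]
    observed = [-math.log10(p) for p in ordered]
    return list(zip(expected, observed))


def rejection_rate(p_values, alpha=5e-8):
    """Fraction of tests with p < alpha.

    On null SNPs this estimates the type I error rate; on nonnull SNPs
    it estimates statistical power at significance level alpha.
    """
    return sum(p < alpha for p in p_values) / len(p_values)


# Toy p values for illustration only, not results from the study.
toy = [1e-9, 0.2, 0.5, 1e-10]
print(qq_points(toy)[0])
print(rejection_rate(toy))
```

Points lying above the diagonal at the upper end of a QQ plot of null SNPs indicate inflated type I error, which is what the shaded confidence band in the figures helps to judge.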

