Comparative Study
Proc Natl Acad Sci U S A. 2020 Sep 29;117(39):24117-24126.
doi: 10.1073/pnas.2007743117. Epub 2020 Sep 18.

Causal inference in genetic trio studies

Stephen Bates et al. Proc Natl Acad Sci U S A. 2020.

Abstract

We introduce a method to draw causal inferences, that is, inferences immune to all possible confounding, from genetic data that include parents and offspring. Causal conclusions are possible with these data because the natural randomness in meiosis can be viewed as a high-dimensional randomized experiment. We make this observation actionable by developing a conditional independence test that identifies regions of the genome containing distinct causal variants. The proposed digital twin test compares an observed offspring to carefully constructed synthetic offspring from the same parents to determine statistical significance, and it can leverage any black-box multivariate model and additional nontrio genetic data to increase power. Crucially, our inferences are based only on a well-established mathematical model of recombination and make no assumptions about the relationship between the genotypes and phenotypes. We compare our method to the widely used transmission disequilibrium test and demonstrate enhanced power and localization.

Keywords: causal discovery; conditional independence testing; false discovery rate (FDR); family-based association test (FBAT); transmission disequilibrium test (TDT).
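The digital twin test compares a statistic computed on the observed offspring with the same statistic computed on synthetic offspring from the same parents. A minimal Python sketch of that Monte Carlo comparison follows, assuming the caller supplies the statistic and a sampler that draws twins. The function name `digital_twin_pvalue`, the toy data, and the covariance-style statistic are hypothetical illustrations, not the authors' implementation.

```python
import random
from typing import Callable, List, Sequence


def digital_twin_pvalue(
    statistic: Callable[[Sequence[int]], float],
    observed: Sequence[int],
    sample_twin: Callable[[random.Random], List[int]],
    n_twins: int = 999,
    seed: int = 0,
) -> float:
    """Monte Carlo p-value: how often a synthetic offspring looks as extreme as the real one."""
    rng = random.Random(seed)
    t_obs = statistic(observed)
    exceed = sum(statistic(sample_twin(rng)) >= t_obs for _ in range(n_twins))
    # Counting the observed data as one extra draw keeps the p-value valid.
    return (1 + exceed) / (n_twins + 1)


# Toy usage (hypothetical data): x[i] records which haplotype a heterozygous
# parent transmitted in family i; under Mendelian inheritance it is a fair coin.
rng = random.Random(1)
n = 200
x = [rng.randint(0, 1) for _ in range(n)]
y = [xi + rng.gauss(0.0, 0.5) for xi in x]  # trait that truly depends on x
y_bar = sum(y) / n


def stat(genos: Sequence[int]) -> float:
    # Absolute covariance-style statistic between transmissions and the trait.
    return abs(sum((g - 0.5) * (yi - y_bar) for g, yi in zip(genos, y)))


def fair_coin_twin(r: random.Random) -> List[int]:
    # A twin redraws every transmission from the same parents.
    return [r.randint(0, 1) for _ in range(n)]


print(digital_twin_pvalue(stat, x, fair_coin_twin))
```

Any model, including a black-box predictor, can be plugged in as `statistic`; validity rests only on the twins being drawn from the correct meiosis distribution.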


Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
A visualization of the process of recombination on a single chromosome.
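One way to simulate the recombination pictured in Fig. 1 is to walk along the markers of a chromosome and switch the copied parental haplotype with the recombination fraction of each interval. The sketch assumes Haldane's no-interference model (crossovers as a Poisson process with one expected crossover per Morgan); the excerpt does not say which recombination model the paper uses, and the names `haldane_r` and `sample_gamete` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def haldane_r(d_morgans: float) -> float:
    """Recombination fraction for a genetic distance d (Morgans), Haldane's map function."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def sample_gamete(
    hap0: Sequence[int],
    hap1: Sequence[int],
    pos_morgans: Sequence[float],
    rng: random.Random,
) -> Tuple[List[int], List[int]]:
    """Simulate one meiosis; return (transmitted alleles, source haplotype per marker)."""
    state = rng.randint(0, 1)  # haplotype copied at the first marker
    states = [state]
    for j in range(1, len(pos_morgans)):
        # An odd number of crossovers in the interval switches the copied haplotype.
        if rng.random() < haldane_r(pos_morgans[j] - pos_morgans[j - 1]):
            state = 1 - state
        states.append(state)
    gamete = [hap1[j] if s else hap0[j] for j, s in enumerate(states)]
    return gamete, states


rng = random.Random(7)
pos = [0.0, 0.1, 0.25, 0.6, 1.2]  # hypothetical marker positions in Morgans
gamete, states = sample_gamete([0, 0, 0, 0, 0], [1, 1, 1, 1, 1], pos, rng)
print(states, gamete)
```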
Fig. 2.
A visualization of a digital twin. The gray shaded region represents the group g; the digital twin always matches the true offspring outside this region.
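A digital twin agrees with the true offspring outside group g and is redrawn inside it. The sketch below redraws one parent's transmission inside g, assuming the source-haplotype indicators (which parental haplotype was copied at each marker) are known at the flanking markers and that recombination follows Haldane's model, so these indicators form a two-state Markov chain. Sampling the indicators inside g as a Markov-chain bridge between the fixed flanks is an illustrative choice; the excerpt does not describe the paper's sampler. Repeating this for both parents and adding the two gametes would give a twin genotype. The function name `twin_states` and all data are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def haldane_r(d: float) -> float:
    """Recombination fraction for distance d in Morgans (Haldane)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def twin_states(
    states: Sequence[int],
    pos: Sequence[float],
    start: int,
    end: int,
    rng: random.Random,
) -> List[int]:
    """Redraw inheritance states on markers [start, end); all other markers stay fixed."""
    m = len(states)
    r = [haldane_r(pos[j + 1] - pos[j]) for j in range(m - 1)]  # r[j]: marker j -> j+1
    right: Optional[int] = states[end] if end < m else None
    out = list(states)
    for j in range(start, end):
        # For a symmetric two-state chain, P(state at `end` equals state at j)
        # is (1 + prod(1 - 2r)) / 2 over the intervals from j to `end`.
        decay = 1.0
        if right is not None:
            for i in range(j, end):
                decay *= 1.0 - 2.0 * r[i]
        weights = []
        for c in (0, 1):
            if j == 0:
                p_step = 0.5  # group starts at the first marker: uniform start
            else:
                p_step = 1.0 - r[j - 1] if c == out[j - 1] else r[j - 1]
            if right is None:
                p_future = 1.0  # group runs to the chromosome end: no right constraint
            else:
                p_future = (1.0 + decay) / 2 if c == right else (1.0 - decay) / 2
            weights.append(p_step * p_future)
        out[j] = 0 if rng.random() * (weights[0] + weights[1]) < weights[0] else 1
    return out


# Usage: redraw markers 2 and 3 for one parent, then copy alleles from the chosen haplotypes.
hap0, hap1 = [0, 0, 1, 0, 1, 1], [1, 1, 0, 1, 0, 0]
pos = [0.0, 0.02, 0.05, 0.09, 0.12, 0.2]  # hypothetical positions in Morgans
observed_states = [0, 0, 0, 1, 1, 1]
twin = twin_states(observed_states, pos, 2, 4, random.Random(0))
twin_gamete = [hap1[j] if s else hap0[j] for j, s in enumerate(twin)]
print(twin, twin_gamete)
```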
Fig. 3.
Results of the TDT in two populations. (Left and Center) Manhattan plots on chromosome 22, which contains the one true causal SNP, indicated with a dashed vertical line. The genome-wide significance threshold is shown with a gray horizontal line. Left panel shows an admixed population, whereas Center panel shows a British population. (Right) A plot of the absolute correlations between the causal SNP and the other SNPs, conditional on the parental haplotypes. The red solid and blue dotted-dashed curves indicate a smoothed 90% quantile of the absolute correlation with the causal SNP across the chromosome, for the admixed and British populations, respectively.
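For reference, the classic single-SNP TDT used as the benchmark here counts how often heterozygous parents transmit allele A (b) versus the other allele (c) and refers (b - c)^2 / (b + c) to a chi-square distribution with one degree of freedom. The sketch below computes it from unphased trio genotypes; the genotype coding and the name `tdt` are assumptions for illustration.

```python
import math
from typing import Iterable, Tuple


def tdt(trios: Iterable[Tuple[int, int, int]]) -> Tuple[int, int, float, float]:
    """Classic TDT for one biallelic SNP.

    Each trio is (father, mother, child), genotypes coded as the count (0, 1, 2)
    of allele A. Returns (b, c, chi2, p).
    """
    b = c = 0
    for f, m, x in trios:
        if x not in (0, 1, 2):
            raise ValueError(f"invalid child genotype in trio {(f, m, x)}")
        hets = (f == 1) + (m == 1)
        if hets == 0:
            continue  # homozygous parents carry no transmission information
        if hets == 1:
            hom = m if f == 1 else f
            a_from_het = x - hom // 2  # copies of A passed on by the heterozygous parent
            if a_from_het not in (0, 1):
                raise ValueError(f"Mendelian inconsistency in trio {(f, m, x)}")
            b += a_from_het
            c += 1 - a_from_het
        else:
            # Both parents heterozygous: the child's count of A is the number transmitted.
            b += x
            c += 2 - x
    if b + c == 0:
        return b, c, 0.0, 1.0
    chi2 = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return b, c, chi2, p


print(tdt([(1, 0, 1), (1, 0, 1), (1, 2, 1), (1, 1, 2)]))
```

The TDT conditions on parental genotypes only one marker at a time, which is why nearby correlated SNPs can also appear significant, as the Manhattan plots illustrate.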
Fig. 4.
A graphical depiction of the causal argument in the section Causal Inference in the Trio Design. (A) The random variable Z can create an association between Xg and Y even if there is no causal effect. (B) Conditional on the parental haplotypes A, the external confounder Z is independent of the offspring’s genotype Xg. As a result, Z cannot be responsible for the remaining association between the genotype Xg and the trait Y. Note that in our hypothesis test we also condition on X−g, the offspring’s genotype outside group g, which is omitted from the figure for simplicity.
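The argument in Fig. 4 can be checked in a toy simulation. The sketch assumes a hypothetical binary confounder Z (for example, ancestry) that shifts both the allele frequency and the trait, while the genotype has no causal effect. The marginal correlation between the offspring genotype and the trait is then clearly nonzero, but the deviation of the genotype from its expectation given the parental haplotypes, which depends only on the randomness of meiosis, is uncorrelated with the trait. This illustrates the reasoning only; it is not the paper's test statistic.

```python
import random
from typing import List, Tuple


def corr(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


def simulate(n: int = 20000, seed: int = 0) -> Tuple[float, float]:
    """Return (corr(X, Y), corr(X - E[X | parents], Y)) for n toy trios."""
    rng = random.Random(seed)
    xs: List[float] = []
    resid: List[float] = []
    ys: List[float] = []
    for _ in range(n):
        z = rng.randint(0, 1)  # hypothetical confounder, e.g. ancestry
        freq = 0.8 if z else 0.2  # allele frequency differs by Z
        father = [int(rng.random() < freq) for _ in range(2)]
        mother = [int(rng.random() < freq) for _ in range(2)]
        # Meiosis: each parent passes one of its two alleles with probability 1/2.
        x = rng.choice(father) + rng.choice(mother)
        expected = (sum(father) + sum(mother)) / 2  # E[X | parental haplotypes]
        y = z + rng.gauss(0.0, 1.0)  # trait driven by Z only; X has no causal effect
        xs.append(float(x))
        resid.append(x - expected)
        ys.append(y)
    return corr(xs, ys), corr(resid, ys)


marginal, conditional = simulate()
print(f"marginal: {marginal:.3f}  given parents: {conditional:.3f}")
```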
Fig. 5.
Power of the digital twin test compared to TDT benchmarks for testing the full-chromosome causal null.
Fig. 6.
Performance of the digital twin test and the TDT in the binary-response full-genome simulations from the Localization section. Error bars give one SD, and the dashed horizontal line indicates the nominal FDR level.
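Figs. 6 and 7 summarize performance as FDR relative to a nominal level. For readers reproducing such plots, the sketch below computes the false discovery proportion and power for one simulated replicate (averaging the FDP over replicates estimates the FDR), together with the Benjamini-Hochberg step-up procedure as one standard way to select discoveries at a target FDR. The excerpt does not state which selection procedure the paper applies to the digital twin p-values, so BH and both function names are assumptions.

```python
from typing import List, Sequence, Set, Tuple


def benjamini_hochberg(pvals: Sequence[float], q: float = 0.1) -> List[int]:
    """Indices rejected by the BH step-up procedure at target FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank  # step-up: keep the largest rank that passes its threshold
    return sorted(order[:k])


def fdp_and_power(discoveries: Sequence[int], causal: Set[int]) -> Tuple[float, float]:
    """False discovery proportion and power for one simulated replicate."""
    d = set(discoveries)
    fdp = len(d - causal) / max(len(d), 1)  # FDP is 0 when nothing is selected
    power = len(d & causal) / max(len(causal), 1)
    return fdp, power


selected = benjamini_hochberg([0.001, 0.2, 0.004, 0.03], q=0.05)
print(selected, fdp_and_power(selected, {0, 2}))
```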
Fig. 7.
Performance of the digital twin test and the TDT in an admixed population. The dashed horizontal line (Top row) indicates the nominal FDR level for the digital twin test. Because the TDT uses the genome-wide significance threshold, its nominal FDR level is below 0.05.

Comment in

  • Toward causality and improving external validity.
    Bühlmann P. Proc Natl Acad Sci U S A. 2020 Oct 20;117(42):25963-25965. doi: 10.1073/pnas.2018002117. Epub 2020 Oct 12. PMID: 33046646.
