Biol Direct. 2011 May 18;6:25.
doi: 10.1186/1745-6150-6-25.

Causal graph-based analysis of genome-wide association data in rheumatoid arthritis



Alexander V Alekseyenko et al. Biol Direct.

Abstract

Background: Genome-wide association studies (GWAS) owe their popularity to the expectation that, by uncovering the genetics underlying clinical phenotypes, they will have a major impact on the diagnosis, prognosis and management of disease. So far, the dominant paradigm in GWAS data analysis has relied heavily on methods that emphasize the contribution of individual SNPs to statistical association with phenotypes. Multivariate methods, however, can extract more information by considering associations of multiple SNPs simultaneously. Recent advances in other genomics domains point to multivariate causal graph-based inference as a promising, principled analysis framework for high-throughput data. Designed to discover biomarkers in the local causal pathway of the phenotype, these methods yield accurate and highly parsimonious multivariate predictive models. In this paper, we investigate the applicability of the causal graph-based method TIE* to the analysis of GWAS data. To test the utility of TIE*, we focus on anti-CCP-positive rheumatoid arthritis (RA) GWAS datasets, for which there is a general consensus in the community about the major genetic determinants of the disease.

Results: Applying TIE* to the North American Rheumatoid Arthritis Cohort (NARAC) GWAS data identifies six SNPs, mostly from the MHC locus. Using these SNPs, we develop two predictive models that classify cases and disease-free controls with an area under the ROC curve (AUC) of 0.81, as verified in independent testing data from the same cohort. The predictive performance of these models generalizes reasonably well to Swedish subjects from the closely related but not identical Epidemiological Investigation of Rheumatoid Arthritis (EIRA) cohort, with an AUC of 0.71–0.78. Moreover, the SNPs identified by TIE* render many other previously known SNP associations conditionally independent of the phenotype.
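The AUC values above have a simple probabilistic reading: an AUC of 0.81 means that a randomly chosen case receives a higher model score than a randomly chosen control about 81% of the time. A minimal sketch of that calculation, assuming binary labels (1 = case, 0 = control) and one real-valued risk score per subject from some already fitted model; the function name `pairwise_auc` is hypothetical and this is not the authors' code:

```python
from typing import Sequence


def pairwise_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs in which the case scores higher.

    Tied scores count as half a win, which matches the Mann-Whitney U statistic
    divided by n_cases * n_controls.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy example: hypothetical scores for two cases and two controls.
print(pairwise_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise loop is quadratic in sample size, which is acceptable for illustration; a rank-based Mann-Whitney formulation gives the same value in O(n log n).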

Conclusions: Our experiments demonstrate that TIE* captures the maximum amount of genetic information about RA contained in the data and recapitulates the major consensus findings about the genetic factors of this disease. In addition, TIE* yields reproducible markers and signatures of RA. This suggests that a principled multivariate causal and predictive framework for GWAS analysis gives the community a new tool for high-quality and more efficient discovery.

Reviewers: This article was reviewed by Prof. Anthony Almudevar, Dr. Eugene V. Koonin, and Prof. Marianthi Markatou.


Figures

Figure 1
Graphical representation of the local pathway concept. The local pathway of the phenotype (shown in ash blue) contains all its direct causes (C1, C2, C3), its direct effects (E1, E2, E3), and the direct causes of its direct effects (CE1). This is exactly the Markov boundary of the phenotype. The other variables (X1, X2, X3, X4, X5) do not belong to the local pathway. This definition of a local pathway ties causality to predictivity in a theoretically rigorous manner, since the Markov boundary is the smallest set of variables that contains the maximum predictive information about the phenotype contained in the data. Alternative definitions of the local causal pathway that exclude direct causes of the direct effects (so-called "spouse" variables, such as CE1) are also useful, and specialized algorithms exist to infer them from data. In GWAS data the two definitions coincide, because GWAS designs contain no spouse variables.
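The caption's definition can be made concrete when the causal graph is already known. The sketch below is a minimal illustration that assumes the DAG is given as a mapping from each node to its parents, and a faithful distribution, under which the Markov boundary is unique; it simply reads the boundary off the graph. Algorithms such as TIE* must instead estimate Markov boundaries from data, so this is not that algorithm. The function name and the placement of the X nodes are hypothetical.

```python
from typing import Dict, Set


def markov_boundary(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Direct causes, direct effects and spouses of `target` in a known DAG.

    `parents` maps each node to the set of its parents; nodes without parents
    may be omitted.
    """
    direct_causes = set(parents.get(target, set()))
    direct_effects = {node for node, pa in parents.items() if target in pa}
    spouses: Set[str] = set()
    for effect in direct_effects:
        spouses |= parents.get(effect, set())  # other parents of each effect
    spouses.discard(target)
    return direct_causes | direct_effects | spouses


# Toy graph using the node names of Figure 1; the X nodes' positions are made up.
graph = {
    "phenotype": {"C1", "C2", "C3"},
    "E1": {"phenotype", "CE1"},
    "E2": {"phenotype"},
    "E3": {"phenotype"},
    "C1": {"X1"},
    "CE1": {"X2"},
    "X3": {"E2"},
}
print(sorted(markov_boundary(graph, "phenotype")))
# ['C1', 'C2', 'C3', 'CE1', 'E1', 'E2', 'E3']
```

In a GWAS-style graph where every edge between a SNP and the phenotype points into the phenotype, `direct_effects` is empty, so the boundary reduces to the direct causes, which is the coincidence of definitions noted in the caption.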
Figure 2
ROC curves for the two causal graph-based predictive models applied to the NARAC testing set. The model labelled "MB1" was fit using five SNPs from the first Markov boundary; the model labelled "MB2" was fit using five SNPs from the second Markov boundary.
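Each ROC curve in these figures traces sensitivity (true positive rate) against 1 - specificity (false positive rate) as the classification threshold on the model's score is lowered. A minimal sketch of how such points can be computed, assuming 0/1 labels and real-valued scores; the function name is hypothetical and this is not the plotting code used for the figure:

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by sweeping the threshold from the highest score down."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos  # assumes labels are only 0 or 1
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one control")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every subject whose score equals the current threshold at once.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


print(roc_points([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

Subjects with tied scores are added in one step, so ties produce a diagonal segment; the trapezoidal area under the returned points equals the pairwise-comparison AUC with ties counted as one half.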
Figure 3
Area under the ROC curve (AUC) for the causal graph-based predictive models developed on 1000 different random splits of the NARAC data into training and testing sets.
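The resampling loop behind a figure like this one can be sketched as follows, assuming a caller-supplied `evaluate(train_idx, test_idx)` that fits a model on the training indices and returns its test-set AUC. The 30% test fraction, the seed and all function names are hypothetical choices for illustration, not settings reported for this study, and the splits are simple random rather than stratified by case/control status.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def repeated_random_splits(
    n_subjects: int,
    evaluate: Callable[[List[int], List[int]], float],
    n_splits: int = 1000,
    test_fraction: float = 0.3,
    seed: int = 0,
) -> List[float]:
    """Call evaluate(train_idx, test_idx) on n_splits random train/test partitions."""
    n_test = max(1, round(test_fraction * n_subjects))
    if n_test >= n_subjects:
        raise ValueError("test set would leave no training subjects")
    rng = random.Random(seed)  # fixed seed makes the splits reproducible
    indices = list(range(n_subjects))
    results = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        test_idx = sorted(indices[:n_test])
        train_idx = sorted(indices[n_test:])
        results.append(evaluate(train_idx, test_idx))
    return results


def summarize(values: Sequence[float]) -> Tuple[float, float, float]:
    """Mean and empirical 2.5th / 97.5th percentiles of the per-split metric."""
    cuts = statistics.quantiles(values, n=40, method="inclusive")  # 39 cut points
    return statistics.mean(values), cuts[0], cuts[-1]
```

Keeping every split's AUC, rather than a single number, is what lets such a figure show how much the estimated performance depends on the particular partition.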
Figure 4
Previously known SNP associations become statistically independent of the phenotype when conditioned on four SNPs discovered by TIE*. The phenotypic response variable is shown as a black circle in the middle ("RA"), and SNPs are shown as white ovals. SNPs that have a univariate association with the phenotype (according to a G² test at significance level α = 5%) have a path to "RA". SNPs that become statistically independent of the phenotype given a subset of the four SNPs found by TIE* (the so-called "conditioning set") are connected to "RA" by indirect paths that pass through the SNPs in the corresponding conditioning set.
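The screening described in this caption rests on a G² (likelihood-ratio) test of conditional independence between discrete variables. A minimal pure-Python sketch, assuming SNP genotypes and the phenotype are coded as small integers (for example 0/1/2 and 0/1) and using the textbook degrees of freedom (|X| - 1)(|Y| - 1) multiplied by the product of the conditioning variables' level counts; practical implementations may adjust the degrees of freedom for sparse tables. The function names are hypothetical and this is not the authors' implementation.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi2_df >= x) for an integer df >= 1."""
    if df < 1:
        raise ValueError("df must be at least 1")
    half = x / 2.0
    if df % 2 == 0:
        k, tail = 2, math.exp(-half)
    else:
        k, tail = 1, math.erfc(math.sqrt(half))
    # Recurrence Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    # evaluated in log space to avoid overflow for large x.
    while k < df:
        if half > 0:
            tail += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return min(1.0, tail)


def g2_test(
    x: Sequence[Hashable],
    y: Sequence[Hashable],
    z: Sequence[Sequence[Hashable]] = (),
) -> Tuple[float, int, float]:
    """G^2 test of X independent of Y given the columns in z; returns (G2, df, p)."""
    strata = [tuple(col[i] for col in z) for i in range(len(x))]
    n_xyz = Counter(zip(x, y, strata))
    n_xz = Counter(zip(x, strata))
    n_yz = Counter(zip(y, strata))
    n_z = Counter(strata)
    g2 = 0.0
    for (xi, yi, s), observed in n_xyz.items():  # empty cells contribute 0
        expected = n_xz[(xi, s)] * n_yz[(yi, s)] / n_z[s]
        g2 += 2.0 * observed * math.log(observed / expected)
    g2 = max(g2, 0.0)  # guard against tiny negative rounding error
    z_levels = math.prod(len(set(col)) for col in z)
    # Textbook df; max(1, ...) only guards against constant columns.
    df = max(1, (len(set(x)) - 1) * (len(set(y)) - 1) * z_levels)
    return g2, df, chi2_sf(g2, df)


z = [0] * 50 + [1] * 50
x = list(z)
y = list(z)
print(g2_test(x, y))       # marginal test: large G2, tiny p-value
print(g2_test(x, y, [z]))  # given Z: (0.0, 2, 1.0)
```

With α = 5% as in the caption, a SNP would be treated as independent of the phenotype given a conditioning set when the returned p-value exceeds 0.05. In the toy data X and Y are both copies of Z, so they are strongly associated marginally but independent given Z.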
Figure 5
ROC curves for validation in the EIRA cohort of the causal graph-based predictive model of rheumatoid arthritis developed in the NARAC training set.
Figure 6
ROC curves for validation in the EIRA cohort of the modified causal graph-based predictive model of rheumatoid arthritis (without SNP rs12523624).

