Multicenter Study
Am J Hum Genet. 2013 Jun 6;92(6):1008-12.
doi: 10.1016/j.ajhg.2013.05.002. Epub 2013 May 23.

Large sample size, wide variant spectrum, and advanced machine-learning technique boost risk prediction for inflammatory bowel disease

Zhi Wei et al. Am J Hum Genet.

Abstract

We performed risk assessment for Crohn's disease (CD) and ulcerative colitis (UC), the two common forms of inflammatory bowel disease (IBD), by using data from the International IBD Genetics Consortium's Immunochip project. This data set contains ~17,000 CD cases, ~13,000 UC cases, and ~22,000 controls from 15 European countries typed on the Immunochip. This custom chip provides a more comprehensive catalog of the most promising candidate variants by picking up the remaining common variants and certain rare variants that were missed in the first generation of GWAS. Given this unprecedentedly large sample size and wide variant spectrum, we employed the most recent machine-learning techniques to build optimal predictive models. Our final predictive models achieved areas under the curve (AUCs) of 0.86 and 0.83 for CD and UC, respectively, in an independent evaluation. To our knowledge, this is the best prediction performance reported for CD and UC to date.
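As a reminder of what the reported AUCs measure, the sketch below computes AUC as the fraction of case/control pairs in which the case receives the higher risk score, with ties counted as half. The function name `auc` and the toy scores are illustrative assumptions, not taken from the paper, and the quadratic pairwise loop is only meant for small examples.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted risk scores (higher means more likely a case).
    labels: 1 for cases, 0 for controls.
    Hypothetical helper for illustration; not the authors' code.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0   # case ranked above control
            elif c == k:
                wins += 0.5   # tie counts as half
    return wins / (len(cases) * len(controls))


# Toy data (made up): two cases, two controls.
perfect = auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
partial = auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

Under this reading, an AUC of 0.86 means that a randomly chosen case is scored above a randomly chosen control in about 86% of such pairs.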

Figures

Figure 1
Ten-Fold Cross-Validation for Model Selection and Training
SNPs that survived fold 1 preselection may still contain noisy predictors. We employed L1-penalized logistic regression to further remove irrelevant SNPs while fitting a predictive model using fold 2 data. The larger the penalty parameter lambda, the more SNPs were removed. The numbers along the top of the plot are the numbers of SNPs that survived at the corresponding values of lambda shown along the x axis. We selected lambda by using 10-fold cross-validation. Specifically, we calculated the average AUC for different values of lambda and took the largest value of lambda, which yields the most parsimonious model, such that the AUC is within 1 SE of the optimum (the two vertical dashed lines). The optimal 10-fold cross-validated AUCs on fold 2 data were 0.864 and 0.830 for (A) CD and (B) UC, respectively.
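The lambda selection described in this caption follows the usual one-standard-error rule, which can be sketched as follows. This is a minimal illustration that assumes the per-fold AUCs have already been computed for each candidate lambda; the function name `one_se_lambda` and the toy AUC values are hypothetical and do not come from the paper.

```python
from math import sqrt
from statistics import mean, stdev


def one_se_lambda(cv_auc):
    """Pick lambda by the one-standard-error rule.

    cv_auc: dict mapping each candidate lambda to its list of per-fold AUCs.
    Returns the largest lambda whose mean AUC is within 1 SE of the best
    mean AUC (a larger lambda means fewer SNPs kept by the L1 penalty).
    """
    # Mean AUC and standard error of the mean for every lambda.
    summary = {
        lam: (mean(aucs), stdev(aucs) / sqrt(len(aucs)))
        for lam, aucs in cv_auc.items()
    }
    # Lambda with the highest average cross-validated AUC.
    best_lam = max(summary, key=lambda lam: summary[lam][0])
    best_mean, best_se = summary[best_lam]
    threshold = best_mean - best_se
    # Most parsimonious model that is still within 1 SE of the optimum.
    return max(lam for lam, (m, _) in summary.items() if m >= threshold)


# Toy per-fold AUCs (made up, three folds instead of ten for brevity).
toy = {
    0.001: [0.86, 0.87, 0.85],
    0.01: [0.86, 0.86, 0.85],
    0.1: [0.80, 0.81, 0.79],
}
chosen = one_se_lambda(toy)
```

In the toy data the best mean AUC occurs at lambda 0.001, but lambda 0.01 is within one standard error of it and is therefore preferred as the sparser model.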
Figure 2
Contribution of Sample Size and Predictors
For all experiments, we trained logistic regression models on fold 2 data and plotted AUCs of testing on fold 3 data. (A) The 10% sample sizes of fold 2 data were 1,327 and 1,197 for CD and UC, respectively. (B) UC/CD: AUCs were achieved by using only the 30 CD-specific loci or the 23 UC-specific loci; UC/CD + IBD: AUCs were achieved by using the UC or CD loci plus the 110 IBD loci; UC + CD + IBD: AUCs were achieved by using all 163 IBD loci; Affy500K: AUCs were achieved by using the 1,201/724 CD/UC Immunochip SNPs that are also typed on the Affymetrix 500K chip; Illumina550K: AUCs were achieved by using the 1,728/1,142 CD/UC Immunochip SNPs that are also typed on the Illumina 550K chip; AffyGW6: AUCs were achieved by using the 1,933/1,204 CD/UC Immunochip SNPs that are also typed on the Affymetrix Genome-Wide SNP Array 6.0 chip; full: AUCs were achieved by using all Immunochip SNPs.
