BMC Med Genet. 2017 Aug 29;18(1):94.
doi: 10.1186/s12881-017-0451-2.

Performance of risk prediction for inflammatory bowel disease based on genotyping platform and genomic risk score method

Guo-Bo Chen et al. BMC Med Genet. 2017.

Abstract

Background: Predicting risk of disease from genotypes is being increasingly proposed for a variety of diagnostic and prognostic purposes. Genome-wide association studies (GWAS) have identified a large number of genome-wide significant susceptibility loci for Crohn's disease (CD) and ulcerative colitis (UC), two subtypes of inflammatory bowel disease (IBD). Recent studies have demonstrated that prediction models including only loci that are significantly associated with disease have low predictive power, and that power can be substantially improved using a polygenic approach.

Methods: We performed a comprehensive analysis of risk prediction models using large case-control cohorts genotyped for 909,763 GWAS SNPs or 123,437 SNPs on the custom designed Immunochip using four prediction methods (polygenic score, best linear genomic prediction, elastic-net regularization and a Bayesian mixture model). We used the area under the curve (AUC) to assess prediction performance for discovery populations with different sample sizes and number of SNPs within cross-validation.
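As a rough illustration of two quantities named in the Methods, the sketch below computes a simple polygenic score (a weighted sum of allele counts) and the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control. It is not the authors' pipeline: the function names, the toy genotypes, and the per-SNP weights are all hypothetical, and the weights stand in for effect sizes that would normally be estimated in a discovery sample.

```python
from typing import List, Sequence


def polygenic_score(genotypes: Sequence[Sequence[int]],
                    weights: Sequence[float]) -> List[float]:
    """Return one score per individual: sum over SNPs of weight * allele count.

    genotypes[i][j] is the risk-allele count (0, 1 or 2) of individual i at SNP j.
    weights[j] is a hypothetical per-SNP effect size (e.g. a log odds ratio).
    """
    for row in genotypes:
        if len(row) != len(weights):
            raise ValueError("each genotype row needs one entry per weight")
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (case, control) pairs in which the case has the
    higher score; tied pairs count as one half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Toy data (made up): four individuals, three SNPs.
    toy_genotypes = [[2, 1, 0], [1, 1, 1], [0, 2, 1], [0, 0, 2]]
    toy_weights = [0.4, 0.1, -0.2]   # hypothetical effect sizes
    toy_labels = [1, 1, 0, 0]        # 1 = case, 0 = control
    toy_scores = polygenic_score(toy_genotypes, toy_weights)
    print("scores:", toy_scores)
    print("AUC:", auc(toy_scores, toy_labels))
```

The pairwise AUC is quadratic in sample size, which is fine for a toy example; for cohorts of the size described here a rank-based computation would be the usual choice. In a cross-validation setting, the weights would be estimated on the training folds and the AUC computed on the held-out fold.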

Results: On average, the Bayesian mixture approach had the best prediction performance. Using cross-validation, we found little difference in prediction performance between GWAS and Immunochip, despite the GWAS array providing a tenfold larger effective genome-wide coverage. The prediction performance of Immunochip is largely due to the power of the initial GWAS used for its marker selection and to its low cost, which enabled larger sample sizes. The predictive ability of the genomic risk score based on Immunochip was replicated in external data, with AUC of 0.75 for CD and 0.70 for UC. CD patients with higher risk scores demonstrated clinical characteristics typically associated with a more severe disease course, including ileal location and earlier age at diagnosis.

Conclusions: Our analyses demonstrate that the power of genomic risk prediction for IBD is mainly due to strongly associated SNPs with considerable effect sizes. Additional SNPs that are only tagged by high-density GWAS arrays, and low-frequency or rare variants over-represented in the high-density region on the Immunochip, contribute little to prediction accuracy. Although a quantitative assessment of IBD risk for an individual is not currently possible, we show sufficient power of genomic risk scores to stratify IBD risk among individuals at diagnosis.

Keywords: Case-control study; Complex trait; Crohn’s disease; Inflammatory bowel disease; Risk score; SNP array; Ulcerative colitis.


Conflict of interest statement

Ethics approval and consent to participate

The study was approved by the Human Research Ethics Committees of all participating hospitals (Additional file 11). The lead hospital for this study was the Royal Brisbane and Women’s Hospital (Ref: 2003/155). Informed consent was obtained from all subjects in this study.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Datasets used in this study. (a) SNP density of iChip and gChip SNPs. The whole genome was partitioned into 0.6 M bins on each chromosome. The middle and inner circles indicate the density of the SNPs on iChip and gChip, respectively. The spikes for iChip depict regions of dense coverage mainly chosen for replication and fine mapping of GWAS loci, while gChip provides a uniform coverage with higher average density. (b) Partitioning of data into sets of increasing sample size and number of SNPs. Samples were split into four subsets with increasing number of individuals and SNPs. The smallest subsets (dotted box) include samples genotyped on both gChip and iChip and SNPs overlapping between chips
Fig. 2
Comparison of prediction performance of four methods using individuals and SNPs common between gChip and iChip. The sample consisted of 2479 cases and 3440 controls for CD and 2357 cases and 6740 controls for UC. The number of SNPs was 42,534. Prediction accuracy is measured as the area under the curve (AUC) with higher values denoting better performance. Vertical lines display the variation of estimates in 5-fold cross-validation. Prediction models were trained using either disease status (0–1) or disease phenotype adjusted for ancestry (adjusted)
Fig. 3
Prediction performance with increasing sample size and SNP density using BayesR. Prediction accuracy is measured as the area under the curve (AUC) with higher values denoting better performance. Prediction models were trained using either disease status (0–1) or disease phenotype adjusted for ancestry (adjusted)
Fig. 4
Distribution of genomic risk scores in UC and CD cases and controls of the ANZ cohort. Kernel density estimates of risk scores in case and control groups predicted using models trained on IIBDGC samples and iChip
Fig. 5
Odds ratio of case-control status. Individuals in the independent ANZ cohort were partitioned into 10 groups on the basis of the rank of their predicted risk score from BayesR, EN, GBLUP, and GPRS. The first decile is used as the reference group. The vertical bars denote mean and 95% confidence intervals from 5-fold cross-validation. The discovery populations included 123,437 iChip SNPs and 43,900 and 40,050 individuals for CD and UC, respectively
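
The decile comparison described for Fig. 5 can be sketched as follows: individuals are ranked by predicted risk score, split into 10 equal-sized groups, and the odds of being a case in each group are divided by the odds in the first (lowest-score) group. This is an illustrative sketch only. The function name is hypothetical, the 0.5 added to each count is a continuity correction assumed here to avoid division by zero (the source does not state how empty cells were handled), and confidence intervals are not computed.

```python
from typing import List, Sequence


def decile_odds_ratios(scores: Sequence[float],
                       labels: Sequence[int],
                       n_groups: int = 10) -> List[float]:
    """Odds ratio of case status per score-ranked group, relative to group 1.

    scores[i] is the predicted risk score of individual i;
    labels[i] is 1 for a case and 0 for a control.
    """
    n = len(scores)
    if n != len(labels):
        raise ValueError("scores and labels must have the same length")
    if n < n_groups:
        raise ValueError("need at least one individual per group")

    # Rank individuals from lowest to highest score.
    order = sorted(range(n), key=lambda i: scores[i])

    # Assign consecutive ranks to groups of (near-)equal size.
    groups: List[List[int]] = [[] for _ in range(n_groups)]
    for rank, i in enumerate(order):
        groups[rank * n_groups // n].append(labels[i])

    def odds(group: List[int]) -> float:
        cases = sum(group)
        controls = len(group) - cases
        # Assumed continuity correction: add 0.5 to both counts.
        return (cases + 0.5) / (controls + 0.5)

    reference = odds(groups[0])
    return [odds(g) / reference for g in groups]


if __name__ == "__main__":
    # Toy data (made up): 20 individuals with scores 0..19, so each group
    # holds two people; cases become more common as the score increases.
    toy_scores = list(range(20))
    toy_labels = [0, 1, 0, 0, 0, 1, 0, 1, 1, 0,
                  0, 1, 1, 0, 1, 1, 0, 1, 1, 1]
    for group, ratio in enumerate(decile_odds_ratios(toy_scores, toy_labels), 1):
        print(f"group {group}: OR = {ratio:.2f}")
```

By construction the first group has an odds ratio of 1, so the remaining values read directly as fold changes in odds relative to the lowest-risk decile.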


