Genomics Inform. 2016 Dec;14(4):138-148.
doi: 10.5808/GI.2016.14.4.138. Epub 2016 Dec 30.

Risk Prediction Using Genome-Wide Association Studies on Type 2 Diabetes

Sungkyoung Choi et al. Genomics Inform. 2016 Dec.

Abstract

The success of genome-wide association studies (GWASs) has enabled us to improve risk assessment and provide novel genetic variants for diagnosis, prevention, and treatment. However, most variants discovered by GWASs have been reported to have very small effect sizes on complex human diseases, which has been a major hurdle in building risk prediction models. Recently, many statistical approaches based on penalized regression have been developed to solve the "large p and small n" problem, in which the number of predictors p far exceeds the sample size n. In this report, we evaluated the performance of several statistical methods for predicting a binary trait: stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and Elastic-Net (EN). We first built a prediction model by combining variable selection and prediction methods for type 2 diabetes using Affymetrix Genome-Wide Human SNP Array 5.0 data from the Korean Association Resource project. We then assessed the risk prediction performance using the area under the receiver operating characteristic curve (AUC) for the internal and external validation datasets. In the internal validation, SLR-LASSO and SLR-EN tended to yield more accurate predictions than other combinations. In the external validation, the SLR-SLR and SLR-EN combinations achieved the highest AUC of 0.726. We propose these combinations as potentially powerful risk prediction models for type 2 diabetes.

Keywords: clinical prediction rule; genome-wide association study; penalized regression models; type 2 diabetes mellitus.
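
To illustrate the penalized regression methods named in the abstract, the following is a minimal sketch of a logistic model with a LASSO or Elastic-Net penalty, fitted by proximal gradient descent. It is not the authors' implementation. The function names (`fit_penalized_logistic`, `simulate`), the penalty parameterization with `lam` and `alpha`, the step size, and the simulated genotypes are all assumptions made for illustration. Stepwise logistic regression is not shown.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(x, t):
    """Proximal operator of t*|x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def fit_penalized_logistic(X, y, lam, alpha, step=0.5, n_iter=300):
    """Proximal gradient descent for the mean logistic loss plus
    lam * (alpha * L1 + (1 - alpha) / 2 * squared L2).
    alpha = 1 gives LASSO; 0 < alpha < 1 gives Elastic-Net.
    The intercept b0 is not penalized."""
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals: predicted probability minus observed label.
        resid = [sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, row))) - yi
                 for row, yi in zip(X, y)]
        # Gradient of the smooth part (logistic loss + ridge term).
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n
                + lam * (1.0 - alpha) * b[j] for j in range(p)]
        b0 -= step * sum(resid) / n
        # Gradient step, then soft-thresholding for the L1 term.
        b = [soft_threshold(bj - step * gj, step * lam * alpha)
             for bj, gj in zip(b, grad)]
    return b0, b


def simulate(n=200, p=10, maf=0.3, effect=1.5, seed=0):
    """Hypothetical toy data: p standardized SNPs; only SNP 0 affects risk."""
    rng = random.Random(seed)
    # Genotype = number of minor alleles (0, 1, or 2).
    G = [[(rng.random() < maf) + (rng.random() < maf) for _ in range(p)]
         for _ in range(n)]
    means = [sum(row[j] for row in G) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in G) / n)
           for j in range(p)]
    X = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in G]
    y = [1 if rng.random() < sigmoid(effect * row[0]) else 0 for row in X]
    return X, y
```

Calling `fit_penalized_logistic(X, y, lam=0.02, alpha=1.0)` on data from `simulate()` fits a LASSO model, and `alpha=0.5` gives an Elastic-Net fit. A sufficiently large `lam` shrinks every SNP coefficient to exactly zero, which is how these penalties also act as variable selection.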

Figures

Fig. 1. Outline of the risk prediction model construction and validation. T2D, type 2 diabetes; CV, cross-validation; ALL, single-nucleotide polymorphisms (SNPs) only reported in the genome-wide association study (GWAS) catalog; KARE, Korean Association Resource; ASIAN, SNPs only reported in the GWAS catalog with an Asian population; ALL + KARE, combined SNPs in the GWAS catalog and KARE cohort; ASIAN + KARE, combined SNPs in the GWAS catalog with an Asian population and the KARE cohort; SLR, stepwise logistic regression; LASSO, least absolute shrinkage and selection operator; EN, Elastic-Net.
Fig. 2. Venn diagrams summarizing the number of variables shared among the 5-fold CV folds, by variable selection method. CV, cross-validation; ALL, single-nucleotide polymorphisms (SNPs) only reported in the genome-wide association study (GWAS) catalog; ASIAN, SNPs only reported in the GWAS catalog with an Asian population; KARE, Korean Association Resource; ALL + KARE, combined SNPs in the GWAS catalog and KARE cohort; ASIAN + KARE, combined SNPs in the GWAS catalog with an Asian population and the KARE cohort; SLR, stepwise logistic regression; LASSO, least absolute shrinkage and selection operator; EN, Elastic-Net.
Fig. 3. Internal validation shows the AUC values for each combination of variable selection and prediction methods. Each bar represents one of five SNP data sets. AUC, area under the receiver operating characteristic curve; SNP, single-nucleotide polymorphism; ALL, SNPs only reported in the genome-wide association study (GWAS) catalog; ASIAN, SNPs only reported in the GWAS catalog with an Asian population; KARE, Korean Association Resource; ALL + KARE, combined SNPs in the GWAS catalog and KARE cohort; ASIAN + KARE, combined SNPs in the GWAS catalog with an Asian population and the KARE cohort; SLR, stepwise logistic regression; LASSO, least absolute shrinkage and selection operator; EN, Elastic-Net.
Fig. 4. External validation shows the AUC values for each combination of variable selection and prediction methods. Each bar represents one of five SNP data sets. AUC, area under the receiver operating characteristic curve; SNP, single-nucleotide polymorphism; ALL, SNPs only reported in the genome-wide association study (GWAS) catalog; ASIAN, SNPs only reported in the GWAS catalog with an Asian population; KARE, Korean Association Resource; ALL + KARE, combined SNPs in the GWAS catalog and KARE cohort; ASIAN + KARE, combined SNPs in the GWAS catalog with an Asian population and the KARE cohort; SLR, stepwise logistic regression; LASSO, least absolute shrinkage and selection operator; EN, Elastic-Net.
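
The AUC values summarized in Figs. 3 and 4 can be illustrated with a short sketch. It computes the AUC as the proportion of case-control pairs in which the case receives the higher predicted risk, and includes a simple 5-fold split of the kind used in cross-validation. The function names `auc` and `kfold_indices` are hypothetical, and this is not the authors' evaluation code.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a randomly chosen case (label 1)
    scores higher than a randomly chosen control (label 0).
    Tied pairs count as one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def kfold_indices(n, k=5, seed=0):
    """Shuffle sample indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]
```

In a cross-validation loop, each fold would serve once as the held-out set: the model is fitted on the remaining folds, and `auc` is applied to the held-out labels and predicted risks. An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from controls.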
