Risk estimation and risk prediction using machine-learning methods

Jochen Kruppa¹, Andreas Ziegler, Inke R König

Affiliation

¹ Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Maria-Goeppert-Str. 1, 23562 Lübeck, Germany.

PMID: 22752090
PMCID: PMC3432206
DOI: 10.1007/s00439-012-1194-y

Review

Hum Genet. 2012 Oct;131(10):1639-54. doi: 10.1007/s00439-012-1194-y. Epub 2012 Jul 3.

Abstract

After an association between genetic variants and a phenotype has been established, further study goals comprise the classification of patients according to disease risk or the estimation of disease probability. To accomplish this, different statistical methods are required, and specifically machine-learning approaches may offer advantages over classical techniques. In this paper, we describe methods for the construction and evaluation of classification and probability estimation rules. We review the use of machine-learning approaches in this context and explain some of the machine-learning algorithms in detail. Finally, we illustrate the methodology through application to a genome-wide association analysis on rheumatoid arthritis.

Figures

**Fig. 1**
Path to construct, evaluate and validate a rule of classification or probability estimation

**Fig. 2**
Flowchart of the systematic literature search

**Fig. 3**
(a) ROC curves for all methods in selected SNP sets in the test data. (b) ROC curves for Random Jungle in regression mode in all SNP sets in the test data

**Fig. 4**
Brier scores for scores based on lasso or Random Jungle regression in the test data
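
The figure captions above refer to ROC curves and Brier scores as evaluation measures for classification and probability estimation rules. As a rough illustration of what these two measures compute, here is a minimal Python sketch. The function names `brier_score` and `auc` and the toy data are assumptions made for this example; they are not taken from the paper, and the paper's own analysis may compute these quantities differently.

```python
def brier_score(y_true, p_hat):
    """Mean squared difference between predicted probabilities and 0/1 outcomes.

    Lower is better; 0 means every probability matched the outcome exactly.
    """
    if len(y_true) != len(p_hat) or len(y_true) == 0:
        raise ValueError("y_true and p_hat must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    Equals the proportion of (case, control) pairs in which the case has the
    higher score; tied pairs count as one half.
    """
    cases = [s for y, s in zip(y_true, scores) if y == 1]
    controls = [s for y, s in zip(y_true, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: 1 = affected, 0 = unaffected.
outcomes = [1, 0, 1, 0]
predicted = [0.9, 0.2, 0.6, 0.7]

toy_brier = brier_score(outcomes, predicted)
toy_auc = auc(outcomes, predicted)
```

The two measures address different goals: the AUC depends only on how well the scores rank cases above controls, while the Brier score also penalizes probabilities that are poorly calibrated. As a reference point, predicting 0.5 for every subject gives a Brier score of 0.25 regardless of the outcomes.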

Grants and funding

R01 AR044422/AR/NIAMS NIH HHS/United States
