Interpretation of genetic association studies: markers with replicated highly significant odds ratios may be poor classifiers

Johanna Jakobsdottir¹, Michael B Gorin, Yvette P Conley, Robert E Ferrell, Daniel E Weeks

PMID: 19197355
PMCID: PMC2629574
DOI: 10.1371/journal.pgen.1000337

Johanna Jakobsdottir et al. PLoS Genet. 2009 Feb;5(2):e1000337. doi: 10.1371/journal.pgen.1000337. Epub 2009 Feb 6.

Affiliation

¹ Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, USA. joj8@pitt.edu

Abstract

Recent successful discoveries of potentially causal single nucleotide polymorphisms (SNPs) for complex diseases hold great promise, and commercialization of genomics in personalized medicine has already begun. The hope is that genetic testing will benefit patients and their families, encourage positive lifestyle changes, and guide clinical decisions. However, for many complex diseases, it is arguable whether the era of genomics in personalized medicine is here yet. We focus on the clinical validity of genetic testing with an emphasis on two popular statistical methods for evaluating markers. The two methods, logistic regression and receiver operating characteristic (ROC) curve analysis, are applied to our age-related macular degeneration dataset. By using an additive model of the CFH, LOC387715, and C2 variants, the odds ratios are 2.9, 3.4, and 0.4, with p-values of 10⁻¹³, 10⁻¹³, and 10⁻³, respectively. The area under the ROC curve (AUC) is 0.79, but assuming prevalences of 15%, 5.5%, and 1.5% (which are realistic for age groups of 80, 65, and 40 years and older, respectively), only 30%, 12%, and 3% of the group classified as high risk are cases. Additionally, we present examples for four other diseases for which strongly associated variants have been discovered. In type 2 diabetes, our classification model of 12 SNPs has an AUC of only 0.64, and two SNPs achieve an AUC of only 0.56 for prostate cancer. Nine SNPs were not sufficient to improve the discrimination power over that of nongenetic predictors for risk of cardiovascular events. Finally, in Crohn's disease, a model of five SNPs, one with a quite low odds ratio of 0.26, has an AUC of only 0.66. Our analyses and examples show that strong association, although very valuable for establishing etiological hypotheses, does not guarantee effective discrimination between cases and controls. The scientific community should be cautious to avoid overstating the value of association findings in terms of personalized medicine before their time.
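
The share of the high-risk group that are true cases is the positive predictive value, which follows from Bayes' rule once a true-positive fraction (TPF), false-positive fraction (FPF), and prevalence are fixed. The sketch below is illustrative only: it assumes the operating point (FPF, TPF) = (31%, 74%) highlighted in the Figure 3 caption is the one behind the percentages quoted in the abstract, which this page does not confirm. The function name is hypothetical.

```python
def positive_predictive_value(tpf, fpf, prevalence):
    """Fraction of test-positive individuals who are true cases (Bayes' rule)."""
    true_positives = tpf * prevalence
    false_positives = fpf * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Assumed operating point, taken from the Figure 3 caption.
TPF, FPF = 0.74, 0.31

# Prevalences quoted in the abstract for the three age groups.
for prevalence in (0.15, 0.055, 0.015):
    ppv = positive_predictive_value(TPF, FPF, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV = {ppv:.1%}")
```

Under this assumption the computed values fall close to the 30%, 12%, and 3% reported in the abstract, which shows how quickly a fixed TPF and FPF lose predictive value as prevalence drops.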

Conflict of interest statement

The authors are listed as the inventors in a patent filed by the University of Pittsburgh for the LOC387715/ARMS2 locus.

Figures

**Figure 1. Accuracy curves for binary markers.**
The curves of accuracy points (FPF, TPF pair) for binary markers with ORs 1.5, 10, and 50 are plotted. The black diamonds and horizontal dotted line highlight the points (FPF, TPF) = (FPF, 80%) on the accuracy curves. The ORs are marked on the curves.
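
For a binary marker, the odds ratio ties TPF to FPF through OR = [TPF / (1 − TPF)] / [FPF / (1 − FPF)], so fixing the OR traces out one curve of (FPF, TPF) pairs. The following is a minimal sketch of that standard relation, not the authors' plotting code; the function names are hypothetical.

```python
def tpf_on_accuracy_curve(fpf, odds_ratio):
    """TPF implied by a given FPF for a binary marker with a fixed odds ratio."""
    return odds_ratio * fpf / (1.0 - fpf + odds_ratio * fpf)


def fpf_on_accuracy_curve(tpf, odds_ratio):
    """FPF implied by a given TPF for a binary marker with a fixed odds ratio."""
    fpf_odds = (tpf / (1.0 - tpf)) / odds_ratio
    return fpf_odds / (1.0 + fpf_odds)


# FPF that accompanies TPF = 80% for the three ORs shown in Figure 1.
for odds_ratio in (1.5, 10, 50):
    fpf = fpf_on_accuracy_curve(0.80, odds_ratio)
    print(f"OR = {odds_ratio}: FPF = {fpf:.1%} at TPF = 80%")
```

By this relation, even an OR of 10 pairs a TPF of 80% with an FPF of roughly 29%, which illustrates why a large odds ratio alone does not imply a good classifier.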

**Figure 2. AUC for additive risk models of SNP markers as a function of risk allele frequency in cases.**
The AUC is estimated for all risk allele frequencies in controls assuming additive ORs 1.5, 3, 5, 10, and 50 (the ORs are marked on the curves). The numbers in gray are the risk allele frequencies in controls corresponding to the maximum AUC for each OR. The dotted horizontal lines in gray mark AUCs of 0.7 and 0.8. The black diamonds highlight the points (p_ca, AUC) = (p_ca, 0.80) for markers with additive ORs 10 and 50 (see Table 1).
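
One way to see how the AUC of a single SNP depends on allele frequency and OR is sketched below. It rests on assumptions that are not stated on this page: Hardy-Weinberg genotype proportions in controls, a per-allele (log-additive) odds ratio, and the risk-allele count used as the classification score. It is a hedged illustration of the idea, not a reproduction of the authors' calculation, and the function name is hypothetical.

```python
def additive_snp_auc(control_allele_freq, odds_ratio):
    """AUC of the risk-allele count (0, 1, 2) as a classifier for one SNP.

    Assumes Hardy-Weinberg proportions in controls and a per-allele odds
    ratio, so case genotype frequencies are the control frequencies
    reweighted by odds_ratio ** (number of risk alleles).
    """
    p = control_allele_freq
    q = 1.0 - p
    controls = [q * q, 2.0 * p * q, p * p]
    weights = [freq * odds_ratio ** count for count, freq in enumerate(controls)]
    total = sum(weights)
    cases = [w / total for w in weights]

    # AUC = P(case score > control score) + 0.5 * P(tie).
    auc = 0.0
    for case_count, case_freq in enumerate(cases):
        for control_count, control_freq in enumerate(controls):
            if case_count > control_count:
                auc += case_freq * control_freq
            elif case_count == control_count:
                auc += 0.5 * case_freq * control_freq
    return auc


# Scan control allele frequencies for the ORs listed in the caption.
for odds_ratio in (1.5, 3, 5, 10, 50):
    best = max(additive_snp_auc(f / 100.0, odds_ratio) for f in range(1, 100))
    print(f"OR = {odds_ratio}: maximum AUC over allele frequencies = {best:.2f}")
```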

**Figure 3. ROC curves for AMD classification models.**
The black diamond highlights the point (FPF, TPF) = (31%, 74%) on the ROC curve of the three-factor model of *CFH*, *LOC387715*, and *C2*. The gray line for reference gives the “chance” classification rule: the farther the ROC curve is from the chance line, the better the classification rule.

**Figure 4. Integrated predictiveness and classification plot for the three-factor model.**
The light-gray lines show how the plots are used in the examples given in the text: the dashed lines are for the first example with TPF = 74%, FPF = 31%, risk percentile = 35%, and AMD risk threshold = 4%; and the dotted lines are for the second example with AMD risk threshold = 25%, risk percentile = 85%, TPF = 17%, and FPF = 5%. On the top panel, the risks for cases are marked with a dot in black while the risks for controls are marked with a vertical line segment in dark-gray.
