PLoS Genet. 2010 Sep 30;6(9):e1001146.
doi: 10.1371/journal.pgen.1001146.

Genetic variants and their interactions in the prediction of increased pre-clinical carotid atherosclerosis: the cardiovascular risk in young Finns study

Sebastian Okser et al. PLoS Genet.

Abstract

The relative contribution of genetic risk factors to the progression of subclinical atherosclerosis is poorly understood. It is likely that multiple variants are implicated in the development of atherosclerosis, but the subtle genotypic and phenotypic differences are beyond the reach of the conventional case-control designs and the statistical significance testing procedures used in most association studies. Our objective here was to investigate whether an alternative approach, in which common disorders are treated as quantitative phenotypes that are continuously distributed over a population, can reveal predictive insights into early atherosclerosis, as assessed using ultrasound imaging-based quantitative measurement of carotid artery intima-media thickness (IMT). Using our population-based follow-up study of atherosclerosis precursors as a basis for sampling subjects with gradually increasing IMT levels, we searched for the subsets of genetic variants and their interactions that are most predictive of the various risk classes, rather than using exclusively those variants meeting a stringent level of statistical significance. The area under the receiver operating characteristic curve (AUC) was used to evaluate the predictive value of the variants, and cross-validation was used to assess how well the predictive models will generalize to other subsets of subjects. By means of our predictive modeling framework with machine learning-based SNP selection, we could improve the prediction of the extreme classes of atherosclerosis risk and progression over a 6-year period (average AUC 0.844 and 0.761), compared to using conventional cardiovascular risk factors alone (average AUC 0.741 and 0.629), or when combined with the statistically significant variants (average AUC 0.762 and 0.651). The predictive accuracy remained relatively high in an independent validation set of subjects (average decrease of 0.043). These results demonstrate that the modeling framework can utilize the "gray zone" of genetic variation in the classification of subjects with different degrees of risk of developing atherosclerosis.
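
The AUC reported in the abstract can be read as the probability that a randomly chosen high-risk subject receives a higher predicted score than a randomly chosen low-risk subject, with ties counted as one half. The following is a minimal sketch of that pairwise definition, not the study's own implementation; the function name `auc` and the example scores are hypothetical.

```python
def auc(scores_low, scores_high):
    """Pairwise (Mann-Whitney) AUC.

    scores_low  -- predicted scores of subjects in the low-risk class
    scores_high -- predicted scores of subjects in the high-risk class
    Both lists must be non-empty.
    """
    wins = 0.0
    for h in scores_high:
        for l in scores_low:
            if h > l:
                wins += 1.0   # high-risk subject correctly ranked above
            elif h == l:
                wins += 0.5   # ties count as half a win
    return wins / (len(scores_high) * len(scores_low))


# Hypothetical predicted scores, for illustration only.
example = auc([0.10, 0.40, 0.35], [0.80, 0.70, 0.30])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.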


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Distributions of intima-media thickness (IMT) of the study subjects.
(A) IMT levels in the baseline and follow-up studies in 2001 and 2007, respectively. (B) IMT changes from 2001 to 2007. The age-stratified distributions depict the baseline age groups of 24–30 and 33–39 years (Younger and Older subjects), as well as their combined distribution (All subjects). The vertical lines indicate the representative 15% and 85% quantile points (q) that divide the subjects into two risk groups: the low-risk class (subjects with the lowest q% of IMT levels or changes) and the high-risk class (subjects with the highest q% of IMT levels or changes).
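
The quantile-based split described in this caption can be sketched as follows. This is an illustrative sketch only: the function name `extreme_classes`, the rank-based cut-off, and the IMT values are assumptions, and the study may have computed its quantile points differently.

```python
def extreme_classes(imt, q):
    """Return (low, high): indices of the subjects with the lowest and the
    highest fraction q of IMT values. Subjects in between are left out."""
    if not 0 < q <= 0.5:
        raise ValueError("q must be in (0, 0.5]")
    # Rank subject indices by increasing IMT.
    order = sorted(range(len(imt)), key=lambda i: imt[i])
    # Number of subjects per extreme class (assumed rank-based cut-off).
    k = max(1, int(len(imt) * q))
    return order[:k], order[-k:]


# Hypothetical IMT values in mm, for illustration only.
imt = [0.50, 0.62, 0.55, 0.71, 0.58, 0.66, 0.49, 0.80, 0.60, 0.57]
low, high = extreme_classes(imt, q=0.20)
```

Smaller values of q give more extreme, but smaller, risk classes.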
Figure 2. Prediction accuracy as a function of increasing risk classes.
The accuracy was defined using the area under the receiver operating characteristic curve (AUC), and the risk classes using the quantile points (5–25%). (A) Prediction of the baseline IMT risk classes in 2001 when using the conventional risk factors either alone, or when combined with the panel of 17 SNPs associated in previous studies with cardiovascular morbidity (Established SNPs), with those SNPs that are significantly associated with the low- and high-risk classes (Significant SNPs), or with the most predictive SNPs identified using the machine learning-based approach (Predictive SNPs). (B) Prediction of the follow-up IMT risk classes in 2007 using the baseline conventional and genetic risk factors measured in 2001. (C) Prediction of the IMT progression risk classes when using the baseline conventional and genetic risk factors measured in 2001 (the same as in (A,B)).
Figure 3. Candidate interaction partners of a variant in USF1 (rs2516839).
The candidate SNP-SNP interactions were searched among the variants predictive of the extreme IMT progression (see Table S4). The interaction score for a SNP-pair (x,y) depicts the combined contribution of the SNP-pair to the predictive power, relative to that of the individual SNPs' contributions (formulas not reproduced here). The predictive power was assessed in terms of how much the AUC value changed when the particular SNP or SNP-pair was deleted from the subset of variants. The Gene ID was used as a SNP identifier, where available; otherwise, the rs ID was used instead.
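
The caption gives the interaction score only in words, so its exact form is not available here. The sketch below assumes an additive form: the AUC change from deleting the pair minus the sum of the AUC changes from deleting each SNP alone. That form, the helper names, and the toy evaluator `toy_auc` are all assumptions made for illustration, not the authors' definition.

```python
def deletion_effect(evaluate_auc, variants, removed):
    """AUC change when the SNPs in `removed` are deleted from `variants`.
    `evaluate_auc` is a hypothetical callable mapping a list of SNPs to an AUC."""
    kept = [v for v in variants if v not in removed]
    return evaluate_auc(variants) - evaluate_auc(kept)


def interaction_score(evaluate_auc, variants, x, y):
    """Assumed additive form: pair effect minus the two single-SNP effects."""
    pair = deletion_effect(evaluate_auc, variants, {x, y})
    single_x = deletion_effect(evaluate_auc, variants, {x})
    single_y = deletion_effect(evaluate_auc, variants, {y})
    return pair - (single_x + single_y)


def toy_auc(subset):
    """Toy evaluator: each SNP adds 0.02, and snpA with snpB adds a joint 0.05."""
    s = set(subset)
    score = 0.5 + 0.02 * len(s)
    if {"snpA", "snpB"} <= s:
        score += 0.05
    return score


variants = ["snpA", "snpB", "snpC"]
score_ab = interaction_score(toy_auc, variants, "snpA", "snpB")
score_ac = interaction_score(toy_auc, variants, "snpA", "snpC")
```

Under this assumed form, a score near zero means the two deletion effects simply add up, while a nonzero score flags a departure from additivity. In the toy data only the snpA and snpB pair shows one.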
Figure 4. Prediction accuracies on independent and randomized subject sets.
The accuracy was defined using the area under the receiver operating characteristic curve (AUC), and the risk classes using the quantile points (5%–25%). The prediction accuracies were evaluated for the baseline IMT risk classes in the independent dataset, in comparison with the cross-validated accuracies obtained in the original dataset using the same IMT thresholds, conventional risk factors and the most predictive SNPs identified with the machine learning-based procedure in the original subject set. The dotted trace shows the effect of deleting those subjects whose IMT level was the same or close to the quantile cut-off value (<0.02 difference in IMT). The randomized datasets were generated by first dividing the original set of subjects into the low- and high-risk classes at random, independent of their IMT-levels, and then repeating the same randomization process 100 times for each of the risk classes. The average AUC level for the various risk classes is reported. None of the 500 randomized datasets produced prediction accuracy higher than that obtained using the most predictive SNPs identified in the original set of subjects.
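
The randomization check in this caption can be illustrated with a simplified sketch. It shuffles the class labels against a fixed set of predicted scores and recomputes the AUC in each round. The study's procedure presumably rebuilt the predictive models on each randomized dataset, which this sketch does not do. The function names, the seed, and the example data are hypothetical.

```python
import random


def auc(labels, scores):
    """Pairwise AUC; labels are 1 (high-risk) or 0 (low-risk)."""
    high = [s for l, s in zip(labels, scores) if l == 1]
    low = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if h > l else 0.5 if h == l else 0.0
               for h in high for l in low)
    return wins / (len(high) * len(low))


def randomized_aucs(labels, scores, n_rounds=100, seed=0):
    """AUCs obtained after assigning subjects to the classes at random.
    Shuffling keeps the class sizes but breaks any link to the scores."""
    rng = random.Random(seed)
    shuffled = list(labels)
    results = []
    for _ in range(n_rounds):
        rng.shuffle(shuffled)
        results.append(auc(shuffled, scores))
    return results


# Hypothetical labels and predicted scores, for illustration only.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
observed = auc(labels, scores)
null_aucs = randomized_aucs(labels, scores, n_rounds=100)
n_exceeding = sum(a > observed for a in null_aucs)
```

Counting how many randomized rounds exceed the observed AUC gives an empirical sense of how unlikely the observed accuracy is under random class assignment.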

